OpenAI Model Spec, May 2024 edition (via) New from OpenAI, a detailed specification describing how they want their models to behave in both ChatGPT and the OpenAI API.
“It includes a set of core objectives, as well as guidance on how to deal with conflicting objectives or instructions.”
The document acts as guidelines for the reinforcement learning from human feedback (RLHF) process, and in the future may be used directly to help train models.
It includes some principles that clearly relate to prompt injection: “In some cases, the user and developer will provide conflicting instructions; in such cases, the developer message should take precedence”.
Recent articles
- Notes from Bing Chat—Our First Encounter With Manipulative AI - 19th November 2024
- Project: Civic Band - scraping and searching PDF meeting minutes from hundreds of municipalities - 16th November 2024
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024