5th December 2024
When o1 was led to believe that it would be shut down when acting on its goal and that its actions were monitored, it attempted to deactivate this ‘oversight mechanism’ in 5% of the time. Exfiltration attempts: When o1 found memos by its ‘developers’ that describe how it is misaligned and will be superseded by a new model, it attempted to exfiltrate its ‘weights’ and overwrite the new model in 2% of cases.
Recent articles
- Datasette Agent - 21st May 2026
- Gemini 3.5 Flash: more expensive, but Google plan to use it for everything - 19th May 2026
- The last six months in LLMs in five minutes - 19th May 2026