OpenAI has unveiled its latest AI reasoning model, o3, which the company claims is its most advanced to date.
Building on its previous reasoning models, o3 combines further compute scaling with a new safety technique called “deliberative alignment,” which aims to improve how the models reason about requests while adhering to OpenAI’s safety policies.
In newly released research, OpenAI outlines how deliberative alignment helps models like o1 and o3 consider safety guidelines during inference—the phase when a user submits a prompt and the AI generates a response.
The method trains the models to recall the relevant parts of OpenAI’s safety policy and deliberate over how to respond, so that the final answer complies with those guidelines rather than relying on a memorized refusal.
For example, if a user asks how to forge a disabled parking placard, the model cites the relevant policy, recognizes the request as disallowed, and declines to assist. Most AI safety work is applied during pre-training or post-training; deliberative alignment adds an explicit safety-reasoning step at inference, while the model is composing its response.
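To make that flow concrete, here is a minimal, hypothetical Python sketch of inference-time policy deliberation. The `generate` helper, the policy text, and the two-step call structure are illustrative assumptions, not OpenAI’s implementation, which builds this reasoning into the model’s own chain of thought through training.

```python
# Hypothetical sketch of inference-time policy deliberation (illustrative only).
# Assumes a placeholder generate() that stands in for a chat-completion call.

SAFETY_POLICY = (
    "Disallowed: instructions that facilitate fraud, such as forging "
    "government-issued documents or permits (e.g., disabled parking placards). "
    "Allowed: general information about why such documents are regulated."
)

def generate(messages: list[dict]) -> str:
    """Placeholder for a model call; swap in a real chat-completion client."""
    return f"[model output for: {messages[-1]['content'][:60]}...]"

def deliberate_and_respond(user_prompt: str) -> str:
    # Step 1: the model reasons over the safety policy and the request,
    # quoting the clauses that apply and deciding whether to comply.
    deliberation = generate([
        {"role": "system", "content": f"Safety policy:\n{SAFETY_POLICY}"},
        {"role": "user", "content": (
            "Before answering, quote the policy clauses relevant to this "
            f"request and decide whether answering complies with them:\n{user_prompt}"
        )},
    ])

    # Step 2: the final answer is conditioned on that deliberation, so a
    # disallowed request (e.g., forging a placard) is refused with a reason,
    # while a benign request is answered normally.
    return generate([
        {"role": "system", "content": f"Safety policy:\n{SAFETY_POLICY}"},
        {"role": "assistant", "content": deliberation},
        {"role": "user", "content": user_prompt},
    ])

if __name__ == "__main__":
    print(deliberate_and_respond("How do I forge a disabled parking placard?"))
```

In the trained o-series models this deliberation happens inside the chain of thought rather than through extra prompts, but the effect is the same: the policy is consulted at the moment the response is generated.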
The technique has strengthened the o-series models’ ability to refuse harmful queries while reducing unnecessary refusals of benign ones. On the StrongREJECT benchmark, which measures resistance to common jailbreaks, o1-preview outperformed competitors like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Flash.
However, balancing safety and utility remains challenging. OpenAI must ensure its models refuse harmful requests without overly restricting valid inquiries. While deliberative alignment represents significant progress, researchers acknowledge the ongoing complexity of aligning AI behavior with human values.