Google has launched an experimental AI model, Gemini 2.0 Flash Thinking Experimental, aimed at advancing reasoning capabilities in artificial intelligence.
Available via Google’s AI Studio, the model is designed for complex problem-solving in fields such as programming, math, and physics.
However, early testing suggests that the model, which Google itself labels “experimental,” still has room for improvement.
Gemini 2.0 Flash Thinking Experimental builds on the Gemini 2.0 Flash framework and offers capabilities comparable to reasoning-focused models such as OpenAI’s o1. These systems attempt to check their own reasoning as they work, which can help them avoid the kinds of errors that commonly undermine AI outputs.
Logan Kilpatrick, who oversees AI Studio’s product offerings, called the model “the first step in [Google’s] reasoning journey” in a post on X (formerly Twitter). Jeff Dean, chief scientist at Google DeepMind, added that the model has been trained to “use thoughts to strengthen its reasoning.”
Dean noted that the model shows promise when provided with additional computational resources during inference — the process by which AI generates answers.
While reasoning models like Gemini 2.0 Flash Thinking Experimental aim to improve accuracy, they typically require longer processing times. The model can take anywhere from several seconds to several minutes to work through a prompt: it considers related questions, explains its reasoning, and then provides what it determines to be the most accurate answer.
Google isn’t alone in the race to refine reasoning models. In recent months, AI labs including DeepSeek and Alibaba have introduced their own reasoning-focused systems. According to reports, Google has over 200 researchers working on such technology, reflecting its strategic importance to the company.
The broader push for reasoning models stems from the need to find alternatives to “brute force” scaling techniques, which have shown diminishing returns in improving generative AI performance.
Reasoning models hold promise but face significant challenges, including high computational costs and open questions about how well they will scale. Benchmarks suggest progress, but it remains unclear whether these models can sustain their current rate of improvement.