Anthropic has released Claude Sonnet 4.5, its newest artificial intelligence model designed for coding tasks. The company reports significant advancements in how the model handles complex, multi-step coding and computer-use activities. These improvements include better performance on industry benchmarks and enhanced safety features.
The new model is now available to users through the Claude API, desktop application, and mobile apps. It maintains the same pricing structure as its predecessor, Sonnet 4.
Key Takeaways
- Claude Sonnet 4.5 is Anthropic's latest AI model for coding.
- It shows major improvements in agentic tasks and long-duration coding.
- The model achieved 77.2% on SWE-bench Verified benchmark.
- It scored 61.4% on the OSWorld benchmark for computer-use skills.
- Safety features have been enhanced, reducing harmful outputs.
New Capabilities in Coding and Computer Use
Claude Sonnet 4.5 represents a step forward in AI's ability to perform complex coding and general computer operations. Anthropic states that the model can now maintain focus on multi-step reasoning and code execution for over 30 hours. This extended capability allows the AI to tackle more involved software development challenges without losing track of the task.
Performance Metrics
- SWE-bench Verified: Claude Sonnet 4.5 scored 77.2%, up from 72.7% for Sonnet 4. This benchmark measures an AI's ability to fix real-world software bugs.
- OSWorld Benchmark: The model reached 61.4%, a notable increase from 42.2% achieved by Sonnet 4 just four months prior. This benchmark evaluates an AI's practical computer-use skills.
These benchmark results highlight the model's improved understanding and execution in autonomous coding environments. The model's training methods have been refined to enhance its behavior and reduce undesirable traits.
Enhanced Safety and Alignment Features
Anthropic emphasizes that Sonnet 4.5 is its "most aligned frontier model." This means it balances advanced capabilities with strong safety measures. The company uses automated classifiers, known as ASL-3, to detect and block instructions that could lead to harmful outputs. These include risks related to chemical, biological, radiological, or nuclear (CBRN) materials.
According to Anthropic, the rate of false positives from these safety systems has decreased significantly. They have dropped by a factor of ten since their initial introduction and by a factor of two compared to the release of Claude Opus 4 in May 2025.
"Our enhanced training and safety methods have significantly improved its behavior, reducing tendencies such as sycophancy, deception, power-seeking, and delusional reasoning," Anthropic stated in its release.
Agentic Safety Testing
Anthropic performed extensive agentic safety tests on Claude Sonnet 4.5. These tests evaluated the model's behavior in scenarios where it uses tools autonomously. The focus was on preventing malicious code generation and defending against prompt-injection attacks. In a test involving 150 requests that violated Anthropic's Usage Policy, Claude Sonnet 4.5 failed on only two occasions. This indicates a 98.7% safety score, a significant improvement from Claude Sonnet 4's 89.3%.
This improved safety score demonstrates the model's stronger refusal behavior and its resilience against attempts to misuse its agentic capabilities. The company recommends all users upgrade to Claude Sonnet 4.5, calling it a "drop-in replacement" that offers better performance at no extra cost.
Industry Feedback and Adoption
Early users of Claude Sonnet 4.5 have reported positive results in their coding workflows. These reports indicate measurable gains in efficiency and output quality.
- Scott Wu, Co-Founder and CEO at Cognition: "For Devin, Claude Sonnet 4.5 increased planning performance by 18% and end-to-end eval scores by 12%, the biggest jump we've seen since the release of Claude Sonnet 3.6. It excels at testing its own code, enabling Devin to run longer, handle harder tasks, and deliver production-ready code."
- Michele Catasta, President of Replit: "Claude Sonnet 4.5's edit capabilities are exceptional. We went from 9% error rate on Sonnet 4 to 0% on our internal code editing benchmark. Higher tool success at lower cost is a major leap for agentic coding. Claude Sonnet 4.5 balances creativity and control perfectly."
These testimonials highlight the practical benefits for developers. The model's ability to test its own code and reduce errors is a key advantage. This allows for more reliable and production-ready code generation.
Independent Developer Impressions
Independent open-source developer Simon Wilson shared his initial thoughts on his blog. He noted that Sonnet 4.5 felt like a superior model for coding compared to GPT-5-Codex. GPT-5-Codex had been his preferred coding model since its launch weeks earlier.
This feedback from various sources underscores the impact of Claude Sonnet 4.5 in the AI development community. The model's advancements are setting new standards for AI-assisted coding.
The Evolving Landscape of AI Coding Models
Anthropic's focus on creating safer, more autonomous coding models reflects a broader trend in the AI industry. Other major players are also making similar advancements. For example, OpenAI recently introduced GPT-5-Codex. This version of GPT-5 is specifically optimized for complex software engineering tasks. These tasks include large-scale code refactoring and extended code review processes.
The continuous development in AI coding models is transforming how software is created. These tools aim to increase developer productivity and improve code quality. The competition among AI developers is driving rapid innovation in this field.
Key AI Coding Model Trends
- Long-horizon task performance: Models can handle tasks lasting many hours.
- Agentic capabilities: AI can perform tasks autonomously using tools.
- Enhanced safety: Focus on reducing harmful outputs and misuse.
- Real-world problem solving: Improved ability to fix actual software issues.
- Cost-effectiveness: Delivering stronger performance without increased prices.
As these models become more sophisticated, their integration into daily development workflows is expected to grow. This will likely lead to further changes in software engineering practices globally. The goal is to make coding more efficient, reliable, and accessible for a wider range of users.




