Key Points
- Anthropic launches Opus 4.5, its newest flagship AI model.
- Claude apps now summarize earlier conversation instead of hard‑stopping at 200,000 tokens.
- Opus 4.5 achieves 80.9% accuracy on SWE‑Bench Verified, beating GPT‑5.1‑Codex‑Max (77.9%) and Gemini 3 Pro (76.2%).
- Model excels in agentic coding and tool‑use benchmarks.
- Still trails GPT‑5.1 in visual‑reasoning (MMMU) tasks.
- Developers can use the same context‑management techniques via Anthropic’s API.
- Improvements aim to provide smoother, longer user interactions and more efficient developer workflows.
New Model Release and Core Improvements
Anthropic released Opus 4.5 as its latest frontier model, positioning it as a cheaper, more powerful, and more efficient alternative to competing large language models. The announcement highlighted several user‑focused enhancements. In the web, mobile, and desktop Claude applications, the model is now less likely to abruptly stop conversations when they run long. Previously, users who hit the 200,000‑token context window experienced hard stops that ended the dialogue abruptly, even when their usage allowance had not been exhausted.
Instead of cutting off, Claude now performs a behind‑the‑scenes summarization of earlier conversation segments, discarding extraneous material while preserving key points. This approach aims to maintain coherence and avoid the model forgetting important details as the conversation progresses. Developers accessing Anthropic’s API can apply similar context‑management and compaction techniques in their own applications.
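The article does not detail Anthropic's implementation or API parameters, but the general technique it describes can be sketched in plain Python: estimate the conversation's token count, and once it exceeds a budget, replace the oldest turns with a single summary message while keeping the most recent turns verbatim. Every name here (`estimate_tokens`, `summarize`, `compact`, the character-per-token heuristic) is illustrative, not Anthropic's actual API.

```python
# Illustrative sketch of context compaction, NOT Anthropic's implementation.
# Messages are dicts like {"role": "user", "content": "..."}.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def summarize(turns: list[dict]) -> str:
    # Stand-in for a real model-generated summary; here it just
    # concatenates and truncates the older turns.
    joined = " ".join(t["content"] for t in turns)
    return "Summary of earlier conversation: " + joined[:200]

def compact(messages: list[dict], budget: int = 200_000,
            keep_recent: int = 10) -> list[dict]:
    """If the estimated token count exceeds `budget`, replace all but
    the last `keep_recent` turns with one summary message."""
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user", "content": summarize(older)}
    return [summary] + recent
```

In a real system the summarizer would itself be a model call, and the token count would come from the provider's tokenizer rather than a character heuristic; the control flow, however, is the essence of avoiding a hard stop at the context limit.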
Performance Benchmarks
Opus 4.5 set a new high‑water mark on the SWE‑Bench Verified benchmark, achieving an accuracy score of 80.9 percent. This result narrowly surpassed OpenAI’s recently released GPT‑5.1‑Codex‑Max, which scored 77.9 percent, and Google’s Gemini 3 Pro, which posted 76.2 percent. The model demonstrated particular strength in agentic coding and tool‑use benchmarks, though it still lagged behind GPT‑5.1 in visual‑reasoning tasks measured by the MMMU benchmark.
Implications for Users and Developers
The enhancements to conversation continuity and context handling are expected to improve the overall user experience across Anthropic’s consumer products. By summarizing prior dialogue, Claude can continue longer interactions without the disruptive hard stops that previously occurred when the token limit was reached.
For developers, the same summarization and context‑compaction principles are available through Anthropic’s API, offering a path to more efficient prompt engineering and more economical use of the model’s capabilities. The release underscores Anthropic’s commitment to delivering high‑performing, cost‑effective AI that competes directly with leading models from OpenAI and Google.
Source: arstechnica.com