Claude Opus 4.5 Launch: Anthropic’s new flagship model sets benchmark to outperform ChatGPT, Gemini

Anthropic has introduced Claude Opus 4.5, its most advanced AI model yet, outperforming rivals in coding, safety, and agentic tasks, marking a major leap in the global AI model race.

Nov 26, 2025 - 04:00

Claude Opus 4.5 Launch: Anthropic’s new flagship model sets benchmark to outperform ChatGPT, Gemini

In what appears to be a direct shot across the bow in the generative AI model arms race, Anthropic has released Claude Opus 4.5, the company’s new flagship offering. Anthropic is touting the model as superior to the current leaders in terms of overall performance and safety, specifically ChatGPT (OpenAI) and Gemini 3 Pro (Google DeepMind).

Anthropic claims Opus 4.5 is “the world’s best AI model for coding, agents and computer-use tasks”

Performance on a real-world software engineering benchmark SWE-bench was key to this assertion: Claude Opus 4.5 had an 80.9 % score, the first model to cross the 80 % threshold. By contrast, Gemini 3 Pro scored 76.2 %, and OpenAI’s GPT‑5.1 Codex Max had 77.9 %.

Anthropic also notes that the model out-performed human candidates taking a two-hour engineering exercise under time pressure.

Beyond performance on raw coding tasks, Opus 4.5 also has what’s known as “agentic capabilities.” An example of a benchmark scenario forces a model to act like an airline-service agent in a multi-turn conversation: when a basic-economy ticket modification is not allowed, the model instead finds a clever workaround: it upgrades the cabin first, then modifies the flights, thereby getting the same result while staying within policy.

This demonstrates the model is not only solving tasks but understanding constraints, planning, and considering real-world implications.

On the question of safety and alignment, Anthropic says that Opus 4.5 is their “most robustly aligned model” to date. The company also claims improved resilience to prompt-injection attacks (misleading instructions inserted into user prompts to trick the model into doing harmful things). Anthropic claims that Opus 4.5 is harder to trick than any other frontier model.

Users and developers can access Claude Opus 4.5 through the Claude app on Android and iOS, as well as the Claude website. It is also being rolled out to developers – suggesting Anthropic is targeting both consumer and enterprise-scale use.

What this means: The launch highlights how the AI model race is beginning to bifurcate into very specific, highly specialised domains and use cases beyond just general-purpose chatbots: in this case, software engineering and agentic problem solving. The ability to outperform human engineers on timed benchmarks is a big deal for industry roles and potential to reshape productivity tools and developer workflows. It also raises questions about how quickly model performance is improving and what “best” even means at this point.

If you work in the space of AI development, software engineering or tech leadership, Claude Opus 4.5 is now a model you should be watching – and testing.