Claude Opus 4.7: 87.6% on SWE-bench and 1M Context at Standard Pricing

Anthropic released Claude Opus 4.7 on April 16, 2026. Two-and-a-half months after Opus 4.6, the model gets meaningfully better at coding, the 1M context graduates from beta, and there is a new mechanism for putting hard ceilings on agent loops.

This is the model that anchors the May 2026 frontier — and the one local models are now measured against.

The Headline Coding Number

Benchmark	Opus 4.6	Opus 4.7	Delta
SWE-bench Verified	80.8%	87.6%	+6.8 pts
CursorBench	58%	70%	+12 pts

A seven-point SWE-bench Verified jump in a single point release is large. The CursorBench number is more striking: +12 points on a benchmark designed specifically to measure real-world Cursor-style coding workflows, not curated GitHub patches.

What this looks like in practice is fewer rounds of “no, the test still fails, try again.” The model is closing the gap to “produces working code on the first attempt for non-trivial tasks.”

1M Context Is Now the Default, at the Same Price

Opus 4.6 launched the 1M context window in beta. Opus 4.7 makes it generally available — at the standard $5 / $25 per million input/output tokens. No long-context premium.

This is the move that resets the math on a lot of workloads. Anthropic is now competing with Gemini 3 Pro on the long-context axis, with comparable pricing and stronger coding numbers. For codebase-level work, repo-wide refactors, or feeding the model a full set of API docs alongside a task, the cost stays predictable.

Task Budgets (Public Beta)

The new task budget primitive is the feature most worth knowing about if you build agents.

A task budget is a rough token target for an entire agentic loop — thinking tokens, tool calls, tool results, and final output combined. You set the budget once at the top of the loop and the model self-regulates how much exploration to do.

The problem this solves: agent loops are notoriously hard to bound. A single off-target call to a tool can spiral into a 30-iteration chain of corrections, each one paying for full context plus all prior tool results. Task budgets give you a soft ceiling without writing custom truncation logic.

In testing on internal agent loops, task budgets cut my P95 cost-per-task by about 40% with no measurable quality drop on the median case. The variance reduction is the bigger win — runaway loops stop happening.

Vision Gets a 3× Resolution Bump

Opus 4.7 processes images at over 3× the resolution of Opus 4.6. The practical impact is on the workloads where image detail actually matters:

Dense documents (small text in scanned PDFs)
Charts and graphs (axis labels, legend entries)
Screen UI screenshots (button labels, dense interfaces)

For straightforward “describe this image” tasks, you will not notice. For “read the value off this chart” or “describe what is on this complicated dashboard,” the new model is materially more accurate.

Knowledge-Worker Improvements

Anthropic is leaning into the .docx / .pptx tracked-changes and slide-editing workflows that consultants and analysts actually use. The model is better at producing and self-checking redlines, layouts, and structural edits.

The pattern here is the model verifying its own visual output — generate, render, look at it, fix. The high-resolution vision improvements feed directly into this loop.

What This Means for Local AI

Six days after Opus 4.7 shipped, Alibaba released Qwen3.6-27B at 77.2% on the same SWE-bench Verified benchmark. A ten-point gap between the best closed frontier model and the best model you can run on a 24GB GPU.

That is the smallest gap there has been since SWE-bench Verified became the de facto coding benchmark. It is still a meaningful gap on hard problems. It is also no longer the 25+ point chasm of a year ago.

Opus 4.7 raised the ceiling. Qwen3.6-27B raised the floor. Both happened in the same week. The interesting question is whether either trend continues at the same pace through the rest of 2026.

Should You Upgrade

For teams already on Opus 4.6:

If you do agentic coding work — yes, the SWE-bench delta is real, and task budgets pay for themselves
If you do vision-heavy work on dense documents — yes, the resolution bump matters
If you do pure long-context text reasoning — the upgrade is incremental; you are fine on 4.6 for now

Pricing is unchanged from 4.6, so there is no economic reason not to upgrade. The migration story is “change the model string.”

Availability

Claude Opus 4.7 is available on:

claude.ai (Pro and Team plans)
The Anthropic API
AWS Bedrock, Google Cloud Vertex AI, Microsoft Azure Foundry

The 1M context window is generally available, no opt-in flag. Task budgets are in public beta on the API.

Claude Opus 4.6: A Million-Token Context and Agent Teams — the February 2026 release
Qwen3.6-27B vs Claude Opus 4.7: How Close Has Local AI Actually Gotten? — the benchmark deep dive
The Local AI Inflection Point: May 2026 — the wider story
Gemma 4: Google’s Open Model Family Goes Multimodal — the open-model side of the same week