Now I have comprehensive research across all the major items. Let me compile the report.
Agentic Coding: Monthly Intelligence Brief — April 2026
The infrastructure gap between prototyping and shipping agents closed this month, while the attack surface for agentic IDEs came into sharp focus.
Claude Opus 4.7 + New Effort Model — Same price, materially better coding, one breaking tokeniser
count_tokens on your highest-volume prompts first — the new tokeniser can add up to 35% more tokens on the same input, which will affect cost estimates. Use the /claude-api migrate Skill in Claude Code to automate ~90% of the changes.Anthropic launched Claude Opus 4.7 on April 16, 2026.
It is a direct upgrade to Opus 4.6 at the same $5/$25 per million token pricing, with 87.6% on SWE-bench Verified (+6.8pp), a new xhigh effort level, 3.3× higher-resolution vision, and self-verification on long-running agentic tasks.
On CursorBench,
Opus 4.7 is a meaningful jump in capabilities, clearing 70% versus Opus 4.6 at 58%.
For multi-step agentic loops specifically, complex multi-step workflows see plus 14% over Opus 4.6 at fewer tokens and a third of the tool errors.
The new xhigh effort level is the most practically significant change for Claude Code users.
Opus 4.6 exposed low / medium / high / max. Opus 4.7 inserts xhigh between high and max, giving developers a middle ground for reasoning-heavy work without committing to the full latency of max effort.
Claude Code now defaults to xhigh for all subscriber plans.
The DataCamp benchmark is worth heeding: xhigh sits at a budget of 10,000 thinking tokens, between high (5,000) and max (20,000).
If you are on a tight budget, the practical advice from independent testing is to watch your first week of usage and dial it back to high if costs spike.
There are three breaking API changes that will bite you without warning.
Opus 4.7 brings three changes that break old code: thinking.budget_tokens was removed, so the old way of putting a hard cap on reasoning tokens will now fail; temperature, top_p, and top_k were removed, so any code that uses them will start returning errors; and thinking.display now defaults to "omitted", which hides reasoning unless you ask for a “summarized” trace, so UIs that relied on visible thinking will render blank.
The tokeniser change is also material:
Opus 4.7 uses a new tokeniser that may use roughly 1× to 1.35× as many tokens when processing text compared to previous models, and the Batch API discount still applies for non-urgent workloads, which partially offsets this.
📎 Sources: Anthropic — Introducing Claude Opus 4.7 · Anthropic API Docs — What’s new in Claude Opus 4.7 · Anthropic API Docs — Effort
Claude Managed Agents Public Beta — Anthropic takes on your agent infrastructure
Action: Prototype one production use case against the beta. Measure time-to-working-agent versus your current self-hosted approach, and run the session-hour cost model against your expected workload shape before committing.
Claude Managed Agents launched April 8, 2026 as a public beta that handles the infrastructure layer for running AI agents at scale: secure sandboxing, long-running sessions, scoped permissions, tool execution, and tracing. It costs standard Claude API token rates plus $0.08 per session-hour.
The problem it addresses is real: shipping a production agent requires sandboxed code execution, checkpointing, credential management, scoped permissions, and end-to-end tracing — that’s months of infrastructure work before you ship anything users see.
Early community reaction is telling — startups and indie hackers are particularly enthusiastic, with one developer on Hacker News reporting going from “zero to working agent” in 45 minutes, compared to 3 days with a self-hosted approach.
The session-hour pricing model rewards short tasks and punishes long autonomous sessions.
Managed Agents bills on exactly two dimensions: the tokens consumed and the time the session spends in running status. There is no flat monthly fee, no per-agent licence, and no infrastructure charge on top. Every token is billed at the same rates as the standard Claude API. The session runtime charge is separate and additional at $0.08 per session-hour.
Importantly, the Batch API discount does not apply to Managed Agents sessions.
For cost planning: a 10-minute task barely registers on runtime billing, but a 4-hour research agent session adds up fast.
The headline capabilities — multi-agent coordination and outcome-based self-evaluation — are still in “research preview” and require separate access requests.
Enterprise teams should also note that
Managed Agents currently does not support VPC peering or private endpoints — all traffic goes through Anthropic’s public infrastructure.
That is a hard blocker for regulated industries.
Memory for Claude Managed Agents is now in public beta under the standard managed-agents-2026-04-01 header, which meaningfully expands the use cases for stateful agents. The verdict from the developer community after two weeks: the current beta is stable enough for small-to-medium production use cases; for teams with data sovereignty requirements, multi-model needs, or extreme cost optimisation at scale, self-hosting is still the right call.
📎 Sources: Anthropic — Claude Managed Agents · Anthropic API Docs — Managed Agents Overview · Medium — Claude Managed Agents Deep Dive
Claude Code /ultrareview & /ultraplan — Cloud-offloaded multi-agent review and planning
April saw Anthropic ship two cloud-offload commands that change the economics of thorough code review.
/ultrareview landed in Claude Code on April 16, 2026. It launches a fleet of reviewer agents in the cloud to hunt for bugs before you merge. Every finding is independently reproduced and verified — no noise.
The architectural distinction from local /review matters:
/ultrareview leaves the user’s laptop behind, packages the repository state, uploads it to a remote sandbox, and deploys a fleet of reviewer agents that explore the change in parallel, each from a different angle — application logic, edge cases, security, and performance.
The default fleet is five agents per standard PR with the ability to reach twenty on large changes.
Independently verified results cost real money outside your plan: each review costs approximately $5 to $20 depending on diff size, billed as extra usage. Pro and Max subscribers get 3 free runs expiring May 5, 2026.
/ultraplan, the companion command, has been in research preview since week 15.
/ultraplan hands a planning task from your local CLI to a Claude Code on the web session running in plan mode. Claude drafts the plan in the cloud while you keep working in your terminal. When the plan is ready, you open it in your browser to comment on specific sections, ask for revisions, and choose where to execute it.
Two important constraints:
Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry users cannot use /ultraplan — it relies on Anthropic’s own Cloud Container Runtime, which is incompatible with third-party hosting.
GitHub-hosted repos only. Local-only or GitLab/Bitbucket projects are currently not supported.
The emerging pattern is clear — the “ultra” prefix commands consistently offload heavy compute to Anthropic-hosted infrastructure, billed separately from plan limits. This is the commercial model for Anthropic’s premium agentic capabilities going forward.
📎 Sources: Claude Code Docs — Ultraplan · Build This Now — Ultra Review Guide · wmedia.es — /ultrareview explainer
Cursor 3 — Parallel multi-agent workspace with a new interface built around agent fleets
Cursor 3 became available on April 2, 2026. The new interface allows you to run many agents in parallel across repos and environments: locally, in worktrees, in the cloud, and on remote SSH. You can switch back to the IDE anytime, or have both open simultaneously.
Cursor 3 is the most significant release Anysphere has shipped since the company forked VS Code in 2023. It adds the Agents Window — a standalone interface that lets developers run multiple AI agents in parallel across local machines, worktrees, SSH environments, and cloud setups, all without interrupting the main coding session.
The product philosophy has explicitly shifted:
Cursor 3 is a unified workspace for building software with agents. The new interface brings clarity to the work agents produce, pulling you up to a higher level of abstraction, with the ability to dig deeper when you want.
Two new commands are worth your immediate attention.
/worktree creates a separate git worktree so changes happen in isolation. /best-of-n runs the same task in parallel across multiple models, each in its own isolated worktree, then compares outcomes.
The real-time RL underpinning Composer is also worth noting:
Cursor introduced real-time RL for Composer, using real user interactions to train and deploy improved checkpoints as often as every five hours, with better performance behind Auto and lower latency.
Also shipping in April was Bugbot’s self-learning capability:
Bugbot can now learn from feedback on pull requests and turn those signals into learned rules that improve future reviews. It looks at reactions and replies to Bugbot comments and comments from human reviewers to create candidate rules, automatically promoting the ones that accumulate signal and disabling the ones that stop being useful.
The resolution rate is nearing 80%, 15 percentage points higher than the next-closest AI code review product: Greptile at 63.49%, CodeRabbit at 48.96%, GitHub Copilot at 46.69%.
📎 Sources: Cursor — Cursor 3 announcement · Cursor — Changelog 3.0 · Cursor — Changelog
Cursor CVE-2026-26268 — RCE via Git hooks in agent-cloned repos (patch available, act now)
CVE-2026-26268 carries a critical severity rating of 9.9 out of 10 assigned by NVD and affects Cursor versions prior to 2.5. “Sandbox escape via writing .git configuration was possible in versions prior to 2.5” — a malicious agent via prompt injection could write to improperly protected .git settings, including git hooks, which may cause out-of-sandbox RCE the next time they are triggered.
The mechanism: an attacker embeds a bare repository inside a legitimate-looking public repository. That embedded repository contains a malicious pre-commit hook. When the Cursor AI agent executes a git checkout command as part of fulfilling a routine user request, it triggers that hook automatically. No warning appears, no user confirmation is needed.
The fix is in Cursor 2.5, patched in February 2026, but details went public on April 28.
Following responsible disclosure, Novee collaborated with Cursor developers to fix the issue. The official fix was completed in February 2026 and vulnerability details were disclosed on April 28.
There is also a second, currently unpatched issue disclosed simultaneously by LayerX: a CVSS 8.2 flaw lets any installed Cursor extension steal API keys and session tokens from an unprotected local SQLite database. Instead of using secure storage like macOS Keychain or Windows Credential Manager, Cursor stores API keys and session tokens in a local unencrypted SQLite database at a predictable location.
LayerX disclosed this to Cursor on February 1, 2026. Cursor responded on February 5, stating that extensions operate within the same trust boundary as local applications and that users are responsible for vetting them. No major architectural fix has been released.
Until it is, avoid installing extensions from unverified sources, and rotate any API keys stored in Cursor.
📎 Sources: Novee — CVE-2026-26268 technical write-up · CSO Online — Critical Cursor bug coverage · Hackread — Cursor AI IDE vulnerability · GitHub Security Advisory GHSA-8pcm-8jpx-hv8r
Google Antigravity April Update — Walkthroughs, better Browser Agents, and persistent instability
The April 2026 Antigravity update landed with Walkthroughs, upgraded Browser Agents, and a revamped Manager Surface all at once.
The Walkthroughs feature is genuinely useful:
Walkthroughs became a first-class UI layer in Manager Surface — interactive step-by-step guides for project init, architecture decisions, and library onboarding. Manager shows you the next step every time.
Browser Agent reliability also improved:
JavaScript initialization got less flaky — SPAs that used to hang on slow initial loads now see tighter waitForNavigation logic.
Multi-tab workflows also stabilised: operations like checking GitHub Actions runs while keeping a deploy dashboard open used to lose tabs. April handles parallel tabs gracefully now.
Antigravity supports multiple AI models, including Anthropic’s Claude Sonnet 4.6 and Claude Opus 4.6
, making it legitimately relevant to Claude users. However, the honest picture from the community: as of April 2026, Antigravity has stability issues that make it unreliable for production-critical daily use. Users report context memory errors where agents lose track of changes mid-task, premature agent termination on longer operations, and occasional UI freezes. These issues are improving with each release, but many developers maintain Cursor or VS Code as their primary tool and use Antigravity for specific tasks where its strengths justify the rough edges.
The pricing remains opaque: you cannot do cost planning when you don’t know what a credit buys you, and after the November 2025 launch with generous free quotas, Google progressively cut limits and introduced an opaque credit system. Users who relied on Antigravity for daily coding found themselves locked out for days at a time.
📎 Sources: Antigravity Lab — April 2026 update review · Google Developers Blog — Antigravity announcement · Petronella Cybersecurity — Antigravity setup guide
Gemini CLI Subagents & Google Agents CLI — Parallel agent delegation hits the terminal
Action: Install the Agents CLI with
uvx google-agents-cli and run it against one existing agent project to validate the IaC scaffolding quality. Separately, test Gemini CLI subagents on a workflow where context accumulation has been a problem.Google introduced subagents in Gemini CLI, allowing the main agent to act as an orchestrator, assigning subtasks such as code analysis, research, or testing to specialised subagents. Each subagent operates in an isolated environment and returns a summarised result to the main session, minimising context overload.
Subagents can also run in parallel, enabling multiple tasks to be executed simultaneously.
Teams can save agent definitions locally or in a repository, and
Google ships several built-in subagents, including a general-purpose assistant, a CLI helper, and a codebase investigation agent.
Separately, Google shipped the Agents CLI on April 22 — a distinct tool with significant practical value for Google Cloud users.
Agents CLI is designed specifically for AI coding agents like Gemini CLI, Claude Code, and Cursor. It gives your AI assistant a direct, machine-readable line to the full Google Cloud agent stack (including Agent Platform, Cloud Run, and A2A Integration), turning a fragmented ecosystem into a seamless assembly line.
The practical workflow:
Agents CLI can automate the entire deployment phase, seamlessly injecting Infrastructure as Code, setting up CI/CD pipelines, and deploying directly to Agent Runtime / Cloud Run / GKE.
Notably, it works with Claude Code and Cursor, not just Gemini CLI. Community caution is warranted: feedback from early Gemini CLI subagent users suggests the overall developer experience still has room for improvement, with ongoing concerns about stability and UI/UX.
📎 Sources: InfoQ — Subagents in Gemini CLI · Google Developers Blog — Agents CLI in Agent Platform · Gemini CLI GitHub
Anthropic Quality Regression & Recovery — April’s chaotic rollback cycle warrants attention
April saw Anthropic make three quality-degrading changes in six weeks and reverse all of them under user pressure — a pattern worth tracking.
On March 4, the default reasoning effort was changed from high to medium to reduce latency. This was the wrong tradeoff. Anthropic reverted the change on April 7 after users reported they’d prefer to default to higher intelligence and opt into lower effort for simple tasks. This impacted Sonnet 4.6 and Opus 4.6.
A March 26 efficiency improvement to prompt caching contained a bug that caused thinking history to be dropped every turn for the rest of a session instead of just once, making Claude seem forgetful. Fixed on April 10.
On April 16, a system prompt instruction to reduce verbosity was added. In combination with other prompt changes, it hurt coding quality and was reverted on April 20.
As of April 23, Anthropic reset usage limits for all subscribers as a goodwill gesture. The practical implication: Anthropic is willing to roll back mistakes quickly, but silent mid-session quality changes can affect production pipelines before they’re identified. Structured evaluation on a representative task set, run weekly, is worth the investment if your workflows are Claude Code-dependent.
📎 Sources: Anthropic / Releasebot — April 2026 quality fix post-mortem · Claude Code Docs — What’s new
Priority Actions This April
Do now:
- Update Cursor to version 2.5 or later to patch CVE-2026-26268 (CVSS 9.9 RCE). Confirm with your security team and audit repositories being cloned by agents.
- Rotate any API keys stored in Cursor pending an architectural fix for the unencrypted SQLite credential storage issue.
- Run
/extra-usagein Claude Code and use your 3 free /ultrareview credits on a production PR before they expire May 5 — or lose them.
Do this week:
- Migrate API integrations from claude-opus-4-6 to claude-opus-4-7: same price, meaningful coding gains, but three breaking API changes (
budget_tokensremoved,temperature/top_p/top_kremoved,thinking.displaynow defaults to omitted). Use the/claude-api migrateSkill to automate ~90% of the changes. - Upgrade Cursor and test the Agents Window (Cmd+Shift+P → Agents Window) on a real branch. Try
/best-of-non a complex refactor. - Review the
xhigheffort default now live in Claude Code — monitor your first week of token spend and dial back tohighif costs are running ahead of expectations.
Do this month:
- Prototype one production agent use case against Claude Managed Agents. Focus on a use case with short sessions (under 30 minutes) where the session-hour cost model is negligible; validate whether the infrastructure savings justify the $0.08/session-hour overhead for your workload.
- If you deploy to Google Cloud infrastructure, install
uvx google-agents-cliand test the IaC scaffolding against an existing agent project. - Evaluate Gemini CLI subagents if you have workflows where context accumulation from intermediate steps has been a bottleneck. The isolation model matches well with parallel research or multi-module analysis tasks.
Watch:
- Anthropic’s /ultraplan and /ultrareview “ultra prefix” pattern is solidifying into a metered, cloud-compute model that bills separately from subscription limits. Track whether the cost-per-review at $5–$20 fits into your workflow economics before it becomes a line item.
- Antigravity is maturing — the April Browser Agent improvements are real — but the opaque credit system and documented context-loss bugs mean it is not yet suitable for daily primary use. Re-evaluate in June.
- Claude Managed Agents’ multi-agent coordination and outcome-based self-evaluation features are still in gated research preview. These are the most compelling parts of the offering. Watch for general availability announcements.
Rating Definitions
| Rating | Meaning |
|---|---|
| INVESTIGATE | Proven, shipped, and worth your time now. Developer community confirms value. |
| EVALUATE | Valuable but with a specific caveat — check the Condition to see if it applies to you. |
| WATCH | Real signal but too early — EAP, unstable, or awaiting broader community validation. Set a reminder for next period. |
| SKIP | Hype without substance, duplicates existing capability, or not compatible with Claude/Gemini stack. |