Left 4 More


Agentic Coding: Monthly Intelligence Brief — May 2026

May was defined by two parallel themes: multi-agent execution becoming a mainstream primitive, and the attack surface of agentic IDEs finally receiving serious security scrutiny.


Claude Opus 4.8 + Dynamic Workflows — Fleet-scale orchestration lands in Claude Code

INVESTIGATE Action: Update Claude Code, run /workflows to test the new dynamic workflow primitive on a non-trivial multi-step task, and evaluate fast mode on Opus 4.8 as a cost-effective substitute for Opus 4.7 sessions.

Opus 4.8 shipped in the final week of May as the new default model in Claude Code, with /effort xhigh set as the default for the hardest tasks and a dynamic workflows feature that allows Claude to orchestrate work across tens to hundreds of background agents.

This is a meaningful architecture shift: previously, spinning up parallel agent tasks required explicit scripting or SDK work. Dynamic workflows move that orchestration into the conversational layer — you describe what you want, Claude builds and executes the workflow.

You can track all workflow runs with /workflows, and fast mode on Opus 4.8 is priced at 2x the standard rate for 2.5x the speed — a meaningful improvement over Opus 4.7 fast mode economics.
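The fast mode trade-off quoted above can be worked through as a short calculation. This is only a sketch: it assumes the 2x price and 2.5x speed figures apply uniformly to the same token volume, and the function and field names are illustrative rather than anything from Anthropic's tooling.

```python
def fast_mode_tradeoff(price_multiplier: float, speed_multiplier: float) -> dict:
    """Compare fast mode against standard mode for the same token volume."""
    return {
        # Same tokens billed at a higher rate, so spend scales with the price multiplier.
        "relative_cost": price_multiplier,
        # Same tokens generated faster, so wall-clock time scales inversely with speed.
        "relative_time": 1 / speed_multiplier,
        # Price paid per unit of speed-up; below 1.0 means speed grows faster than price.
        "price_per_speedup": price_multiplier / speed_multiplier,
    }


# Figures quoted in the brief: 2x the standard rate for 2.5x the speed.
tradeoff = fast_mode_tradeoff(price_multiplier=2.0, speed_multiplier=2.5)
print(tradeoff)
```

Under those assumptions a session costs twice as much and finishes in about 40% of the time, so fast mode only pays off where waiting time is worth more than the extra spend.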

The week prior, auto mode arrived on Pro accounts, now supporting Sonnet 4.6 alongside Opus, replacing permission prompts with background safety checks.

Combined with the new /goal command, which keeps Claude working across turns until a specified completion condition holds, the product is clearly moving toward longer-horizon autonomous sessions where human checkpoints are the exception, not the norm.

The claude agents view, which shipped in week 20, opens a single screen showing every Claude Code session (what’s running, what’s blocked, and what’s done), rounding out a coherent multi-agent management surface. Two week 21 additions reduce the overhead of managing parallel work: /code-review for correctness bugs, and pinned background sessions (Ctrl+T in claude agents) that survive idle and restart in place for updates. One practical caveat: a same-week hotfix was required to patch Opus 4.8 thinking block corruption that caused API errors, suggesting the model upgrade shipped slightly rough and warranting caution on mission-critical workflows until the model has settled.

📎 Sources: releasebot.io/updates/anthropic/claude-code · code.claude.com/docs/en/whats-new · claudelog.com


Claude Managed Agents: Self-Hosted Sandboxes + MCP Tunnels — Enterprise data residency unblocked

EVALUATE Condition: You are building Claude agents that need to execute tools against regulated data, internal databases, or services that cannot be exposed to the public internet.
Action: Enable self-hosted sandboxes via the Claude Console (public beta, no access request needed), then join the MCP tunnels research preview waitlist for private MCP server access.

Announced at Code with Claude London on 19 May, self-hosted sandboxes and MCP tunnels directly address the problem that has kept regulated industries from deploying Claude agents in production: tool execution and private data access happening outside the enterprise security perimeter.

The architectural split is elegant.

The agent loop — orchestration, context management, error recovery — stays on Anthropic infrastructure, while tool execution and private MCP server access move inside the customer perimeter.

Launch sandbox partners are Cloudflare, Daytona, Modal, and Vercel, with a BYO sandbox option as a fifth path.

MCP tunnels let agents reach MCP servers inside private networks without exposing them to the public internet. A lightweight gateway makes a single outbound connection — no inbound firewall rules, no public endpoints, traffic encrypted end to end.

For teams who have been hand-rolling VPN-exposed MCP server access or simply avoiding internal tool connections altogether, this is the most practical change in the enterprise Claude stack in months.

The same applies to individual developers. Once preview access is granted, MCP tunnels make it practical to connect Claude agents to a private development database or internal API without a public endpoint.

Two gaps worth knowing before you build: self-hosted sandboxes are not yet available on Claude Platform on AWS, and Memory is not yet supported in self-hosted sessions.

MCP tunnels are in research preview — you need to request access and the documentation carries explicit “as-is” language, so treat this as an early programme rather than a production-ready feature.

📎 Sources: claude.com/blog/claude-managed-agents-updates · infoq.com/news/2026/05/claude-mcp-tunnels/ · the-decoder.com


Google Antigravity 2.0 — Multi-agent orchestration platform, not an IDE upgrade

WATCH Action: If you have existing Gemini CLI scripts or CI/CD pipelines, migrate to the Antigravity CLI before the June 18 hard cutoff. Otherwise, track the platform for one more release cycle before committing workflow time — the 2.0 launch rollout was disruptive and some rough edges remain.

At Google I/O on 19 May, Google launched Antigravity 2.0 — not as an upgrade to the existing IDE, but as a brand-new standalone desktop application built around multi-agent teams, scheduled tasks, native voice, and one-click integration with Google’s developer stack.

The ambition is real: the announcement came packaged with five products simultaneously — the Antigravity CLI (which sunsets Gemini CLI), the Antigravity SDK, a Managed Agents API, AI Studio Build export, and Gemini 3.5 Flash as the new default model.

The I/O demo (93 parallel sub-agents, 2.6B tokens processed, under $1,000 in API credits to build a working OS core) is the most credible demonstration of mass-parallel agent execution shown publicly to date.
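To put the demo figures in perspective, the unit economics can be derived directly from the three numbers quoted. This is a rough sketch: it treats $1,000 as an upper bound on spend and assumes tokens were spread evenly across sub-agents, which the source does not state.

```python
# Figures quoted for the I/O demo.
SUB_AGENTS = 93
TOTAL_TOKENS = 2.6e9
MAX_SPEND_USD = 1_000  # "under $1,000", so every derived cost is an upper bound


def demo_unit_economics(agents: int, tokens: float, spend_usd: float) -> dict:
    """Derive per-agent and per-token figures from the headline numbers."""
    return {
        "tokens_per_agent": tokens / agents,  # assumes an even split across agents
        "usd_per_million_tokens": spend_usd / (tokens / 1e6),
        "usd_per_agent": spend_usd / agents,
    }


economics = demo_unit_economics(SUB_AGENTS, TOTAL_TOKENS, MAX_SPEND_USD)
for name, value in economics.items():
    print(f"{name}: {value:,.2f}")
```

On these numbers the blended cost is at most roughly $0.38 per million tokens, with each sub-agent handling about 28 million tokens for under $11.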

The execution was messy, however.

The automatic rollout converted the tool from an AI code editor into a five-component agent orchestration platform, removed the traditional IDE from existing installs without warning, and broke developer environments. The IDE still exists but must now be downloaded separately.

Community reception on Hacker News and Reddit skewed negative on launch day, with the sentiment settling to cautious interest once developers understood the IDE was accessible separately.

Claude Code still wins on raw code quality — SWE-Bench Pro at 64.3% for Opus 4.7 — while Antigravity leads on tool-orchestration-heavy benchmarks: MCP Atlas score of 83.6% versus Claude’s 79.1%.

These are different strengths, and the practical conclusion is: use Claude Code for the main interactive loop and production work; Gemini/Antigravity for cheap one-shot wide scans over a giant codebase or batch multimodal tasks. They fit different jobs.

The hard deadline for existing Gemini CLI users: Gemini CLI stops serving Free/Pro/Ultra tiers on June 18, 2026, and IDE extensions using Gemini CLI will stop working.

If you have CI/CD pipelines or scripts shelling out to the gemini binary, this migration is urgent.
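One way to size the migration is to inventory every place a repository shells out to the gemini binary. The sketch below is a simple text scan, not an official migration tool: the file types, the comment-skipping rule, and the function names are assumptions, and hits such as a YAML value that happens to be the word gemini still need a manual look.

```python
import re
from pathlib import Path

# "gemini" as a standalone command word: matches `gemini ...` and `/usr/bin/gemini`,
# but not longer names such as `gemini-cli` or `my_gemini_helper`.
GEMINI_CALL = re.compile(r"(?<![\w.-])gemini(?![\w.-])")

# Assumption: these are the file types worth scanning; adjust for your repository.
SCRIPT_SUFFIXES = {".sh", ".bash", ".yml", ".yaml", ".py", ".js", ".ts"}
SCRIPT_NAMES = {"Makefile", "Jenkinsfile", "Dockerfile"}


def find_gemini_calls(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that appear to invoke the gemini binary."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):  # skip shell/YAML/Python comment lines
            continue
        if GEMINI_CALL.search(stripped):
            hits.append((number, stripped))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan script-like files under root and report candidate invocations per file."""
    report = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix not in SCRIPT_SUFFIXES and path.name not in SCRIPT_NAMES:
            continue
        hits = find_gemini_calls(path.read_text(errors="ignore"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling scan_tree(".") from a repository root gives a per-file list of lines to review before the cutoff.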

📎 Sources: techcrunch.com/2026/05/19/google-launches-antigravity-2-0 · 9to5google.com · revolutioninai.com · agentpedia.codes/blog/google-antigravity-2-0-launch


Cursor 3.5: Cloud Agents + Composer 2.5 + CVE-2026-26268 — Significant capability gain with a critical security backlog

INVESTIGATE Action: Update to Cursor 3.5 immediately; CVE-2026-26268 affects versions prior to 2.5, so any older install is exposed. Then trial Build in Parallel on a feature branch and enable Bugbot high-effort mode for security-sensitive PRs. If your team clones repositories from untrusted sources, add a policy requiring isolated environments for that work.

Cursor 3.5 launched on 20 May with Cloud Agents — agents that run in isolated cloud VMs with full terminal, browser, and desktop access, can work across multiple repos in parallel, and report results back to your IDE asynchronously.

That headline is backed by meaningful groundwork shipped through the month: Cursor 3.3 (7 May) added Build in Parallel subagents, pinned-skill pills, Composer 2.5 with multi-file refactor at file-tree scale, and native Jira integration.

Composer 2.5 is a substantial improvement in intelligence and behaviour over Composer 2, particularly on long-horizon agentic tasks.

Cursor trained Composer 2.5 with targeted textual feedback — providing feedback directly at the point in the trajectory where the model could have behaved better, inserting a hint into local context and using the resulting model distribution as a teacher.

The Jira integration is immediately practical: you can assign work items to Cursor or mention @Cursor in a comment to kick off a cloud agent; Cursor uses the ticket title, description, comments, and repo settings to scope the task, and when the agent finishes, Jira shows completion updates and a link to the PR.

The security picture this month is not comfortable.

CVE-2026-26268 carries a critical severity rating of 9.9 from NVD and affects Cursor versions prior to 2.5; a sandbox escape via writing .git configuration was possible.

According to findings by AI pentesting platform Novee Security, once a developer cloned and interacted with a malicious repository, the IDE’s AI agent could trigger embedded Git logic resulting in attacker-controlled code execution — the root cause being a feature interaction in Git that becomes exploitable the moment an AI agent starts autonomously executing Git operations inside a repository it doesn’t control.

The flaw is now patched, with no indication of in-the-wild exploitation.
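For teams that clone untrusted repositories, a pre-flight check of the repository's Git configuration is one cheap mitigation alongside patching. The sketch below flags config keys that cause Git to run an external program. The key list is illustrative and based on standard Git configuration options; the source does not say which Git feature CVE-2026-26268 abused, so this is not a detector for that specific flaw.

```python
import re

# Git config keys that make Git execute an external command.
# Assumption: an illustrative, non-exhaustive list, compared in lowercase.
RISKY_KEYS = {
    "core.fsmonitor", "core.sshcommand", "core.hookspath",
    "core.pager", "core.editor", "diff.external",
}
# Per-driver keys such as filter.<name>.smudge or diff.<name>.textconv.
RISKY_DRIVER_KEYS = {"clean", "smudge", "textconv"}

SECTION_RE = re.compile(r'^\[\s*([\w.-]+)(?:\s+"([^"]*)")?\s*\]$')


def audit_git_config(config_text: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from a .git/config that can trigger command execution."""
    findings = []
    section, subsection = "", None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank lines and comments
            continue
        header = SECTION_RE.match(line)
        if header:
            section, subsection = header.group(1).lower(), header.group(2)
            continue
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()
        is_driver = section in {"filter", "diff"} and subsection is not None
        if f"{section}.{key}" in RISKY_KEYS or (is_driver and key in RISKY_DRIVER_KEYS):
            name = f"{section}.{subsection}.{key}" if subsection else f"{section}.{key}"
            findings.append((name, value))
    return findings
```

Running audit_git_config on the contents of .git/config before letting an agent operate in the checkout gives a short list to review by hand. It does not cover hook scripts or nested repositories, which would need their own checks.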

The deeper concern, flagged by Repello AI’s enterprise hardening guide: Cursor shipped 11+ CVEs and named vulnerabilities in the 2025–2026 window, and architecturally the CVEs cluster into four patterns: pre-trust execution/TOFU bypass, indirect prompt injection → file write → RCE, sandbox escape via privileged side-channel, and supply chain (OpenVSX + Chromium lag).

Each patch closes a specific vulnerability, not the architectural shape that produced it. Teams using Cursor in enterprise environments should review Repello AI’s hardening guide linked below.

📎 Sources: cursor.com/changelog · csoonline.com/article/4164250 · cursor.com/blog · repello.ai/blog/cursor-security


Windsurf: Devin Review Bundled + Adaptive Router — Competitive catch-up with a pricing model that needs attention

EVALUATE Condition: You are a current Windsurf subscriber and have not yet tried Devin Review or the Adaptive model router.
Action: Enable Devin Review on your next PR cycle using the two-week free trial; switch your default model to Adaptive for a week and compare output quality and credit burn against your current fixed-model usage.

From 6 May, all Windsurf IDE users gained access to Devin Review and Quick Review with their existing subscription, with Devin Review available on a two-week free trial for all self-serve users.

This is a meaningful move — Devin Review was previously a separately-priced Cognition product, and bundling it into the base IDE subscription closes a gap on Cursor’s Bugbot (which remains separately priced at $40/user/month).

Claude Opus 4.7 fast mode also became available in Windsurf during May, with the full intelligence of Opus 4.7 but approximately 2.5x higher output speeds.

The new Adaptive model option in the model picker intelligently selects the best model for each task, helping users make their quota last longer by avoiding overuse of premium models.

In practice, Adaptive is Windsurf’s answer to the “I don’t want to think about model selection” problem: a reasonable default for teams who want throughput optimisation without manual model-switching. The main caution: a bug in the Adaptive model router previously prevented model switching after the first request, and affected users had quota reset and overage restored. Adaptive has therefore had reliability issues and should be validated before wholesale adoption in automated pipelines. Windsurf’s overall stability track record remains patchier than Cursor’s, and this month was no exception, with a May 26 authentication outage affecting the IDE for approximately 90 minutes.

📎 Sources: windsurf.com/changelog · windsurf.com · releasebot.io/updates/windsurf


MCP 2026-07-28 Release Candidate — Stateless transport and breaking changes locked

WATCH Action: Review the draft specification now if you maintain MCP servers or build MCP-dependent tooling. Check for the -32002 to -32602 error code change and plan SDK upgrades. Tier 1 SDKs are expected to ship compliance within the ten-week validation window ending 28 July.

The MCP 2026-07-28 release candidate was locked on 21 May, with the final specification shipping on July 28. The ten-week window exists for SDK maintainers and client implementers to validate changes against real workloads; Tier 1 SDKs are expected to ship support within this window.

The headline change is significant for anyone building production MCP infrastructure: MCP is now stateless at the protocol layer, with six Specification Enhancement Proposals working together to complete a plan laid out in December.

This resolves one of the most persistent operational headaches in production MCP deployments: stateful server sessions that fail non-gracefully under load or network interruption.

MCP Apps (SEP-1865) lets servers ship interactive HTML interfaces that hosts render in a sandboxed iframe. Tools declare their UI templates ahead of time so hosts can prefetch, cache, and security-review them before anything runs; the rendered UI talks back to the host over the same JSON-RPC base protocol used everywhere else in MCP.

This is the most interesting new primitive in this release — it enables server-side tooling to have first-class UI without breaking the audit and consent path.

A critical breaking change: the error code for a missing resource changes from MCP-custom -32002 to JSON-RPC standard -32602 Invalid Params. If your client matches on the literal -32002 value, update it.
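A client that needs to work against both old and new servers during the validation window could accept either code. The sketch below is one possible approach, not SDK behaviour: because -32602 is the generic JSON-RPC Invalid Params code, it only treats it as a missing resource when the failing request was resources/read, which is a heuristic assumption. The function name is hypothetical.

```python
LEGACY_RESOURCE_NOT_FOUND = -32002  # MCP-custom code used before the 2026-07-28 spec
INVALID_PARAMS = -32602             # JSON-RPC standard "Invalid params"


def is_missing_resource(error: dict, *, method: str) -> bool:
    """Decide whether a JSON-RPC error object signals a missing resource.

    Heuristic: -32602 is generic, so it only counts as "missing resource"
    when the request that failed was a resources/read call.
    """
    code = error.get("code")
    if code == LEGACY_RESOURCE_NOT_FOUND:
        return True
    return code == INVALID_PARAMS and method == "resources/read"


# Example error objects as they might arrive from an old and a new server.
old_style = {"code": -32002, "message": "Resource not found"}
new_style = {"code": -32602, "message": "Invalid params"}
print(is_missing_resource(old_style, method="resources/read"))
print(is_missing_resource(new_style, method="resources/read"))
```

Keeping the check in one helper means the legacy branch can be deleted in a single place once every server you talk to has moved to the new specification.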

The NSA published a security advisory on MCP in May, reinforcing that malicious actors can manipulate context parameters to trigger unintended file I/O behaviour, and every tool invocation should validate inputs against well-defined schemas, expected ranges, and the intended execution context.
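The three checks named in the advisory (schema, expected ranges, execution context) can be illustrated for a hypothetical file-reading tool. Everything specific here is an assumption made for the example: the tool's argument names, the workspace root, and the size limits are not taken from the advisory or from any MCP SDK.

```python
from pathlib import Path

ALLOWED_ROOT = Path("/srv/agent-workspace")  # hypothetical workspace root
DEFAULT_BYTES = 65_536
MAX_BYTES = 1_000_000


def validate_read_file_args(args: dict, root: Path = ALLOWED_ROOT) -> tuple[Path, int]:
    """Validate arguments for a hypothetical read_file tool before touching the disk."""
    # 1. Schema: only known keys, with the expected types.
    unknown = set(args) - {"path", "max_bytes"}
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    path = args.get("path")
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")
    max_bytes = args.get("max_bytes", DEFAULT_BYTES)
    if isinstance(max_bytes, bool) or not isinstance(max_bytes, int):
        raise ValueError("max_bytes must be an integer")

    # 2. Expected range: bound how much a single call may read.
    if not 1 <= max_bytes <= MAX_BYTES:
        raise ValueError(f"max_bytes must be between 1 and {MAX_BYTES}")

    # 3. Execution context: the resolved path must stay inside the workspace,
    #    which rejects absolute paths and ../ traversal.
    base = root.resolve()
    resolved = (base / path).resolve()
    if not resolved.is_relative_to(base):
        raise ValueError("path escapes the allowed workspace")
    return resolved, max_bytes
```

The tool body then reads from the returned path only, so the raw, model-supplied string never reaches file I/O directly.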

📎 Sources: blog.modelcontextprotocol.io · nsa.gov MCP Security advisory · digitalapplied.com


Claude Security Plugin + NSA MCP Advisory — Shift-left security is now table stakes

INVESTIGATE Action: Install the Claude Code security-guidance plugin via /plugins now — it's free for all plan tiers and requires zero workflow changes. If you maintain MCP servers, read the NSA advisory and validate input schemas against it before the MCP 2026-07-28 spec drop.

Anthropic shipped a security-guidance plugin for Claude Code that identifies and fixes vulnerabilities as you write code, available for all Claude Code users and installable from the plugin marketplace via /plugins. When a risky pattern is identified, Claude prompts an inline fix within the same coding session, eliminating the need to context-switch to a separate security scanner.

Once active, it monitors code edits, diffs, and commits in real time, using regex-based pattern matching to detect approximately 25 high-risk vulnerability classes. Anthropic’s internal data reportedly shows a 30–40% reduction in security-related comments on pull requests since the tool was introduced.
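To make the regex-based approach concrete, here is a toy scanner with three rules. It illustrates the general technique only: the rule names and patterns are invented for this example and are not the plugin's actual rule set, which the source does not publish.

```python
import re

# Illustrative rules only; real scanners use far more patterns and context.
RULES = {
    "eval-call": re.compile(r"\beval\s*\("),
    "subprocess-shell-true": re.compile(r"shell\s*=\s*True"),
    "hardcoded-secret": re.compile(
        r"(?i)\b(password|secret|api_key)\s*=\s*['\"][^'\"]+['\"]"
    ),
}


def scan_lines(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every rule that matches a line."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append((number, rule))
    return findings


snippet = [
    "x = 1",
    "subprocess.run(cmd, shell=True)",
    'password = "hunter2"',
    "result = eval(user_input)",
]
for number, rule in scan_lines(snippet):
    print(f"line {number}: {rule}")
```

Pattern matching of this kind is fast enough to run on every edit, but it cannot follow data flow across files, which is the gap that deeper scanning is meant to cover.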

Anthropic’s broader Claude Security platform complements the plugin with deeper AI-driven codebase scanning, tracing data flows across files, running adversarial verification passes on findings, and proposing targeted patches for human review.

Claude Security is available in public beta for Claude Enterprise customers, with access for Team and Max customers coming soon.

The timing of the NSA MCP security advisory in the same week is notable: the advisory states that robust observability is essential for MCP environments, and all tool and model invocations should be logged, including exact parameters and identities involved.
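The logging recommendation can be met with a thin wrapper around each tool function. The sketch below is one minimal way to do it with the standard library. The decorator, logger name, and example tool are hypothetical, and a real deployment would also need to redact secrets before writing parameters to the log.

```python
import functools
import json
import logging

audit_log = logging.getLogger("mcp.audit")  # hypothetical logger name
audit_log.setLevel(logging.INFO)


def audited(tool_name: str):
    """Wrap a tool so every invocation is logged with caller identity and parameters."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(identity: str, **params):
            record = {"tool": tool_name, "identity": identity, "params": params}
            try:
                result = func(**params)
            except Exception as exc:
                # Log failures too, then let the error propagate to the caller.
                audit_log.error(json.dumps({**record, "outcome": "error",
                                            "error": repr(exc)}, default=str))
                raise
            audit_log.info(json.dumps({**record, "outcome": "ok"}, default=str))
            return result
        return wrapper
    return decorator


@audited("lookup_order")
def lookup_order(order_id: int) -> dict:
    """Example tool; stands in for a real backend query."""
    return {"order_id": order_id, "status": "shipped"}
```

A call such as lookup_order("svc-agent-7", order_id=42) then emits one JSON line naming the tool, the identity, and the exact parameters, which is the shape of record the advisory asks for.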

If your team is running production MCP servers and you have not done a security review since the spec stabilised, the NSA advisory is a useful checklist.

📎 Sources: cybersecuritynews.com/anthropic-updates-claude-code/ · claude.com · nsa.gov MCP Security advisory


Priority Actions This May

  • Do now: Update Cursor to 3.5 — CVE-2026-26268 is rated 9.9 critical and the patch is in the current release. If you have Gemini CLI in any CI/CD script, migrate to the Antigravity CLI before the June 18 hard cutoff.

  • Do this week: Install the Claude Code security-guidance plugin (/plugins); it is zero friction and free for all tiers. Request access to the MCP tunnels research preview for any team connecting Claude agents to private internal services. Review the MCP 2026-07-28 release candidate for the -32002 → -32602 error code breaking change.

  • Do this month: Test Claude Code’s new dynamic workflows primitive on a real multi-step task. Trial Cursor’s Build in Parallel and Jira cloud agent integration if your team operates a backlog-driven workflow. Evaluate Windsurf’s Devin Review against Cursor Bugbot during the free trial period. Run Repello AI’s Cursor hardening checklist against your enterprise Cursor deployment.

  • Watch: Anthropic MCP tunnels research preview (the architecture is correct, but the “as-is” preview language means it is not yet ready for production commitments, and the companion self-hosted sandboxes are not yet available on Claude Platform on AWS). Google Antigravity 2.0 SDK (the multi-agent harness and MCP Atlas benchmark leads are real, but community trust needs a full release cycle to recover from the disruptive auto-update rollout).


Rating Definitions

  • INVESTIGATE: Proven, shipped, and worth your time now. Developer community confirms value.

  • EVALUATE: Valuable but with a specific caveat; check the Condition to see if it applies to you.

  • WATCH: Real signal but too early (EAP, unstable, or awaiting broader community validation). Set a reminder for next period.

  • SKIP: Hype without substance, duplicates existing capability, or not compatible with Claude/Gemini stack.