


Agentic Coding: Monthly Intelligence Brief — March 2026

The shift from session-based prompting to sustained, event-driven agent orchestration became concrete this month — across tooling, models, infrastructure, and supply chain security.


Claude Sonnet 4.6 + March API Platform Changes — Sonnet displaces Opus as the practical daily driver

Investigate: YES | Action: Migrate Claude Code default from Opus 4.6 to Sonnet 4.6; audit agent pipelines that rely on budget_tokens for thinking and migrate them to effort + adaptive thinking before the parameter is removed.

Claude Sonnet 4.6 is Anthropic’s most capable Sonnet release yet — a full upgrade across coding, computer use, long-context reasoning, and agent planning — while pricing stays at $3/$15 per million tokens.

The benchmark shifts here are material, not marketing:

developers with early access preferred Sonnet 4.6 over Sonnet 4.5 approximately 70% of the time in Claude Code testing, and it was even preferred over Opus 4.5 — November 2025’s flagship — 59% of the time.

At Sonnet pricing, that rewrites the model selection calculus for most agentic pipelines.

Two specific API improvements in March are worth actioning. First,

web search and web fetch tools now support dynamic filtering on Opus 4.6 and Sonnet 4.6 — Claude can write and execute code to filter results before they reach the context window, improving accuracy while reducing token consumption.

For agentic search workloads this can meaningfully cut costs. Second,

fine-grained tool streaming is now generally available on all models and platforms, and structured outputs have also reached GA for Sonnet 4.5, Opus 4.5, and Haiku 4.5, with expanded schema support, improved grammar compilation latency, and no beta header required.

Both remove friction from production agentic systems.

On the deprecation front, watch two clocks:

Claude Haiku 3 is deprecated with retirement scheduled for 19 April 2026 — migrate to Haiku 4.5.

More structurally,

the budget_tokens thinking parameter is deprecated on Opus 4.6 and Sonnet 4.6 — still functional but no longer recommended and will be removed in a future release. Migrate to thinking: {type: "adaptive"} with the effort parameter.
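A minimal migration sketch, expressed as plain request payloads rather than live API calls — the model id string and the top-level placement of the effort parameter are assumptions here, so verify both against the current Anthropic API reference before shipping:

```python
# Sketch: moving a request from a fixed thinking budget to adaptive
# thinking. Field names follow the release notes; the model id and the
# exact placement of "effort" are assumptions -- check the API docs.

def legacy_request(prompt: str) -> dict:
    # Deprecated on Opus 4.6 / Sonnet 4.6: explicit thinking token budget.
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 4096,
        "thinking": {"type": "enabled", "budget_tokens": 8000},
        "messages": [{"role": "user", "content": prompt}],
    }

def migrated_request(prompt: str, effort: str = "medium") -> dict:
    # Recommended: adaptive thinking paired with the effort parameter,
    # letting the model decide how much reasoning each turn needs.
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 4096,
        "thinking": {"type": "adaptive"},
        "effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }
```

The practical difference: instead of hand-tuning a token budget per pipeline stage, you express intent ("low", "medium", "high" effort) and let adaptive thinking allocate the budget.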

📎 Sources: Anthropic API Release Notes · Introducing Claude Sonnet 4.6 (Anthropic) · What’s New in Claude 4.6 (Claude Docs) · Claude Sonnet 4.6 Specs (ClaudeFa.st)


Claude Code March Releases — Sustained autonomous sessions become the product’s real design target

Investigate: YES | Action: Wire up PermissionDenied hooks, configure --channels for mobile approval relay, and test /loop for CI monitoring use cases; run npm update -g @anthropic-ai/claude-code to pick up the month’s stability fixes.

Claude Code jumped about ten version numbers over the course of March.

The pattern across those releases is unmistakable: every meaningful feature push is aimed at reducing the need for a human to be present during an agent session.

On 23 March, Anthropic added computer use to Claude Code and Cowork for Pro and Max users, letting Claude open files, run dev tools, point, click, and navigate the screen with no setup required.

This is a meaningful unlock — agents can now traverse browser tabs, system UIs, and legacy tools that lack MCP connectors.

The tradeoff is real: Claude sees your screen through screenshots and asks permission before accessing apps.

The /loop command (introduced in v2.1.71) turns Claude Code into a recurring monitoring system. Define an interval and a prompt, and Claude executes it automatically — for example, /loop 5m check the deploy re-runs the prompt “check the deploy” every five minutes. It works as a lightweight session-level cron job.

Paired with

cloud auto-fix, which lets web and mobile sessions automatically follow PRs, fix CI failures, and address review comments in the cloud so you return to a ready-to-go pull request,

the agent can now cover the full post-commit lifecycle unattended.

The hooks model also matured.

The new PermissionDenied hook fires after auto mode classifier denials — returning {retry: true} tells the model it can retry.
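A sketch of what such a hook script could look like. Claude Code hooks generally receive a JSON payload on stdin and answer with JSON on stdout; the tool_name field and the retry policy below are illustrative assumptions, not a documented schema for this specific hook:

```python
# Hypothetical PermissionDenied hook script. The payload shape
# ("tool_name") and the policy are assumptions for illustration.
import json
import sys

SAFE_TO_RETRY = {"Read", "Grep", "Glob"}  # read-only tools we trust

def decide(payload: dict) -> dict:
    # Let the model retry denials on read-only tools; anything else
    # stays denied and waits in the normal approval queue.
    if payload.get("tool_name") in SAFE_TO_RETRY:
        return {"retry": True}
    return {}

# Wire-up when invoked by Claude Code (reads stdin, writes stdout):
#   print(json.dumps(decide(json.load(sys.stdin))))
```

The point is that the escalation policy is yours to encode: retry read-only denials automatically, route write-path denials to a human.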

This lets you build programmatic escalation logic rather than waiting on the approval queue. On the stability side,

a bug was fixed where prompt cache misses in long sessions were caused by tool schema bytes changing mid-session

— a subtle issue that was silently costing tokens and coherence in extended runs.

A bug where nested CLAUDE.md files were re-injected dozens of times in long sessions that read many files was also patched,

addressing another long-session reliability problem.

The --channels permission relay (research preview) can forward approval prompts to your phone,

closing the loop on truly unattended sessions.

📎 Sources: Claude Code March 2026 Release Notes (Releasebot) · Every Claude Code Update from March 2026 (Builder.io) · Claude Code Changelog (ClaudeFa.st) · Claude Release Notes (Claude Help Centre)


Cursor Composer 2 + Automations + Self-Hosted Agents — Cursor stakes out infrastructure territory

Investigate: YES | Action: Evaluate Composer 2 via the Auto model selector in Cursor as the cost-performance benchmark for your own frontier model spend; assess Automations for CI/CD trigger patterns; prioritise self-hosted agents if you have data residency requirements.

March was Cursor’s most consequential shipping month to date. Three launches landed in sequence: Automations (5 March), Composer 2 (19 March), and Self-Hosted Cloud Agents (25 March).

Automations let users automatically launch agents within their coding environment, triggered by a new addition to the codebase, a Slack message, or a simple timer — a way to review and maintain all new code created by agentic tools without tracking dozens of agents at once. At the most basic level, Automations break engineers out of the “prompt-and-monitor” dynamic.

When invoked, the agent spins up a cloud sandbox and follows your instructions using the MCPs and models you’ve configured. Agents also have access to a memory tool that lets them learn from past runs and improve with repetition.

The system is already used for incident response, with PagerDuty incidents initiating an agent that can immediately query server logs through an MCP connection.

Composer 2 is now available in Cursor — frontier-level at coding and priced at $0.50/M input and $2.50/M output tokens.

The technical report (published to arXiv on 25 March) is worth reading:

it covers the full training process, from continued pretraining on an open base model, Kimi K2.5, through large-scale reinforcement learning, with a focus on closely emulating the real Cursor environment.

On CursorBench, Composer 2 scores 61.3 — a 37% improvement over Composer 1.5 — and is competitive with the strongest frontier models. On public benchmarks it scores 73.7 on SWE-bench Multilingual and 61.7 on Terminal-Bench.

Benchmarks aside, the strategic point is clearer: Cursor is reducing dependence on Anthropic and OpenAI model spend while protecting margins at scale.

Self-hosted cloud agents keep your code and tool execution entirely in your own network. Your codebase, build outputs, and secrets all stay on internal machines running in your infrastructure, while the agent handles tool calls locally — offering the same capabilities as Cursor-hosted cloud agents, including isolated VMs, full development environments, multi-model harnesses, and plugins.

For regulated environments that block cloud code egress, this is the unblock.

📎 Sources: Cursor Changelog · Introducing Composer 2 (Cursor) · Composer 2 Technical Report (arXiv) · Cursor Automations (TechCrunch) · Cursor Composer 2 Review (BuildFastWithAI)


LiteLLM PyPI Supply Chain Attack (TeamPCP) — AI agent toolchains are now primary targets

Investigate: YES | Action: Audit every environment for LiteLLM 1.82.7 or 1.82.8 immediately. If found, treat as fully compromised — rotate all credentials. Pin LiteLLM in all manifests. Check Google ADK Python if you use [eval] or [extensions] extras.

This is the most urgent security item in the report.

On 24 March 2026, threat actor TeamPCP published backdoored versions of the litellm Python package after stealing PyPI credentials via a compromised Trivy GitHub Action in LiteLLM’s CI/CD pipeline.

The compromised packages were litellm 1.82.7 and 1.82.8, live on PyPI from 10:39 UTC for approximately 40 minutes before being quarantined.

The payload design is sophisticated and particularly dangerous for developer machines.

The compromised versions contained a .pth file (litellm_init.pth) that executes automatically on every Python process startup — no import required. It harvests SSH keys, .env files, AWS/GCP/Azure credentials, Kubernetes configs, database passwords, shell history, crypto wallets, and CI/CD secrets, then exfiltrates via AES-256-CBC + RSA-4096 encrypted POST to an attacker-controlled lookalike domain.
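A quick audit sketch for the current interpreter environment, checking both the installed version and the .pth persistence artifact (the filename comes from the advisories cited below); run it per virtualenv, since each environment has its own site-packages:

```python
# Audit one Python environment for the backdoored LiteLLM releases and
# the litellm_init.pth persistence file described in the advisories.
import sysconfig
from importlib import metadata
from pathlib import Path

COMPROMISED = {"1.82.7", "1.82.8"}

def audit_litellm() -> list:
    findings = []
    try:
        version = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        version = None
    if version in COMPROMISED:
        findings.append("compromised litellm==%s installed" % version)
    # The payload executes via a .pth file on every interpreter start,
    # so check site-packages directly as well as the package version.
    purelib = Path(sysconfig.get_paths()["purelib"])
    if (purelib / "litellm_init.pth").exists():
        findings.append("malicious litellm_init.pth in %s" % purelib)
    return findings

print(audit_litellm())
```

An empty list means no indicators in this environment — credential rotation is still warranted anywhere the compromised versions were ever installed, even briefly.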

The package passed all standard integrity checks because the malicious content was published using legitimate credentials — no hash mismatch, no suspicious domain, no misspelled package name.

The blast radius for agent framework users is direct.

LiteLLM is an opt-in dependency installed from PyPI when you install the eval and extensions extras in Google ADK Python.

Google’s advisory recommends rotating all secrets, API keys, SSH keys, and service account keys in any environment where affected LiteLLM versions were installed and investigating exposed infrastructure.

LiteLLM has since released v1.83.0 via a new CI/CD v2 pipeline with isolated environments and stronger security gates.

The broader lesson:

this attack followed the same playbook TeamPCP used to compromise Aqua’s Trivy security scanner and Checkmarx’s VS Code extensions, indicating a sustained campaign targeting DevOps and AI toolchain infrastructure.

Pin your dependencies by hash. Do not rely on range constraints in AI framework installs.
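In pip terms, hash pinning looks like the fragment below — the digest is a placeholder, not a real hash; take the published value from PyPI (or compute it with pip hash against a verified artifact) for the exact release you trust:

```text
# requirements.txt -- exact version plus expected digest
litellm==1.83.0 --hash=sha256:<digest published on PyPI>
```

Install with pip install --require-hashes -r requirements.txt; in that mode pip refuses any artifact whose digest does not match the pin, so a backdoored upload published with stolen-but-legitimate credentials still fails to install.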

📎 Sources: LiteLLM Security Update (LiteLLM) · litellm PyPI Supply Chain Attack (FutureSearch) · ADK Security Advisory (GitHub #5005) · Compromised LiteLLM PyPI Package (Sonatype) · How a Poisoned Security Scanner Backdoored LiteLLM (Snyk) · TeamPCP Campaign (Help Net Security)


Google ADK Goes Multi-Language — Java 1.0 and Go 1.0 ship with A2A and HITL baked in

Investigate: CONDITIONALLY YES | Condition: Java or Go shops building production multi-agent systems on Gemini. Python-primary teams already covered. | Action: Java teams should review the ADK Java 1.0 HITL ToolConfirmation API and native A2A client; Go teams evaluate the OpenTelemetry tracing integration for production observability.

On 30 March, Google announced ADK for Java 1.0.0. What started with Python has now grown into a multi-language ecosystem: Python, Java, Go, and TypeScript.

Both Java and Go reached 1.0 in the final week of March, signalling production stability commitments rather than preview status.

ADK for Java now natively supports the official Agent2Agent (A2A) Protocol, allowing ADK agents to seamlessly communicate with remote agents built in any language or framework. You can resolve an AgentCard from a remote endpoint, construct the client, and wrap it in a RemoteA2AAgent. This remote agent can be placed into your ADK agent hierarchy and acts exactly like a local agent.

That is the practical unlocking of cross-language, cross-framework multi-agent coordination — not just a protocol spec.

ADK Go 1.0 introduces native OpenTelemetry integration — by plugging in an OTel TraceProvider, every model call and tool execution loop generates structured traces and spans to help debug complex agent logic.

For production agent systems where debugging non-determinism is the hardest problem, this is the feature that matters most.

ADK Go 1.0 also supports defining agents directly through YAML configurations, ensuring feature parity and cross-language consistency — developers can manage and run agents via the adk CLI without writing boilerplate Go code for every configuration change.
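As a rough sketch of the YAML-config style — the field names and model id below are illustrative assumptions, not the documented ADK schema, so check the ADK Agent Config reference for the exact shape:

```yaml
# Hypothetical ADK agent config -- field names are illustrative.
name: log_triage_agent
model: gemini-2.5-pro   # model id is an assumption
instruction: |
  Investigate failing CI runs: fetch the logs, summarise the root
  cause, and propose a minimal fix.
tools:
  - google_search
```

The appeal is operational: the same declarative file can drive the adk CLI across languages, so routine agent changes become config diffs rather than code changes.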

Note the LiteLLM supply chain advisory above applies to any ADK Python installation using [extensions] or [eval] extras — update immediately.

📎 Sources: Announcing ADK for Java 1.0.0 (Google Developers Blog) · ADK Go 1.0 Arrives (Google Developers Blog) · ADK Python (PyPI) · ADK Docs


Gemini CLI Plan Mode + MCP 2026 Roadmap — Infrastructure maturity converges with growing deployment pain

Investigate: CONDITIONALLY YES | Condition: Plan Mode is immediately actionable for any Gemini CLI user; MCP roadmap tracking matters if you have production MCP deployments hitting scale or enterprise auth constraints. | Action: Enable Plan Mode before any large refactor. Pin MCP server versions in production — enterprise auth and stateless Streamable HTTP are coming in H1/H2 2026 but not here yet.

On 11 March, Google shipped Plan Mode for Gemini CLI. With Plan Mode active, Gemini CLI focuses first on analysing the request, planning complex changes, understanding the codebase or dependencies — all in a read-only mode safe from accidental changes. Plan Mode also asks clarifying questions before proposing a strategy for review.

Plan Mode restricts Gemini CLI to a subset of tools: the agent can navigate your codebase, search for patterns, and read documentation, but cannot modify any files except its own internal plans.

This is the equivalent of Claude Code’s plan mode, now shipping in the open-source Gemini CLI where you can inspect and extend the policy.

Gemini CLI v0.34 also added native gRPC support and protocol routing for A2A, plus webfetch tool Stage 1 improvements.

On MCP infrastructure, the 2026 roadmap (published 5 March by lead maintainer David Soria Parra) is the most useful document for anyone scaling MCP in production. The four priority areas are explicit:

Streamable HTTP gave MCP a production-ready transport, but running it at scale has revealed gaps. Goals include evolving Streamable HTTP to run statelessly across multiple server instances and defining how sessions are created, resumed, and migrated so that server restarts are transparent to clients.

Enterprises deploying MCP are running into a predictable set of problems: audit trails, SSO-integrated auth, gateway behaviour, and configuration portability — and a dedicated Enterprise WG does not yet exist.

If your team is experiencing these problems, engaging with the MCP working groups now is the best way to influence the roadmap.

By March 2026, all major providers had adopted MCP, and the ecosystem was running at approximately 97 million monthly downloads.

📎 Sources: Plan Mode Now Available in Gemini CLI (Google Developers Blog) · MCP 2026 Roadmap (MCP Blog) · MCP Roadmap (Official Docs) · MCP’s Biggest Growing Pains (The New Stack) · Gemini CLI March Changelog (Releasebot)


Windsurf Pricing Overhaul + Arena Mode — The $5 price advantage is gone; the product direction matters more now

Investigate: MONITOR | Action: If you’re currently on Windsurf Pro, audit whether your actual usage pattern hits daily/weekly quota limits under the new system before the next billing cycle. Watch post-acquisition product velocity through Q2 2026 before making multi-year decisions.

On 19 March 2026, Windsurf replaced its credit-based pricing with a quota system, moving Pro to $20/month — identical to Cursor Pro. Teams went to $40/seat and a new Max tier launched at $200/month.

The $5 price increase eliminates Windsurf’s clearest competitive advantage over Cursor’s $20 Pro — pricing parity means the differentiation must now come from product.

The product did ship meaningful features.

Arena Mode brings side-by-side model comparison directly into the IDE — run two Cascade agents simultaneously with hidden model identities and vote on which performs better. Battle Groups let you compare “fast models” vs “smart models,” and votes contribute to both personal and global leaderboards.

This is a genuinely useful calibration tool for teams trying to make evidence-based model selections rather than relying on marketing benchmarks.

Windsurf’s native SWE-1.5, announced as the new “Fast Agent” model, achieves near-frontier coding quality at dramatically faster inference — 13x faster than Sonnet 4.5 — with fixed-rate-per-message billing rather than token-based charges.

The structural uncertainty is real.

The founding team that built Cascade is now at Google. Cognition is a capable company, but it is an open question whether Windsurf’s product velocity will match its pre-acquisition trajectory, and that matters for teams making long-horizon toolchain decisions.

Watch the changelog cadence through Q2 before making multi-year commitments.

📎 Sources: Windsurf Changelog · Windsurf vs Cursor 2026 (Verdent AI) · Windsurf Pricing 2026 (Verdent AI) · Windsurf AI Review 2026 (PopularAITools)


JetBrains Central + LangGraph v1.1 — Enterprise agent orchestration gets governance primitives

Investigate: MONITOR | Action: Apply for JetBrains Central EAP if you manage agent fleets across JetBrains IDEs at team scale. Upgrade LangGraph to v1.1 for type-safe streaming — it is fully backwards compatible.

AI is beginning to change how software is produced. Instead of just assisting developers inside the editor, AI agents now investigate issues, generate code, run tests, and execute multi-step workflows. As this scales, software development becomes a distributed system of agents, environments, and workflows that operate across IDEs, CLIs, pipelines, and collaboration tools.

JetBrains Central, announced 24 March, is the company’s architectural response to this.

JetBrains Central acts as a control layer across agentic workflows alongside JetBrains’ Air agentic development environment and the Junie coding agent, connecting developer tools, agents, and development infrastructure into a unified system where automated work can be executed and governed across teams.

Agents can come from JetBrains or external ecosystems, including Claude Agent, Codex, Gemini CLI, or custom-built solutions.

The Early Access Program will launch in Q2 2026 with a limited group of design partners.

On LangGraph,

LangGraph v1.1 was released in March, including type-safe streaming, type-safe invoke, Pydantic and dataclass coercion — fully backwards compatible.

More substantively,

teams at Stripe, Ramp, and Coinbase each built internal coding agents and independently landed on nearly identical architectures: isolated cloud sandboxes, curated toolsets, subagent orchestration, and tight integration with developer workflows — Open SWE captures that pattern in an open-source framework you can fork and deploy yourself.

That convergence on architecture is signal worth tracking if you’re designing your own internal coding agent harness.

📎 Sources: Introducing JetBrains Central (JetBrains Blog) · JetBrains Central (InfoWorld) · March 2026 LangChain Newsletter · LangGraph GitHub


Priority Actions This Month

Do now:

  • LiteLLM audit — Run pip show litellm in every environment. If versions 1.82.7 or 1.82.8 are present, treat the machine as compromised. Rotate all credentials: SSH keys, cloud provider creds, API keys, Kubernetes configs. Pin litellm to >=1.83.0 or <=1.82.6 in all manifests. Update Google ADK Python immediately if you use [eval] or [extensions] extras.
  • Migrate off budget_tokens — Deprecated on Opus 4.6 and Sonnet 4.6. Switch to thinking: {type: "adaptive"} with the effort parameter now to avoid breakage in a future model release.
  • Haiku 3 retirement — 19 April cutoff. Migrate any Haiku 3 agents to Haiku 4.5.

Do this week:

  • Update Claude Code — npm update -g @anthropic-ai/claude-code. Pick up PermissionDenied hooks, /loop, long-session cache-miss fixes, and the Windows PowerShell preview if relevant.
  • Switch default to Sonnet 4.6 — At $3/$15, developers preferred it over Opus 4.5 in 59% of Claude Code comparisons. The cost case for running Opus 4.6 daily is now narrow.
  • Enable Gemini CLI Plan Mode — Zero configuration required. Use before any non-trivial refactor session for read-only architectural analysis before execution.

Do this month:

  • Evaluate Cursor Automations — Map your CI/CD trigger points (code commit, PagerDuty alert, Slack webhook) to automation templates. Start with a security scan or Bugbot equivalent.
  • Assess Composer 2 — If you run frontier model budget in Cursor, test Composer 2 via Auto — $0.50/M input undercuts Claude Opus 4.6 significantly for comparable coding quality on Terminal-Bench.
  • Review the MCP 2026 Roadmap — If you have MCP servers in production, audit against the four priority areas: stateless Streamable HTTP, Tasks lifecycle gaps, governance delegation, and enterprise auth. The OAuth 2.1 enterprise auth work is not shipping until Q2 2026 at earliest.
  • Google ADK Java/Go — Java and Go shops building on Gemini: evaluate the 1.0 milestones. A2A and HITL are now first-class in both runtimes.

Watch:

  • JetBrains Central EAP — Q2 2026 launch with limited design partners. Real problem being solved (agent governance at team scale), but not shippable yet. Apply for the EAP if you manage mixed-IDE agent fleets.
  • Windsurf post-acquisition velocity — The founding team is at Google. Watch the March–May changelog cadence for signs of deceleration before making long-term toolchain decisions.
  • Cursor self-hosted agents — Newly available but awaiting broader community validation on performance parity vs. hosted. Worth a spike if data residency is a blocker.
  • TeamPCP campaign progression — Trivy, Checkmarx VS Code extensions, LiteLLM, Telnyx SDK all hit in March. This is a sustained campaign against DevOps and AI toolchain infrastructure. Expect more targets. Hash-pin your critical dependencies now.

Investigate Ratings — Definitions

  • YES: Proven, shipped, and worth your time now. Developer community confirms value.
  • CONDITIONALLY YES: Valuable but with a specific caveat (e.g. only if you’re on AWS, only if you build Apple platform apps, only if cost is a constraint). State the condition clearly.
  • MONITOR: Real signal but too early — EAP, unstable, or awaiting broader community validation. Set a reminder for next period.
  • SKIP: Hype without substance, duplicates existing capability, or not compatible with Claude/Gemini stack.
    © Left for More 2026