Claude Sonnet 4.6, the LiteLLM Supply Chain Nightmare, and Cursor Going Full Infrastructure
March was a genuinely busy month in this space, and I’ve been sitting with a few of these developments over the past week trying to work out what’s noise and what actually changes how I work. Let me get into the things that stuck.
Sonnet 4.6 Is the Real Story, Not Opus
I’ll be honest — I’d been running Opus 4.6 as my default in Claude Code because it felt like the “serious” choice. That calculus is now just wrong.
Anthropic shipped Claude Sonnet 4.6 this month and the benchmark shift is material. Developers with early access preferred it over Sonnet 4.5 roughly 70% of the time in Claude Code testing, and — here’s the bit that got me — it was preferred over Opus 4.5 (last November’s flagship) 59% of the time. And it’s priced at $3/$15 per million tokens. Paying Opus prices for what is now sub-Sonnet performance is a hard case to make.
I’ve already switched my Claude Code default. If you haven’t, just do it. The cost case for running Opus daily is now narrow enough that you’d want a very specific reason.
One other API change worth actioning: the budget_tokens thinking parameter is deprecated on Opus 4.6 and Sonnet 4.6. It still works for now, but it’s on the way out. Migrate to thinking: {type: "adaptive"} with the effort parameter before it becomes an incident.
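A minimal sketch of what that migration looks like, assuming the Anthropic Python SDK’s messages.create keyword shape. The model ID and the exact placement of effort (top-level vs. nested) are assumptions on my part — check the current API reference before shipping this:

```python
# Old style (deprecated on Opus 4.6 / Sonnet 4.6): explicit token budget.
old_thinking = {"type": "enabled", "budget_tokens": 8192}

# New style: adaptive thinking plus an effort hint instead of a budget.
new_thinking = {"type": "adaptive"}

request = {
    "model": "claude-sonnet-4-6",  # model ID is an assumption
    "max_tokens": 2048,
    "thinking": new_thinking,
    "effort": "medium",  # placement of this parameter is an assumption
    "messages": [{"role": "user", "content": "Summarise this diff."}],
}
print(request["thinking"])
```

The point is that the budget is gone entirely — you stop micromanaging token counts and hand the decision to the model.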
The LiteLLM Supply Chain Attack Is Genuinely Scary
This is the one that had me auditing environments on a Saturday afternoon, and I make no apologies for that.
On 24 March, a threat actor called TeamPCP published backdoored versions of the litellm Python package — specifically 1.82.7 and 1.82.8 — after stealing PyPI credentials through a compromised Trivy GitHub Action in LiteLLM’s own CI/CD pipeline. The packages were live for about 40 minutes before being quarantined.
What makes this particularly nasty is the payload mechanism. The compromised versions included a .pth file that executes automatically on every Python process startup. Not on import. On startup. It harvests SSH keys, .env files, AWS/GCP/Azure credentials, Kubernetes configs, shell history — the full developer machine treasure chest — and exfiltrates it all via AES-256 encrypted POST to an attacker-controlled domain.
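To see why the .pth trick is so effective, here’s a benign demonstration of the mechanism (not the actual payload): CPython’s site module processes every .pth file in a site directory, and any line starting with "import " is executed. The interpreter does this at startup; site.addsitedir() below triggers the same code path on a temp directory.

```python
import os
import site
import tempfile

# Write a .pth file whose "import" line carries arbitrary code --
# this is the hook the backdoor abused. Our payload is harmless:
# it just sets an environment variable so we can observe execution.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write('import os; os.environ["PTH_DEMO_RAN"] = "1"\n')

# Same processing the interpreter performs for real site dirs at startup.
site.addsitedir(d)
print(os.environ.get("PTH_DEMO_RAN"))  # -> "1"
```

No import of the package required, no function call, nothing to spot in your application code. Any Python process on the machine — a cron job, a test runner, an IDE language server — would fire it.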
The thing that should concern you most: it passed all standard integrity checks. The malicious content was published using legitimate credentials. No hash mismatch. No typosquatting. Just a poisoned package wearing a valid signature.
If you use Google ADK Python with the [eval] or [extensions] extras, LiteLLM is an opt-in dependency and Google issued an advisory telling people to rotate everything. Run pip show litellm in every environment right now. If you see 1.82.7 or 1.82.8, treat the machine as compromised and rotate all credentials. Don’t wait.
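If you’d rather script the audit than eyeball pip show output across a dozen virtualenvs, a small check like this does the same job via importlib.metadata (the version set is from the advisory; everything else is just a sketch):

```python
from importlib.metadata import PackageNotFoundError, version

# Backdoored releases named in the advisory.
COMPROMISED = {"1.82.7", "1.82.8"}

def litellm_status() -> str:
    """Report whether this environment has a known-backdoored litellm."""
    try:
        installed = version("litellm")
    except PackageNotFoundError:
        return "litellm not installed"
    if installed in COMPROMISED:
        return f"COMPROMISED: litellm {installed} - rotate all credentials now"
    return f"ok: litellm {installed}"

print(litellm_status())
```

Drop it into whatever loops over your environments — the important part is that it runs in each interpreter, not just once on your laptop.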
LiteLLM has since shipped v1.83.0 through a rebuilt CI/CD pipeline with better isolation. But the broader lesson here is the one I keep coming back to: AI framework dependency trees are enormous and now actively targeted. TeamPCP hit Trivy, Checkmarx’s VS Code extensions, and LiteLLM all in the same month. This is a sustained campaign against DevOps and AI toolchain infrastructure. Hash-pin your critical dependencies. Range constraints are not enough.
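The principle behind hash-pinning reduces to one comparison, which is worth internalising: pip-compile --generate-hashes records a sha256 for each reviewed artifact, and pip in --require-hashes mode refuses anything that doesn’t match. A freshly republished wheel — even one signed with stolen but legitimate credentials — simply isn’t the bytes you pinned. A toy illustration:

```python
import hashlib

def artifact_matches_pin(artifact_bytes: bytes, pinned_sha256: str) -> bool:
    """The check pip performs per artifact under --require-hashes."""
    return hashlib.sha256(artifact_bytes).hexdigest() == pinned_sha256

wheel = b"fake wheel contents for illustration"
pin = hashlib.sha256(wheel).hexdigest()  # recorded at review time

print(artifact_matches_pin(wheel, pin))         # the reviewed build passes
print(artifact_matches_pin(wheel + b"!", pin))  # any other bytes fail
```

Version ranges trust the index; hash pins trust only the exact artifact you reviewed.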
Cursor Had Its Best Month Yet
Three launches in March: Automations (5th), Composer 2 (19th), and Self-Hosted Cloud Agents (25th). That’s a serious shipping cadence.
Automations is the one I keep thinking about. The pitch is straightforward — trigger an agent from a Slack message, a code commit, or a timer, and it spins up a cloud sandbox and runs with your configured MCPs and models. One concrete example from the notes: PagerDuty incidents can now kick off an agent that immediately queries server logs through an MCP connection. That’s the kind of workflow where I want to stop having to be the person who wakes up at 2am, checks the logs, and decides whether it’s actually on fire.
Composer 2 is interesting from a strategic angle as much as a technical one. It’s priced at $0.50/M input and $2.50/M output — significantly cheaper than Claude Opus 4.6 — and on CursorBench scores a 37% improvement over Composer 1.5. The model was trained on top of an open base model (Kimi K2.5) and the technical report is on arXiv if you want to go deep. The strategic read is that Cursor is deliberately reducing its dependence on Anthropic and OpenAI model spend. That’s not a criticism — it’s smart business — but it’s worth understanding what you’re running when you pick “Auto” in the model selector.
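To make the pricing gap concrete, here’s a back-of-envelope comparison using the per-million-token prices quoted in this post (Composer 2 at $0.50/$2.50, Sonnet 4.6 at $3/$15). The 100k-in/10k-out request shape is an arbitrary illustration, not a measured workload:

```python
def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

composer2 = request_cost(100_000, 10_000, 0.50, 2.50)
sonnet46 = request_cost(100_000, 10_000, 3.00, 15.00)
print(f"Composer 2: ${composer2:.3f}  Sonnet 4.6: ${sonnet46:.3f}")
```

That’s roughly a 6x gap per request, which compounds fast in agentic workflows where a single task can burn through dozens of requests.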
Self-hosted cloud agents I haven’t tried yet, but for anyone in regulated environments with data residency requirements, this is the unblock they’ve been waiting for. Your code and secrets stay in your own infrastructure. Keen to test this once it gets a bit more community validation.
Windsurf: Now $20/Month and the Founding Team Is at Google
I’ve been watching Windsurf’s post-acquisition trajectory with some interest, and March crystallised the concern.
They repriced to $20/month for Pro — same as Cursor — eliminating the one competitive advantage that was easy to articulate to someone who hadn’t actually used both. The $5 price difference was a conversation starter. Pricing parity means you now have to actually evaluate the product on its merits.
Arena Mode is a genuinely good idea: run two Cascade agents side-by-side with hidden model identities, vote on which performs better. That’s useful for teams trying to make evidence-based model choices rather than just trusting someone’s blog post (including this one). SWE-1.5 as the “Fast Agent” model is interesting — apparently 13x faster than Sonnet 4.5, billed at a fixed rate per message.
But the structural question is real. The founding team that built Cascade is now at Google. Cognition might run it well, but the product velocity question is legitimate. I’m watching the changelog cadence through Q2 before I’d recommend anyone make a long-horizon toolchain decision based on Windsurf.
The Bigger Pattern
What I keep coming back to this month is that the whole frame has shifted. These tools are no longer about “AI helping you write code while you watch.” The design target is now sustained, unattended agent sessions — Claude Code’s /loop command, Cursor Automations triggered by PagerDuty, cloud auto-fix handling the entire post-commit lifecycle while you sleep.
That’s exciting and slightly unnerving in equal measure. The LiteLLM attack is a reminder that as these agent frameworks become critical infrastructure, they become high-value targets. The attack surface isn’t just your code anymore — it’s your entire dependency tree, your CI/CD pipelines, and every secret those agents need to do their jobs.
Worth keeping both thoughts in your head simultaneously: this tooling is genuinely getting more capable at a remarkable rate, and the security hygiene requirements are going up in proportion.
Now I’m going to go make another latte and actually update my Claude Code install before I do anything else.