The Chinese AI Labs Are Absolutely Flying Right Now
There’s this interesting pattern emerging in the AI space that’s hard to ignore. While the big Western labs are carefully orchestrating their releases and pricing strategies, Chinese AI companies are just… releasing stuff. Like, a lot of stuff. Fast.
Take what happened in the last 24 hours: Minimax dropped their M2.5 model, and the benchmarks are genuinely impressive. We’re talking 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp. For context, these numbers are competitive with models that cost significantly more to run. Then, within hours, more models dropped: three Sonnet 4.5-level releases in less than a day. It’s bananas.
The thing that really caught my attention isn’t just the performance numbers (though those are solid); it’s the pricing structure. Minimax claims you could run four M2.5 instances continuously for an entire year on a $10,000 budget. Let that sink in for a moment. At 50 tokens per second of output, that works out to roughly $0.30 per hour. Input tokens are $0.30 per million, cached inputs are $0.03 per million, and outputs are $1.20 per million. Someone pointed out it’s still a bit more expensive than DeepSeek, but compared to Anthropic’s Claude pricing? It’s night and day.
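Those numbers invite a quick sanity check, so here’s the back-of-the-envelope in Python. The prices are the ones quoted above; the assumption that an instance is generating flat-out at 50 tokens per second is mine, and the gap between the output-only figure and the quoted $0.30/hour presumably comes from input-token costs.

```python
# Back-of-the-envelope check on Minimax's claimed M2.5 pricing.
# Prices are the ones quoted in the post; the assumption that an
# instance generates flat-out, 24/7, at the quoted speed is mine.

TOKENS_PER_SEC = 50        # claimed sustained output speed
OUTPUT_USD_PER_M = 1.20    # $ per million output tokens
HOURS_PER_YEAR = 24 * 365  # 8,760
INSTANCES = 4

tokens_per_hour = TOKENS_PER_SEC * 3600  # 180,000 output tokens/hour
output_cost_per_hour = tokens_per_hour / 1e6 * OUTPUT_USD_PER_M
print(f"output-only cost: ${output_cost_per_hour:.3f}/hour")  # ~$0.216

# The quoted $0.30/hour presumably folds in input-token costs too.
yearly_total = 0.30 * HOURS_PER_YEAR * INSTANCES
print(f"four instances for a year at $0.30/hr: ${yearly_total:,.0f}")  # ~$10,512
```

So the headline claim is internally consistent: four always-on instances at $0.30/hour land right around the advertised $10,000 a year.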
Now, I’ve been working in DevOps long enough to know that when something seems too good to be true, there’s usually a catch. Someone in the discussion raised a valid concern about whether these prices are actually sustainable or if they’re burning cash to gain market share. The energy costs alone must be substantial. But here’s the thing: they’ve apparently been delivering at similar price points for previous model generations, so maybe they’ve figured something out about efficiency that the Western labs haven’t.
The real-world testing is where things get interesting, though. Benchmarks are one thing—and I’ve learned to take them with a hefty grain of salt after years in tech—but actual performance on your specific use case is another matter entirely. One person who tested M2.5 on coding tasks found it significantly worse than GLM-5, even producing non-functioning code and blaming “compiler caching issues” that didn’t actually exist. That’s… not great. Another tester found Kimi 2.5 worked better for their needs. So there’s clearly some variance in how these models perform depending on what you’re asking them to do.
The geopolitical dimension adds another layer of complexity. There’s apparently some confusion about a US-based infrastructure company offering Minimax through its API, when it may really just be acting as a passthrough to the Chinese API. For those who choose stateside providers specifically to keep their data out of China, whether for privacy reasons or legal compliance, that’s a problem. Data sovereignty in the AI era is going to get increasingly messy.
What strikes me about this whole situation is the velocity of innovation. The Chinese labs seem to be operating on a different timeline, possibly trying to push releases before Chinese New Year. There’s this sense of urgency that you don’t quite see from OpenAI or Anthropic, who are more measured in their rollouts. Whether that urgency leads to better products or just more products remains to be seen.
The cost factor, though—that’s what could be genuinely transformative. When you can run powerful models at a fraction of the cost, suddenly you can do things that weren’t economically feasible before. Spinning up parallel agents, rapid iteration, massive scale testing—all of this becomes possible when the price point drops by an order of magnitude. It’s the same principle we’ve seen play out in computing for decades: when something becomes cheap enough, entirely new use cases emerge.
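To make that concrete, here’s a minimal sketch of what “spinning up parallel agents” could look like, assuming the model sits behind an OpenAI-compatible endpoint. The base URL, model id, and task list below are placeholders of mine, not anything from Minimax’s docs; check your provider’s documentation before using.

```python
# Minimal sketch: fanning out cheap agents in parallel against an
# OpenAI-compatible endpoint. The base_url, model id, and tasks are
# placeholders, NOT Minimax's real values.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.example-provider.com/v1",  # placeholder
    api_key="YOUR_KEY",
)

async def run_agent(task: str) -> str:
    resp = await client.chat.completions.create(
        model="m2.5-placeholder",  # placeholder model id
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content or ""

async def main() -> None:
    tasks = [f"Review module {i} for obvious bugs." for i in range(20)]
    # Twenty concurrent agents: a budget conversation at Claude prices,
    # closer to a rounding error at ~$0.30/hour per instance.
    results = await asyncio.gather(*(run_agent(t) for t in tasks))
    for result in results:
        print(result[:80])

asyncio.run(main())
```

The code itself is boring, and that’s the point: the only thing that has kept a twenty-way fan-out like this from being the default workflow is the bill.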
I’m watching this space with a mix of excitement and caution. The technical achievements are impressive, and the competitive pressure is good for everyone. But I also wonder about the longer-term implications—both for the sustainability of these pricing models and for the concentration of AI development across different geopolitical contexts. For now, though, it’s hard not to appreciate the sheer speed of progress. Whatever else you might say about 2025, it’s not boring for those of us who spend our days thinking about this stuff.