Chinese AI startup MiniMax shipped M2.7 this week with a claim that stands out from the usual benchmark parade: an earlier version of the model was used to build the very reinforcement learning infrastructure that trained the final release.

During development, MiniMax deployed an internal M2.7 build as a research agent inside its RL team. The model monitored experiments, read logs, fixed bugs, opened merge requests, and ran smoke tests, autonomously handling 30–50% of the end-to-end development workflow. Researchers stepped in only for critical decisions.

What M2.7 Can Do

On the SWE-Pro benchmark, M2.7 scores 56.22%, approaching Claude Opus 4.6's level. On MLE Bench Lite, a machine learning competition suite designed to evaluate autonomous research skills, it achieves a 66.6% medal rate, tying with Google's Gemini 3.1.

Other benchmarks:

  • Terminal Bench 2: 57.0%
  • VIBE-Pro (full project delivery): 55.6%
  • GDPval-AA (office software): ELO 1495, highest among open-source models

The model also maintains a 97% adherence rate across 40 complex multi-step skill workflows.

Proprietary Shift

This release marks a strategic change for MiniMax. The company built its reputation on frontier open-source models: M2.5 and its predecessors were freely available. M2.7 is proprietary, a move that follows Chinese competitors such as z.ai's GLM-5 Turbo and signals a broader shift in China's AI landscape toward closed, monetized models.

M2.7 is available through the MiniMax Agent platform and API. A high-throughput variant, M2.7-highspeed, provides the same outputs at faster inference speeds for production workloads.
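The article does not include client details for the API, so as an illustration only, here is a minimal sketch of assembling a chat-completion request in the common OpenAI-compatible style. The base URL, the model identifier `MiniMax-M2.7`, and the auth scheme are all assumptions for illustration, not confirmed values; the function only builds the request payload and makes no network call.

```python
import json

# Hypothetical values -- the real MiniMax endpoint, model ID, and auth
# scheme may differ; consult the official API documentation before use.
BASE_URL = "https://api.example-minimax-endpoint.com/v1/chat/completions"
MODEL_ID = "MiniMax-M2.7"  # assumed identifier, not confirmed

def build_request(prompt: str, api_key: str) -> dict:
    """Assemble an OpenAI-style chat request dict (no network call)."""
    return {
        "url": BASE_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": MODEL_ID,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_request("Summarize this RL training log.", "sk-placeholder")
print(req["url"])
```

Swapping in the `M2.7-highspeed` variant would, under the same assumption, only change the `model` field of the body.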