MiniMax today released M2.7, an open-source large language model that holds a distinction no prior model has claimed: it actively participated in its own training.

Self-Evolution in Practice

During M2.7's development, earlier versions of the model were deployed inside MiniMax's reinforcement learning pipeline. The model was tasked with building skills for its own RL harness, updating its own memory systems, and optimizing the training loop based on live results. In one documented run, M2.7 executed over 100 autonomous rounds of "analyze failure → plan change → modify code → evaluate → keep or revert," ultimately achieving a 30% improvement on internal benchmarks with no human intervention at the task level.

MiniMax is careful to note that researchers set the direction and reviewed critical decisions. Still, the model handled 30-50% of the day-to-day research workflow autonomously.
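The loop described above is, at its core, a form of guarded hill climbing: propose a change, measure it, and keep it only if the metric improves. The sketch below illustrates that control flow with toy stand-ins; the config knobs, scoring function, and mutation logic are hypothetical and are not MiniMax's actual pipeline.

```python
# Hypothetical sketch of the "analyze failure -> plan change -> modify code ->
# evaluate -> keep or revert" loop. Everything here is a toy stand-in:
# evaluate() fakes a benchmark, propose_change() fakes a code edit.
import copy
import random

random.seed(0)

def evaluate(config):
    # Toy benchmark: scores higher as learning_rate nears 0.01
    # and as epochs increase.
    return -abs(config["learning_rate"] - 0.01) * 100 + config["epochs"] * 0.1

def propose_change(config):
    # "Plan change -> modify code": perturb one knob at random.
    candidate = copy.deepcopy(config)
    if random.random() < 0.5:
        candidate["learning_rate"] *= random.uniform(0.5, 1.5)
    else:
        candidate["epochs"] += random.choice([-1, 1])
    return candidate

initial = {"learning_rate": 0.1, "epochs": 3}
config = copy.deepcopy(initial)
best_score = evaluate(config)

for round_idx in range(100):  # the article cites 100+ autonomous rounds
    candidate = propose_change(config)
    score = evaluate(candidate)
    if score > best_score:
        config, best_score = candidate, score  # keep
    # else: revert, i.e. simply discard the candidate
```

The key safety property of this pattern is the revert branch: a bad modification can never persist, so the metric is monotonically non-decreasing across rounds, which matches the article's claim of steady improvement without task-level human intervention.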

Benchmarks

M2.7 scores 56.22% on SWE-Pro, putting it near Claude Opus territory in software engineering. On MLE-Bench Lite (22 machine learning competitions run on a single GPU) it averaged a 66.6% medal rate across three trials, trailing only Claude Opus 4.6 (75.7%) and GPT-5.4 (71.2%). For office productivity tasks, it posts the highest ELO (1495) among open-source models on GDPval-AA.

Why It Matters

Every major AI lab has talked about recursive self-improvement as a theoretical milestone. M2.7 is the first public demonstration of a production model closing that loop, even partially. Whether or not this represents a meaningful step toward AGI is debatable, but it's a meaningful step toward cheaper, faster model iteration.