MiniMax / minimax-m2
Models
| Model | Total Parameters | Context Length | Input Modalities | Activated Parameters |
|---|---|---|---|---|
| minimax-m2 | 230B | 200K | Text | 10B |
MiniMax M2: High-Efficiency Coding and Agentic Model
MiniMax M2 is a high-efficiency large language model engineered specifically for coding and agentic workflows. With 230 billion total parameters (10 billion activated), M2 delivers exceptional performance in software development tasks while maintaining high efficiency, low latency, and cost-effective deployment.
Key Features
- Superior Intelligence: Ranks #1 among open-source models globally on Artificial Analysis composite intelligence benchmarks
- Advanced Coding: Engineered for end-to-end developer workflows with multi-file editing and test-validated repairs
- Agentic Performance: Excels at planning and executing complex, long-horizon toolchains across shell, browser, and code runners
- Efficient Design: 10B activated parameters from 230B total for optimal performance-to-cost ratio
- Long Context: 200K token context window for comprehensive codebase understanding
- Recovery Capabilities: Graceful recovery from flaky steps in complex workflows
- Evidence Traceability: Maintains clear evidence chains for agentic decision making
Model Variants
| Name | Total Parameters | Context | Input Modalities | Activated Parameters | Description |
|---|---|---|---|---|---|
| minimax-m2 | 230B | 200K | Text | 10B | Cloud-optimized high-efficiency model |
Technical Capabilities
Coding Excellence
MiniMax M2 delivers exceptional performance across the software development lifecycle:
- Multi-File Editing: Comprehensive codebase modifications and refactoring
- Coding-Run-Fix Loops: End-to-end development workflows with execution and debugging (a minimal harness sketch follows this list)
- Test-Validated Repairs: Automated testing and validation of code changes
- Language Support: Strong performance across multiple programming languages
- IDE Integration: Optimized for terminal, IDE, and CI/CD workflows
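The coding-run-fix loop mentioned above can be driven by a thin harness around any chat endpoint that serves the model. The sketch below is a minimal illustration, not MiniMax's own tooling: it assumes an OpenAI-compatible API, and the base URL, API key variables, target file, and `minimax-m2` model identifier are placeholders for whatever your provider documents.

```python
# Minimal coding-run-fix loop sketch (hypothetical provider settings).
# Assumes an OpenAI-compatible endpoint serving MiniMax M2; the base_url,
# API key env vars, and model name below are placeholders, not official values.
import os
import subprocess
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["M2_BASE_URL"],  # your provider's endpoint
    api_key=os.environ["M2_API_KEY"],
)

def run_tests() -> subprocess.CompletedProcess:
    """Run the project's test suite and capture its output."""
    return subprocess.run(["pytest", "-q"], capture_output=True, text=True)

def ask_for_patch(failure_log: str, source: str) -> str:
    """Ask the model to repair the file, given the failing test output."""
    resp = client.chat.completions.create(
        model="minimax-m2",  # placeholder identifier; provider-specific
        messages=[
            {"role": "system", "content": "You fix Python code. Reply with the full corrected file only."},
            {"role": "user", "content": f"Tests failed:\n{failure_log}\n\nCurrent file:\n{source}"},
        ],
    )
    return resp.choices[0].message.content

TARGET = "app/core.py"  # hypothetical file under repair

for attempt in range(3):      # bounded retries: recover from flaky steps
    result = run_tests()
    if result.returncode == 0:
        print("Tests pass.")
        break
    source = open(TARGET).read()
    patched = ask_for_patch(result.stdout + result.stderr, source)
    open(TARGET, "w").write(patched)  # write the candidate fix, then re-run tests
else:
    print("Gave up after 3 attempts; see test output above.")
```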
Agentic Intelligence
- Complex Toolchains: Planning and execution across shell, browser, retrieval, and code runners (see the tool-loop sketch after this list)
- Long-Horizon Tasks: Effective handling of multi-step, complex workflows
- Web Browsing: Advanced web exploration and information retrieval
- Recovery Mechanisms: Graceful handling of failures and flaky steps
- Evidence Tracking: Maintains traceable evidence chains for decision making
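One way to wire up the shell leg of such a toolchain is standard function calling, sketched below. Whether and how a given provider exposes tool calls for M2 varies; the endpoint, model name, and tool schema here are illustrative assumptions, and the shell tool is left unsandboxed only for brevity.

```python
# Sketch of an agentic tool loop over an OpenAI-compatible chat API.
# Endpoint, model name, and tool schema are illustrative assumptions; a real
# agent would sandbox shell access and add browser / code-runner tools.
import json
import os
import subprocess
from openai import OpenAI

client = OpenAI(base_url=os.environ["M2_BASE_URL"], api_key=os.environ["M2_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

messages = [{"role": "user", "content": "How many Python files are in this repo?"}]

for _ in range(5):  # long-horizon loop with a hard step budget
    resp = client.chat.completions.create(model="minimax-m2", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:      # no more tool use: final answer
        print(msg.content)
        break
    messages.append(msg)        # keep the evidence chain in the transcript
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        out = subprocess.run(args["command"], shell=True, capture_output=True, text=True)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": (out.stdout + out.stderr)[:4000],  # truncate long logs
        })
```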
Efficiency Optimization
- Parameter Efficiency: 10B activated parameters from 230B total
- Latency Optimization: Low-latency performance for interactive applications
- Cost Efficiency: High throughput for batched sampling and deployment
- Deployment Flexibility: Optimized for cloud and edge deployment scenarios
Benchmark Performance
Coding & Agentic Benchmarks
MiniMax M2 demonstrates superior performance on comprehensive coding and agentic evaluations:
| Benchmark | MiniMax-M2 | Claude Sonnet 4 | Claude Sonnet 4.5 | Gemini 2.5 Pro | GPT-5 (thinking) | GLM-4.6 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 69.4 | 72.7* | 77.2* | 63.8* | 74.9* | 68* | 67.8* |
| Multi-SWE-Bench | 36.2 | 35.7* | 44.3 | / | / | 30 | 30.6 |
| SWE-bench Multilingual | 56.5 | 56.9* | 68 | / | / | 53.8 | 57.9* |
| Terminal-Bench | 46.3 | 36.4* | 50* | 25.3* | 43.8* | 40.5* | 37.7* |
| ArtifactsBench | 66.8 | 57.3* | 61.5 | 57.7* | 73* | 59.8 | 55.8 |
| BrowseComp | 44 | 12.2 | 19.6 | 9.9 | 54.9* | 45.1* | 40.1* |
| BrowseComp-zh | 48.5 | 29.1 | 40.8 | 32.2 | 65 | 49.5 | 47.9* |
| GAIA (text only) | 75.7 | 68.3 | 71.2 | 60.2 | 76.4 | 71.9 | 63.5 |
| xbench-DeepSearch | 72 | 64.6 | 66 | 56 | 77.8 | 70 | 71 |
| τ²-Bench | 77.2 | 65.5* | 84.7* | 59.2 | 80.1* | 75.9* | 66.7 |
Intelligence Benchmarks
Artificial Analysis composite intelligence scores across math, science, instruction following, coding, and agentic tool use:
| Metric (AA) | MiniMax-M2 | Claude Sonnet 4 | Claude Sonnet 4.5 | Gemini 2.5 Pro | GPT-5 (thinking) | GLM-4.6 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|---|
| AIME25 | 78 | 74 | 88 | 88 | 94 | 86 | 88 |
| MMLU-Pro | 82 | 84 | 88 | 86 | 87 | 83 | 85 |
| GPQA-Diamond | 78 | 78 | 83 | 84 | 85 | 78 | 80 |
| HLE (w/o tools) | 12.5 | 9.6 | 17.3 | 21.1 | 26.5 | 13.3 | 13.8 |
| LiveCodeBench (LCB) | 83 | 66 | 71 | 80 | 85 | 70 | 79 |
| SciCode | 36 | 40 | 45 | 43 | 43 | 38 | 38 |
| IFBench | 72 | 55 | 57 | 49 | 73 | 43 | 54 |
| AA Intelligence | 61 | 57 | 63 | 60 | 69 | 56 | 57 |
Note: Data points marked with an asterisk (*) are taken from official model reports or blogs. All other metrics follow Artificial Analysis evaluation methodologies.
Use Cases
Software Development
- End-to-End Coding: Complete software development lifecycle support
- Code Refactoring: Intelligent codebase modernization and optimization
- Test-Driven Development: Automated test generation and validation
- Debugging: Advanced bug detection and repair workflows
- Multi-Language Support: Consistent performance across programming languages
Agentic Workflows
- Enterprise Automation: Complex business process automation
- Research Assistance: Advanced information retrieval and synthesis
- Web Exploration: Intelligent web browsing and data collection
- Tool Integration: Seamless integration with development and productivity tools
- Long-Horizon Tasks: Complex, multi-step workflow management
Development Operations
- CI/CD Optimization: Continuous integration and deployment automation
- Infrastructure as Code: Automated infrastructure provisioning and management
- DevOps Automation: End-to-end development operations support
- Monitoring and Alerting: Intelligent system monitoring and incident response
Research and Innovation
- Scientific Computing: Advanced algorithm development and implementation
- Data Analysis: Comprehensive data processing and analysis
- Hypothesis Testing: Automated research workflows and validation
- Literature Review: Intelligent research paper analysis and synthesis
Getting Started
The MiniMax M2 cloud model is available through various API providers; a minimal usage sketch follows the resource list below. For more information:
- API Documentation: MiniMax M2 API Guide
- Model Information: MiniMax M2 Technical Report
- Developer Resources: MiniMax Developer Portal
- Playground: Test MiniMax M2 capabilities in the interactive playground
- Community: Join the MiniMax community for support and use case sharing
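As a starting point, the snippet below shows a plain chat completion against an OpenAI-compatible endpoint. The base URL, key variable, and model identifier are placeholders; substitute the values documented by your chosen provider.

```python
# Hello-world call sketch; endpoint, key variables, and model id are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["M2_BASE_URL"],  # provider-specific endpoint
    api_key=os.environ["M2_API_KEY"],
)

resp = client.chat.completions.create(
    model="minimax-m2",  # check your provider's exact model identifier
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)
print(resp.choices[0].message.content)
```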
Moonshot AI / kimi-k2-thinking
Kimi K2 Thinking: Advanced reasoning model with test-time scaling, 256K context, and state-of-the-art agentic capabilities. Cloud-optimized for complex problem solving.
Mistral AI / mistral-large-3
Mistral Large 3: State-of-the-art multimodal MoE model with 256K context, Apache 2.0 license. Cloud-optimized for production-grade enterprise workloads.