MiniMax / minimax-m2
Models
| Model | Total Parameters | Context Length | Input Modalities | Activated Parameters |
|---|---|---|---|---|
| minimax-m2 | 230B | 200K | Text | 10B |
MiniMax M2: High-Efficiency Coding and Agentic Model
MiniMax M2 is a high-efficiency large language model engineered specifically for coding and agentic workflows. With 230 billion total parameters (10 billion activated), M2 delivers exceptional performance in software development tasks while maintaining high efficiency, low latency, and cost-effective deployment.
Key Features
- Superior Intelligence: Ranks #1 among open-source models globally on Artificial Analysis composite intelligence benchmarks
- Advanced Coding: Engineered for end-to-end developer workflows with multi-file editing and test-validated repairs
- Agentic Performance: Excels at planning and executing complex, long-horizon toolchains across shell, browser, and code runners
- Efficient Design: 10B activated parameters from 230B total for optimal performance-to-cost ratio
- Long Context: 200K token context window for comprehensive codebase understanding
- Recovery Capabilities: Graceful recovery from flaky steps in complex workflows
- Evidence Traceability: Maintains clear evidence chains for agentic decision making
Model Variants
| Name | Total Parameters | Context | Input Modalities | Activated Parameters | Description |
|---|---|---|---|---|---|
| minimax-m2 | 230B | 200K | Text | 10B | Cloud-optimized high-efficiency model |
Technical Capabilities
Coding Excellence
MiniMax M2 delivers exceptional performance across the software development lifecycle:
- Multi-File Editing: Comprehensive codebase modifications and refactoring
- Coding-Run-Fix Loops: End-to-end development workflows with execution and debugging (a minimal harness sketch follows this list)
- Test-Validated Repairs: Automated testing and validation of code changes
- Language Support: Strong performance across multiple programming languages
- IDE Integration: Optimized for terminal, IDE, and CI/CD workflows
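The coding-run-fix loop mentioned above can be driven by a thin harness around any chat endpoint that serves the model. The sketch below is a minimal illustration, not MiniMax's own tooling: it assumes an OpenAI-compatible API, and the base URL, API key variables, target file, and `minimax-m2` model identifier are placeholders for whatever your provider documents.

```python
# Minimal coding-run-fix loop sketch (hypothetical provider settings).
# Assumes an OpenAI-compatible endpoint serving MiniMax M2; the base_url,
# API key env vars, and model name below are placeholders, not official values.
import os
import subprocess
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["M2_BASE_URL"],  # your provider's endpoint
    api_key=os.environ["M2_API_KEY"],
)

def run_tests() -> subprocess.CompletedProcess:
    """Run the project's test suite and capture its output."""
    return subprocess.run(["pytest", "-q"], capture_output=True, text=True)

def ask_for_patch(failure_log: str, source: str) -> str:
    """Ask the model to repair the file, given the failing test output."""
    resp = client.chat.completions.create(
        model="minimax-m2",  # placeholder identifier; provider-specific
        messages=[
            {"role": "system", "content": "You fix Python code. Reply with the full corrected file only."},
            {"role": "user", "content": f"Tests failed:\n{failure_log}\n\nCurrent file:\n{source}"},
        ],
    )
    return resp.choices[0].message.content

TARGET = "app/core.py"  # hypothetical file under repair

for attempt in range(3):      # bounded retries: recover from flaky steps
    result = run_tests()
    if result.returncode == 0:
        print("Tests pass.")
        break
    source = open(TARGET).read()
    patched = ask_for_patch(result.stdout + result.stderr, source)
    open(TARGET, "w").write(patched)  # write the candidate fix, then re-run tests
else:
    print("Gave up after 3 attempts; see test output above.")
```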
Agentic Intelligence
- Complex Toolchains: Planning and execution across shell, browser, retrieval, and code runners (see the tool-loop sketch after this list)
- Long-Horizon Tasks: Effective handling of multi-step, complex workflows
- Web Browsing: Advanced web exploration and information retrieval
- Recovery Mechanisms: Graceful handling of failures and flaky steps
- Evidence Tracking: Maintains traceable evidence chains for decision making
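One way to wire up the shell leg of such a toolchain is standard function calling, sketched below. Whether and how a given provider exposes tool calls for M2 varies; the endpoint, model name, and tool schema here are illustrative assumptions, and the shell tool is left unsandboxed only for brevity.

```python
# Sketch of an agentic tool loop over an OpenAI-compatible chat API.
# Endpoint, model name, and tool schema are illustrative assumptions; a real
# agent would sandbox shell access and add browser / code-runner tools.
import json
import os
import subprocess
from openai import OpenAI

client = OpenAI(base_url=os.environ["M2_BASE_URL"], api_key=os.environ["M2_API_KEY"])

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

messages = [{"role": "user", "content": "How many Python files are in this repo?"}]

for _ in range(5):  # long-horizon loop with a hard step budget
    resp = client.chat.completions.create(model="minimax-m2", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:      # no more tool use: final answer
        print(msg.content)
        break
    messages.append(msg)        # keep the evidence chain in the transcript
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        out = subprocess.run(args["command"], shell=True, capture_output=True, text=True)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": (out.stdout + out.stderr)[:4000],  # truncate long logs
        })
```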
Efficiency Optimization
- Parameter Efficiency: 10B activated parameters from 230B total
- Latency Optimization: Low-latency performance for interactive applications
- Cost Efficiency: High throughput for batched sampling and deployment
- Deployment Flexibility: Optimized for cloud and edge deployment scenarios
Benchmark Performance
Coding & Agentic Benchmarks
MiniMax M2 demonstrates superior performance on comprehensive coding and agentic evaluations:
| Benchmark | MiniMax-M2 | Claude Sonnet 4 | Claude Sonnet 4.5 | Gemini 2.5 Pro | GPT-5 (thinking) | GLM-4.6 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 69.4 | 72.7* | 77.2* | 63.8* | 74.9* | 68* | 67.8* |
| Multi-SWE-Bench | 36.2 | 35.7* | 44.3 | / | / | 30 | 30.6 |
| SWE-bench Multilingual | 56.5 | 56.9* | 68 | / | / | 53.8 | 57.9* |
| Terminal-Bench | 46.3 | 36.4* | 50* | 25.3* | 43.8* | 40.5* | 37.7* |
| ArtifactsBench | 66.8 | 57.3* | 61.5 | 57.7* | 73* | 59.8 | 55.8 |
| BrowseComp | 44 | 12.2 | 19.6 | 9.9 | 54.9* | 45.1* | 40.1* |
| BrowseComp-zh | 48.5 | 29.1 | 40.8 | 32.2 | 65 | 49.5 | 47.9* |
| GAIA (text only) | 75.7 | 68.3 | 71.2 | 60.2 | 76.4 | 71.9 | 63.5 |
| xbench-DeepSearch | 72 | 64.6 | 66 | 56 | 77.8 | 70 | 71 |
| τ²-Bench | 77.2 | 65.5* | 84.7* | 59.2 | 80.1* | 75.9* | 66.7 |
Intelligence Benchmarks
Artificial Analysis composite intelligence scores across math, science, instruction following, coding, and agentic tool use:
| Metric (AA) | MiniMax-M2 | Claude Sonnet 4 | Claude Sonnet 4.5 | Gemini 2.5 Pro | GPT-5 (thinking) | GLM-4.6 | DeepSeek-V3.2 |
|---|---|---|---|---|---|---|---|
| AIME25 | 78 | 74 | 88 | 88 | 94 | 86 | 88 |
| MMLU-Pro | 82 | 84 | 88 | 86 | 87 | 83 | 85 |
| GPQA-Diamond | 78 | 78 | 83 | 84 | 85 | 78 | 80 |
| HLE (w/o tools) | 12.5 | 9.6 | 17.3 | 21.1 | 26.5 | 13.3 | 13.8 |
| LiveCodeBench (LCB) | 83 | 66 | 71 | 80 | 85 | 70 | 79 |
| SciCode | 36 | 40 | 45 | 43 | 43 | 38 | 38 |
| IFBench | 72 | 55 | 57 | 49 | 73 | 43 | 54 |
| AA Intelligence | 61 | 57 | 63 | 60 | 69 | 56 | 57 |
Note: Data points marked with an asterisk (*) are taken from official model reports or blogs. All other metrics follow Artificial Analysis evaluation methodologies.
Use Cases
Software Development
- End-to-End Coding: Complete software development lifecycle support
- Code Refactoring: Intelligent codebase modernization and optimization
- Test-Driven Development: Automated test generation and validation
- Debugging: Advanced bug detection and repair workflows
- Multi-Language Support: Consistent performance across programming languages
Agentic Workflows
- Enterprise Automation: Complex business process automation
- Research Assistance: Advanced information retrieval and synthesis
- Web Exploration: Intelligent web browsing and data collection
- Tool Integration: Seamless integration with development and productivity tools
- Long-Horizon Tasks: Complex, multi-step workflow management
Development Operations
- CI/CD Optimization: Continuous integration and deployment automation
- Infrastructure as Code: Automated infrastructure provisioning and management
- DevOps Automation: End-to-end development operations support
- Monitoring and Alerting: Intelligent system monitoring and incident response
Research and Innovation
- Scientific Computing: Advanced algorithm development and implementation
- Data Analysis: Comprehensive data processing and analysis
- Hypothesis Testing: Automated research workflows and validation
- Literature Review: Intelligent research paper analysis and synthesis
Getting Started
The MiniMax M2 cloud model is available through various API providers; a minimal usage sketch follows the resource list below. For more information:
- API Documentation: MiniMax M2 API Guide
- Model Information: MiniMax M2 Technical Report
- Developer Resources: MiniMax Developer Portal
- Playground: Test MiniMax M2 capabilities in the interactive playground
- Community: Join the MiniMax community for support and use case sharing
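As a starting point, the snippet below shows a plain chat completion against an OpenAI-compatible endpoint. The base URL, key variable, and model identifier are placeholders; substitute the values documented by your chosen provider.

```python
# Hello-world call sketch; endpoint, key variables, and model id are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["M2_BASE_URL"],  # provider-specific endpoint
    api_key=os.environ["M2_API_KEY"],
)

resp = client.chat.completions.create(
    model="minimax-m2",  # check your provider's exact model identifier
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)
print(resp.choices[0].message.content)
```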
Moonshot AI / kimi-k2-thinking
Kimi K2 Thinking: Advanced reasoning model with test-time scaling, 256K context, and state-of-the-art agentic capabilities. Cloud-optimized for complex problem solving.
Mistral AI / mistral-large-3
Mistral Large 3: State-of-the-art multimodal MoE model with 256K context, Apache 2.0 license. Cloud-optimized for production-grade enterprise workloads.