Moonshot AI / kimi-k2-thinking
Models
| Model | Size | Context Length | Input Modalities | Activated Parameters |
|---|---|---|---|---|
| kimi-k2-thinking | - | 256K | Text | - |
Kimi K2 Thinking: Advanced Reasoning Agent
Kimi K2 Thinking represents Moonshot AI's most advanced open-source thinking model, designed as a reasoning agent that excels at complex problem solving through step-by-step reasoning and tool integration. With state-of-the-art performance on benchmarks like Humanity's Last Exam (HLE) and BrowseComp, Kimi K2 Thinking demonstrates major gains in reasoning, agentic search, coding, writing, and general capabilities.
Key Features
- Test-Time Scaling: Scales both thinking tokens and tool calling steps for complex problem solving
- Long-Horizon Reasoning: Executes 200-300 sequential tool calls without human interference
- State-of-the-Art Performance: Leading scores on HLE, BrowseComp, and SWE-Bench Verified
- Advanced Agentic Capabilities: Fluid integration with software agents for complex workflows
- Extended Context: 256K token context window for comprehensive problem understanding
- Dynamic Reasoning Cycles: Think → search → browser use → code cycles for hypothesis refinement
- Multi-Step Problem Solving: Decomposes ambiguous problems into clear, actionable subtasks
Model Variants
| Name | Size | Context | Input Modalities | Description |
|---|---|---|---|---|
| kimi-k2-thinking | - | 256K | Text | Cloud-optimized advanced reasoning agent |
Technical Capabilities
Advanced Reasoning Architecture
Kimi K2 Thinking leverages test-time scaling technology:
- Thinking Token Scaling: Generates comprehensive reasoning chains
- Tool Call Scaling: Executes 200-300 sequential tool calls
- Dynamic Reasoning Cycles: Think → search → browser → code workflows
- Hypothesis Refinement: Continuous evidence verification and reasoning
- Long-Horizon Planning: Complex, multi-step task execution
Agentic Intelligence
- Agentic Coding: Substantial gains in software development tasks
- Agentic Search: Superior web-based reasoning and information retrieval
- Tool Integration: Seamless integration with development and productivity tools
- Adaptive Reasoning: Dynamic problem decomposition and solution refinement
- Evidence-Based Decision Making: Comprehensive evidence tracking and verification
Domain-Specific Excellence
- Coding: State-of-the-art performance on SWE-Bench, Terminal-Bench, and multi-language tasks
- Search: Leading performance on BrowseComp and web-based reasoning tasks
- Writing: Enhanced creative and practical writing capabilities
- Reasoning: Advanced performance on mathematical and logical reasoning benchmarks
Benchmark Performance
Kimi K2 Thinking sets new records across reasoning, coding, and agentic benchmarks:
Reasoning Tasks
| Benchmark | K2 Thinking | GPT-5 | Claude Sonnet 4.5 | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|
| Humanity's Last Exam (Text) | 23.9 | 26.3 | 19.8 | 7.9 | 25.4 |
| Humanity's Last Exam (Tools) | 44.9 | 41.7 | 32.0 | 21.7 | 41.0 |
| AIME 2025 (Text) | 94.5 | 94.6 | 87.0 | 51.0 | 91.7 |
| AIME 2025 (Python) | 99.1 | 99.6 | 100.0 | 75.2 | 98.8 |
| GPQA-Diamond | 84.5 | 85.7 | 83.4 | 74.2 | 87.5 |
Coding Tasks
| Benchmark | K2 Thinking | GPT-5 | Claude Sonnet 4.5 | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|
| SWE-Bench Verified | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
| SWE-Bench Multilingual | 61.1 | 55.3 | 68.0 | 55.9 | 57.9 |
| Terminal-Bench | 47.1 | 43.8 | 51.0 | 44.5 | 37.7 |
| LiveCodeBench v6 | 83.1 | 87.0 | 64.0 | 56.1 | 74.1 |
Agentic Search Tasks
| Benchmark | K2 Thinking | GPT-5 | Claude Sonnet 4.5 | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|
| BrowseComp | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
| BrowseComp-ZH | 62.3 | 63.0 | 42.4 | 22.2 | 47.9 |
| Seal-0 | 56.3 | 51.4 | 53.4 | 25.2 | 38.5 |
General Tasks
| Benchmark | K2 Thinking | GPT-5 | Claude Sonnet 4.5 | K2 0905 | DeepSeek-V3.2 |
|---|---|---|---|---|---|
| MMLU-Pro | 84.6 | 87.1 | 87.5 | 81.9 | - |
| Longform Writing | 73.8 | 71.4 | 79.8 | 62.8 | - |
| HealthBench | 58.0 | 67.2 | 44.2 | 43.8 | - |
Use Cases
Complex Problem Solving
- Scientific Research: Advanced hypothesis generation and testing
- Mathematical Reasoning: Complex equation solving and proof development
- Technical Analysis: Comprehensive system analysis and optimization
- Strategic Planning: Multi-step scenario analysis and decision making
Software Development
- Agentic Coding: End-to-end software development workflows
- Frontend Development: Responsive, functional web interfaces from concepts
- Multi-Language Development: Consistent performance across programming languages
- Debugging: Advanced error detection and resolution
- Codebase Modernization: Intelligent refactoring and optimization
Web-Based Research
- Information Retrieval: Advanced web search and information synthesis
- Research Assistance: Comprehensive literature review and analysis
- Data Collection: Automated web-based data gathering
- Evidence Verification: Cross-source information validation
- Dynamic Reasoning: Real-time hypothesis testing and refinement
Content Creation
- Creative Writing: Vivid, imaginative storytelling and poetry
- Technical Writing: Comprehensive documentation and reports
- Academic Writing: Rigorous, logically coherent research papers
- Professional Writing: Business documents and strategic communications
- Personal Writing: Empathetic, nuanced personal communications
Enterprise Applications
- Business Intelligence: Data-driven decision support
- Process Automation: Complex workflow automation
- Customer Support: Advanced conversational AI solutions
- Knowledge Management: Comprehensive information synthesis and retrieval
- Strategic Analysis: Multi-dimensional business analysis
Technical Specifications
| Specification | Details |
|---|---|
| Context Window | 256K tokens |
| Input Modalities | Text |
| Tool Call Capacity | 200-300 sequential tool calls |
| Reasoning Approach | Test-time scaling with dynamic cycles |
| Agentic Integration | Seamless software agent integration |
| Performance | State-of-the-art on HLE, BrowseComp, SWE-Bench |
Getting Started
Kimi K2 Thinking cloud model is available through various API providers. For more information:
- API Documentation: Kimi K2 Thinking API Guide
- Model Information: Kimi K2 Thinking Technical Report
- Developer Resources: Moonshot AI Developer Portal
- Playground: Test Kimi K2 Thinking capabilities in the interactive playground
- Community: Join the Kimi community for support and use case sharing
Moonshot AI / kimi-k2
Kimi K2-Instruct-0905: State-of-the-art MoE model with 32B activated parameters and 256K context. Cloud-optimized for advanced coding and long-horizon agentic tasks.
MiniMax / minimax-m2
MiniMax M2: High-efficiency 230B parameter model with 200K context, optimized for coding and agentic workflows. Cloud-optimized with superior intelligence and agentic performance.