Moonshot AI / kimi-k2-thinking

Kimi K2 Thinking: Advanced reasoning model with test-time scaling, 256K context, and state-of-the-art agentic capabilities. Cloud-optimized for complex problem solving.

Kimi K2 Thinking Architecture

Models

Model	Size	Context Length	Input Modalities	Activated Parameters
kimi-k2-thinking	-	256K	Text	-

Kimi K2 Thinking: Advanced Reasoning Agent

Kimi K2 Thinking represents Moonshot AI's most advanced open-source thinking model, designed as a reasoning agent that excels at complex problem solving through step-by-step reasoning and tool integration. With state-of-the-art performance on benchmarks like Humanity's Last Exam (HLE) and BrowseComp, Kimi K2 Thinking demonstrates major gains in reasoning, agentic search, coding, writing, and general capabilities.

Key Features

Test-Time Scaling: Scales both thinking tokens and tool calling steps for complex problem solving
Long-Horizon Reasoning: Executes 200-300 sequential tool calls without human interference
State-of-the-Art Performance: Leading scores on HLE, BrowseComp, and SWE-Bench Verified
Advanced Agentic Capabilities: Fluid integration with software agents for complex workflows
Extended Context: 256K token context window for comprehensive problem understanding
Dynamic Reasoning Cycles: Think → search → browser use → code cycles for hypothesis refinement
Multi-Step Problem Solving: Decomposes ambiguous problems into clear, actionable subtasks

Model Variants

Name	Size	Context	Input Modalities	Description
kimi-k2-thinking	-	256K	Text	Cloud-optimized advanced reasoning agent

Technical Capabilities

Advanced Reasoning Architecture

Kimi K2 Thinking leverages test-time scaling technology:

Thinking Token Scaling: Generates comprehensive reasoning chains
Tool Call Scaling: Executes 200-300 sequential tool calls
Dynamic Reasoning Cycles: Think → search → browser → code workflows
Hypothesis Refinement: Continuous evidence verification and reasoning
Long-Horizon Planning: Complex, multi-step task execution

Agentic Intelligence

Agentic Coding: Substantial gains in software development tasks
Agentic Search: Superior web-based reasoning and information retrieval
Tool Integration: Seamless integration with development and productivity tools
Adaptive Reasoning: Dynamic problem decomposition and solution refinement
Evidence-Based Decision Making: Comprehensive evidence tracking and verification

Domain-Specific Excellence

Coding: State-of-the-art performance on SWE-Bench, Terminal-Bench, and multi-language tasks
Search: Leading performance on BrowseComp and web-based reasoning tasks
Writing: Enhanced creative and practical writing capabilities
Reasoning: Advanced performance on mathematical and logical reasoning benchmarks

Benchmark Performance

Kimi K2 Thinking sets new records across reasoning, coding, and agentic benchmarks:

Reasoning Tasks

Benchmark	K2 Thinking	GPT-5	Claude Sonnet 4.5	K2 0905	DeepSeek-V3.2
Humanity's Last Exam (Text)	23.9	26.3	19.8	7.9	25.4
Humanity's Last Exam (Tools)	44.9	41.7	32.0	21.7	41.0
AIME 2025 (Text)	94.5	94.6	87.0	51.0	91.7
AIME 2025 (Python)	99.1	99.6	100.0	75.2	98.8
GPQA-Diamond	84.5	85.7	83.4	74.2	87.5

Coding Tasks

Benchmark	K2 Thinking	GPT-5	Claude Sonnet 4.5	K2 0905	DeepSeek-V3.2
SWE-Bench Verified	71.3	74.9	77.2	69.2	67.8
SWE-Bench Multilingual	61.1	55.3	68.0	55.9	57.9
Terminal-Bench	47.1	43.8	51.0	44.5	37.7
LiveCodeBench v6	83.1	87.0	64.0	56.1	74.1

Agentic Search Tasks

Benchmark	K2 Thinking	GPT-5	Claude Sonnet 4.5	K2 0905	DeepSeek-V3.2
BrowseComp	60.2	54.9	24.1	7.4	40.1
BrowseComp-ZH	62.3	63.0	42.4	22.2	47.9
Seal-0	56.3	51.4	53.4	25.2	38.5

General Tasks

Benchmark	K2 Thinking	GPT-5	Claude Sonnet 4.5	K2 0905	DeepSeek-V3.2
MMLU-Pro	84.6	87.1	87.5	81.9	-
Longform Writing	73.8	71.4	79.8	62.8	-
HealthBench	58.0	67.2	44.2	43.8	-

Use Cases

Complex Problem Solving

Scientific Research: Advanced hypothesis generation and testing
Mathematical Reasoning: Complex equation solving and proof development
Technical Analysis: Comprehensive system analysis and optimization
Strategic Planning: Multi-step scenario analysis and decision making

Software Development

Agentic Coding: End-to-end software development workflows
Frontend Development: Responsive, functional web interfaces from concepts
Multi-Language Development: Consistent performance across programming languages
Debugging: Advanced error detection and resolution
Codebase Modernization: Intelligent refactoring and optimization

Web-Based Research

Information Retrieval: Advanced web search and information synthesis
Research Assistance: Comprehensive literature review and analysis
Data Collection: Automated web-based data gathering
Evidence Verification: Cross-source information validation
Dynamic Reasoning: Real-time hypothesis testing and refinement

Content Creation

Creative Writing: Vivid, imaginative storytelling and poetry
Technical Writing: Comprehensive documentation and reports
Academic Writing: Rigorous, logically coherent research papers
Professional Writing: Business documents and strategic communications
Personal Writing: Empathetic, nuanced personal communications

Enterprise Applications

Business Intelligence: Data-driven decision support
Process Automation: Complex workflow automation
Customer Support: Advanced conversational AI solutions
Knowledge Management: Comprehensive information synthesis and retrieval
Strategic Analysis: Multi-dimensional business analysis

Technical Specifications

Specification	Details
Context Window	256K tokens
Input Modalities	Text
Tool Call Capacity	200-300 sequential tool calls
Reasoning Approach	Test-time scaling with dynamic cycles
Agentic Integration	Seamless software agent integration
Performance	State-of-the-art on HLE, BrowseComp, SWE-Bench

Getting Started

Kimi K2 Thinking cloud model is available through various API providers. For more information:

API Documentation: Kimi K2 Thinking API Guide
Model Information: Kimi K2 Thinking Technical Report
Developer Resources: Moonshot AI Developer Portal
Playground: Test Kimi K2 Thinking capabilities in the interactive playground
Community: Join the Kimi community for support and use case sharing

Moonshot AI / kimi-k2

Kimi K2-Instruct-0905: State-of-the-art MoE model with 32B activated parameters and 256K context. Cloud-optimized for advanced coding and long-horizon agentic tasks.

MiniMax / minimax-m2

MiniMax M2: High-efficiency 230B parameter model with 200K context, optimized for coding and agentic workflows. Cloud-optimized with superior intelligence and agentic performance.