Google / gemma3
Models
| Model | Size | Context Length | Input Modalities |
|---|---|---|---|
| gemma3-4b | 4B | 32K | Text |
| gemma3-12b | 12B | 32K | Text |
| gemma3-27b | 27B | 128K | Text |
Gemma 3: Lightweight Multimodal Models
Gemma 3 is a family of lightweight, state-of-the-art models from Google built on Gemini technology. These multimodal models process both text and images with a 128K context window and support over 140 languages. Available in 4B, 12B, and 27B parameter sizes, Gemma 3 models excel in tasks like question answering, summarization, reasoning, and multimodal understanding while maintaining efficient resource utilization.
Key Features
- Multimodal Capabilities: Process both text and images for comprehensive understanding
- Long Context Window: Up to 128K tokens for processing large documents and complex inputs
- Multilingual Support: Native support for over 140 languages
- Lightweight Design: Optimized for efficient deployment in cloud environments
- Gemini Technology: Built on Google's advanced Gemini architecture
- Versatile Applications: Excels in question answering, summarization, reasoning, and multimodal tasks
Model Variants
| Name | Size | Context | Input Modalities | Description |
|---|---|---|---|---|
| gemma3-4b | 4B | 32K | Text | 4B parameter model |
| gemma3-12b | 12B | 32K | Text | 12B parameter model |
| gemma3-27b | 27B | 128K | Text | 27B parameter model |
Technical Capabilities
Multimodal Understanding
Gemma 3 models excel at processing and reasoning about:
- Text: Documents, code, natural language questions
- Images: Photographs, diagrams, screenshots
- Multimodal Combinations: Text with visual context
Language Support
- Native support for over 140 languages
- Strong performance across multilingual benchmarks
- Optimized for global applications
Context Processing
- Up to 128K token context window
- Efficient processing of long documents
- Repository-scale understanding capabilities
Benchmark Performance
Reasoning and Logic Capabilities
| Benchmark | Metric | gemma3-4b | gemma3-12b | gemma3-27b |
|---|---|---|---|---|
| HellaSwag | 10-shot | 77.2 | 84.2 | 85.6 |
| BoolQ | 0-shot | 72.3 | 78.8 | 82.4 |
| PIQA | 0-shot | 79.6 | 81.8 | 83.3 |
| ARC-c | 25-shot | 56.2 | 68.9 | 70.6 |
| MMLU | 5-shot | 59.6 | 74.5 | 78.6 |
| GSM8K | 5-shot | 38.4 | 71.0 | 82.6 |
| HumanEval | pass@1 | 36.0 | 45.7 | 48.8 |
Multilingual Capabilities
| Benchmark | gemma3-4b | gemma3-12b | gemma3-27b |
|---|---|---|---|
| MGSM | 34.7 | 64.3 | 74.3 |
| Global-MMLU-Lite | 57.0 | 69.4 | 75.7 |
| Belebele | 59.4 | 78.0 | - |
| WMT24++ (ChrF) | 48.4 | 53.9 | 55.7 |
Multimodal Capabilities
| Benchmark | gemma3-4b | gemma3-12b | gemma3-27b |
|---|---|---|---|
| COCOcap | 102 | 111 | 116 |
| DocVQA (val) | 72.8 | 82.3 | 85.6 |
| InfoVQA (val) | 44.1 | 54.8 | 59.4 |
| MMMU (pt) | 39.2 | 50.3 | 56.1 |
| TextVQA (val) | 58.9 | 66.5 | 68.6 |
| AI2D | 63.2 | 75.2 | 79.0 |
Use Cases
Content Understanding
- Document Processing: Extract and analyze information from complex documents
- Multilingual Support: Provide language translation and localization services
- Content Summarization: Generate concise summaries of long documents
Multimodal Applications
- Visual Question Answering: Answer questions about images and diagrams
- Document Analysis: Process forms, invoices, and technical documents
- Educational Content: Create interactive learning materials with text and visuals
Development and Research
- Code Assistance: Provide code completion and explanation
- Research Support: Assist with literature review and data analysis
- Prototyping: Accelerate development through rapid prototyping
Getting Started
Gemma 3 cloud models are available through various API providers. For more information:
- API Documentation: Gemma 3 API Guide
- Model Information: Gemma 3 Technical Report
- Community: Join the Gemma community for support and use case sharing
- Playground: Test Gemma 3 capabilities in the interactive playground
Google / gemini-3-pro-preview
Gemini 3 Pro: Google's most advanced model with 1M token context, state-of-the-art reasoning, and powerful agentic capabilities. Cloud-optimized for complex multimodal tasks.
Zhipu AI / glm-4.6
GLM-4.6: Advanced agentic model with 200K context window, superior coding performance, and enhanced reasoning capabilities. Cloud-optimized for complex tasks.