Google / gemma3

Gemma 3: Lightweight, multimodal models built on Gemini technology, with a context window of up to 128K tokens. Available as cloud-optimized models for text and vision tasks across 140+ languages.

Models

| Model | Size | Context Length | Input Modalities |
|---|---|---|---|
| gemma3-4b | 4B | 32K | Text |
| gemma3-12b | 12B | 32K | Text |
| gemma3-27b | 27B | 128K | Text |

Gemma 3: Lightweight Multimodal Models

Gemma 3 is a family of lightweight, state-of-the-art models from Google built on Gemini technology. These multimodal models process both text and images, offer a context window of up to 128K tokens, and support over 140 languages. Available in 4B, 12B, and 27B parameter sizes, Gemma 3 models perform well on question answering, summarization, reasoning, and multimodal understanding while keeping resource use efficient.

Key Features

  • Multimodal Capabilities: Process both text and images for comprehensive understanding
  • Long Context Window: Up to 128K tokens for processing large documents and complex inputs
  • Multilingual Support: Native support for over 140 languages
  • Lightweight Design: Optimized for efficient deployment in cloud environments
  • Gemini Technology: Built on the same research and technology as Google's Gemini models
  • Versatile Applications: Excels in question answering, summarization, reasoning, and multimodal tasks

Model Variants

| Name | Size | Context | Input Modalities | Description |
|---|---|---|---|---|
| gemma3-4b | 4B | 32K | Text | 4B-parameter model |
| gemma3-12b | 12B | 32K | Text | 12B-parameter model |
| gemma3-27b | 27B | 128K | Text | 27B-parameter model |

Technical Capabilities

Multimodal Understanding

Gemma 3 models excel at processing and reasoning about:

  • Text: Documents, code, natural language questions
  • Images: Photographs, diagrams, screenshots
  • Multimodal Combinations: Text with visual context

Language Support

  • Native support for over 140 languages
  • Strong performance across multilingual benchmarks
  • Optimized for global applications

Context Processing

  • Context window of up to 128K tokens (32K for gemma3-4b and gemma3-12b, per the model table)
  • Efficient processing of long documents
  • Repository-scale understanding capabilities
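
As a rough illustration of how the context limits above affect model choice, the sketch below checks which variants can hold a given input. The limits come from the model table; everything else is an assumption made for illustration: that "K" means 1,024 tokens, that text averages about 4 characters per token, and the helper names `estimate_tokens` and `models_that_fit`, which are hypothetical. A real deployment would count tokens with the model's own tokenizer.

```python
# Context limits from the model table. Assumption: "K" means 1,024 tokens.
CONTEXT_TOKENS = {
    "gemma3-4b": 32 * 1024,
    "gemma3-12b": 32 * 1024,
    "gemma3-27b": 128 * 1024,
}

# Assumption: roughly 4 characters per token, a crude heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate: ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def models_that_fit(text: str, reserve_for_output: int = 1024) -> list[str]:
    """List the variants whose context window holds the input plus a reply budget."""
    needed = estimate_tokens(text) + reserve_for_output
    return [name for name, limit in CONTEXT_TOKENS.items() if needed <= limit]


if __name__ == "__main__":
    document = "x" * 200_000  # about 50,000 estimated tokens
    print(models_that_fit(document))  # only the 128K variant fits
```

Reserving part of the window for the reply matters because input and generated tokens typically share the same context budget.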

Benchmark Performance

Reasoning and Logic Capabilities

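| Benchmark | Metric | gemma3-4b | gemma3-12b | gemma3-27b |
|---|---|---|---|---|
| HellaSwag | 10-shot | 77.2 | 84.2 | 85.6 |
| BoolQ | 0-shot | 72.3 | 78.8 | 82.4 |
| PIQA | 0-shot | 79.6 | 81.8 | 83.3 |
| ARC-c | 25-shot | 56.2 | 68.9 | 70.6 |
| MMLU | 5-shot | 59.6 | 74.5 | 78.6 |
| GSM8K | 5-shot | 38.4 | 71.0 | 82.6 |
| HumanEval | pass@1 | 36.0 | 45.7 | 48.8 |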
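
One way to read the reasoning scores is to look at the gain from each size step. The sketch below copies the numbers from the table and computes the point difference from 4B to 12B and from 12B to 27B; the helper name `step_gains` is hypothetical. On these figures, every benchmark gains more from the first step than from the second, with GSM8K showing the largest jump.

```python
# Scores copied from the reasoning table, ordered (gemma3-4b, gemma3-12b, gemma3-27b).
SCORES = {
    "HellaSwag": (77.2, 84.2, 85.6),
    "BoolQ": (72.3, 78.8, 82.4),
    "PIQA": (79.6, 81.8, 83.3),
    "ARC-c": (56.2, 68.9, 70.6),
    "MMLU": (59.6, 74.5, 78.6),
    "GSM8K": (38.4, 71.0, 82.6),
    "HumanEval": (36.0, 45.7, 48.8),
}


def step_gains(scores: dict) -> dict:
    """Map each benchmark to (12B minus 4B, 27B minus 12B), rounded to one decimal."""
    return {
        name: (round(mid - small, 1), round(large - mid, 1))
        for name, (small, mid, large) in scores.items()
    }


if __name__ == "__main__":
    for name, (first, second) in step_gains(SCORES).items():
        print(f"{name:10s} 4B to 12B: +{first:<5} 12B to 27B: +{second}")
```

These are absolute point differences on benchmarks with different metrics and shot counts, so they are best compared within a row rather than across rows.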

Multilingual Capabilities

| Benchmark | gemma3-4b | gemma3-12b | gemma3-27b |
|---|---|---|---|
| MGSM | 34.7 | 64.3 | 74.3 |
| Global-MMLU-Lite | 57.0 | 69.4 | 75.7 |
| Belebele | 59.4 | 78.0 | - |
| WMT24++ (ChrF) | 48.4 | 53.9 | 55.7 |

Multimodal Capabilities

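| Benchmark | gemma3-4b | gemma3-12b | gemma3-27b |
|---|---|---|---|
| COCOcap | 102 | 111 | 116 |
| DocVQA (val) | 72.8 | 82.3 | 85.6 |
| InfoVQA (val) | 44.1 | 54.8 | 59.4 |
| MMMU (pt) | 39.2 | 50.3 | 56.1 |
| TextVQA (val) | 58.9 | 66.5 | 68.6 |
| AI2D | 63.2 | 75.2 | 79.0 |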

Use Cases

Content Understanding

  • Document Processing: Extract and analyze information from complex documents
  • Multilingual Support: Provide language translation and localization services
  • Content Summarization: Generate concise summaries of long documents

Multimodal Applications

  • Visual Question Answering: Answer questions about images and diagrams
  • Document Analysis: Process forms, invoices, and technical documents
  • Educational Content: Create interactive learning materials with text and visuals

Development and Research

  • Code Assistance: Provide code completion and explanation
  • Research Support: Assist with literature review and data analysis
  • Prototyping: Accelerate development through rapid prototyping

Getting Started

Gemma 3 cloud models are available through various API providers. For more information: