Google / gemma3

Gemma 3: Lightweight, multimodal models built on Gemini technology with 128K context window. Available as cloud-optimized models for text and vision tasks across 140+ languages.

Gemma 3 Architecture

Models

Model	Size	Context Length	Input Modalities
gemma3-4b	4B	32K	Text
gemma3-12b	12B	32K	Text
gemma3-27b	27B	128K	Text

Gemma 3: Lightweight Multimodal Models

Gemma 3 is a family of lightweight, state-of-the-art models from Google built on Gemini technology. These multimodal models process both text and images with a 128K context window and support over 140 languages. Available in 4B, 12B, and 27B parameter sizes, Gemma 3 models excel in tasks like question answering, summarization, reasoning, and multimodal understanding while maintaining efficient resource utilization.

Key Features

Multimodal Capabilities: Process both text and images for comprehensive understanding
Long Context Window: Up to 128K tokens for processing large documents and complex inputs
Multilingual Support: Native support for over 140 languages
Lightweight Design: Optimized for efficient deployment in cloud environments
Gemini Technology: Built on Google's advanced Gemini architecture
Versatile Applications: Excels in question answering, summarization, reasoning, and multimodal tasks

Model Variants

Name	Size	Context	Input Modalities	Description
gemma3-4b	4B	32K	Text	4B parameter model
gemma3-12b	12B	32K	Text	12B parameter model
gemma3-27b	27B	128K	Text	27B parameter model

Technical Capabilities

Multimodal Understanding

Gemma 3 models excel at processing and reasoning about:

Text: Documents, code, natural language questions
Images: Photographs, diagrams, screenshots
Multimodal Combinations: Text with visual context

Language Support

Native support for over 140 languages
Strong performance across multilingual benchmarks
Optimized for global applications

Context Processing

Up to 128K token context window
Efficient processing of long documents
Repository-scale understanding capabilities

Benchmark Performance

Reasoning and Logic Capabilities

Benchmark	Metric	gemma3-4b	gemma3-12b	gemma3-27b
HellaSwag	10-shot	77.2	84.2	85.6
BoolQ	0-shot	72.3	78.8	82.4
PIQA	0-shot	79.6	81.8	83.3
ARC-c	25-shot	56.2	68.9	70.6
MMLU	5-shot	59.6	74.5	78.6
GSM8K	5-shot	38.4	71.0	82.6
HumanEval	pass@1	36.0	45.7	48.8

Multilingual Capabilities

Benchmark	gemma3-4b	gemma3-12b	gemma3-27b
MGSM	34.7	64.3	74.3
Global-MMLU-Lite	57.0	69.4	75.7
Belebele	59.4	78.0	-
WMT24++ (ChrF)	48.4	53.9	55.7

Multimodal Capabilities

Benchmark	gemma3-4b	gemma3-12b	gemma3-27b
COCOcap	102	111	116
DocVQA (val)	72.8	82.3	85.6
InfoVQA (val)	44.1	54.8	59.4
MMMU (pt)	39.2	50.3	56.1
TextVQA (val)	58.9	66.5	68.6
AI2D	63.2	75.2	79.0

Use Cases

Content Understanding

Document Processing: Extract and analyze information from complex documents
Multilingual Support: Provide language translation and localization services
Content Summarization: Generate concise summaries of long documents

Multimodal Applications

Visual Question Answering: Answer questions about images and diagrams
Document Analysis: Process forms, invoices, and technical documents
Educational Content: Create interactive learning materials with text and visuals

Development and Research

Code Assistance: Provide code completion and explanation
Research Support: Assist with literature review and data analysis
Prototyping: Accelerate development through rapid prototyping

Getting Started

Gemma 3 cloud models are available through various API providers. For more information:

API Documentation: Gemma 3 API Guide
Model Information: Gemma 3 Technical Report
Community: Join the Gemma community for support and use case sharing
Playground: Test Gemma 3 capabilities in the interactive playground

Google / gemini-3-pro-preview

Gemini 3 Pro: Google's most advanced model with 1M token context, state-of-the-art reasoning, and powerful agentic capabilities. Cloud-optimized for complex multimodal tasks.

Zhipu AI / glm-4.6

GLM-4.6: Advanced agentic model with 200K context window, superior coding performance, and enhanced reasoning capabilities. Cloud-optimized for complex tasks.