Milvus
Open-source vector database for scalable similarity search and AI applications.
What is Milvus?
Milvus is an open-source vector database designed for scalable similarity search and AI applications. It provides a distributed, cloud-native architecture that enables efficient storage, indexing, and search of billion-scale vector datasets across multiple machines.
Key Concepts
Milvus Architecture
```mermaid
graph TD
    A[Milvus] --> B[Access Layer]
    A --> C[Coordinator Service]
    A --> D[Worker Nodes]
    A --> E[Storage Layer]
    B --> B1[Client SDKs]
    B --> B2[REST API]
    B --> B3[gRPC]
    C --> C1[Root Coordinator]
    C --> C2[Query Coordinator]
    C --> C3[Data Coordinator]
    C --> C4[Index Coordinator]
    D --> D1[Query Nodes]
    D --> D2[Data Nodes]
    D --> D3[Index Nodes]
    E --> E1[Object Storage]
    E --> E2[Metadata Storage]
    E --> E3[Log Broker]
    style A fill:#f9f,stroke:#333
```
Core Features
- Distributed Architecture: Horizontal scalability
- Multiple Index Types: Support for various ANN algorithms
- Hybrid Search: Combine vector and scalar filtering
- Cloud-Native: Kubernetes-native deployment
- Multi-Language Support: SDKs for multiple languages
- Time Travel: Query data as of a past timestamp (deprecated in recent Milvus versions)
- Role-Based Access Control: Security and permissions
Approaches and Architecture
Deployment Models
| Model | Description | Use Case |
|---|---|---|
| Standalone | Single-node deployment | Development, testing |
| Distributed | Multi-node cluster | Production, large-scale |
| Cloud | Managed cloud service | Production, fully managed |
| Kubernetes | Kubernetes-native deployment | Cloud-native environments |
Index Types
| Index Type | Description | Use Case |
|---|---|---|
| FLAT | Brute-force exact search | Small datasets, exact search |
| IVF_FLAT | Inverted File with exact post-verification | Medium datasets |
| IVF_SQ8 | Inverted File with scalar quantization | Memory efficiency |
| IVF_PQ | Inverted File with Product Quantization | Large datasets |
| HNSW | Hierarchical Navigable Small World | High performance |
| ANNOY | Approximate Nearest Neighbors Oh Yeah | Approximate search |
| RNSG | Refined Navigating Spreading-out Graph | Graph-based search |
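As a rough sketch, the rows above map onto the parameter dictionaries passed to pymilvus `create_index`. The values below are illustrative starting points, not tuned recommendations, and the variable names are hypothetical:

```python
# Illustrative index configurations in the pymilvus create_index format.
# FLAT has no build-time parameters; the others expose tuning knobs.
flat_index = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
ivf_flat_index = {"index_type": "IVF_FLAT", "metric_type": "L2",
                  "params": {"nlist": 128}}
ivf_pq_index = {"index_type": "IVF_PQ", "metric_type": "L2",
                "params": {"nlist": 1024, "m": 16, "nbits": 8}}
hnsw_index = {"index_type": "HNSW", "metric_type": "L2",
              "params": {"M": 16, "efConstruction": 200}}

# Usage (given an existing Collection object):
# collection.create_index("embedding", hnsw_index)
```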
Data Model
```mermaid
classDiagram
    class Collection {
        +name: string
        +dimension: int
        +metric_type: string
        +description: string
    }
    class Partition {
        +name: string
        +description: string
    }
    class Segment {
        +id: int
        +state: string
    }
    class Entity {
        +id: int
        +vector: float[]
        +scalar_fields: map
        +timestamp: int
    }
    Collection "1" --> "*" Partition
    Partition "1" --> "*" Segment
    Segment "1" --> "*" Entity
```
Mathematical Foundations
Similarity Search
Milvus supports multiple similarity metrics:
- L2 Distance: $d(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$
- Inner Product: $s(x, y) = \sum_{i=1}^{d} x_i y_i$
- Cosine Similarity: $s(x, y) = \frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2} \sqrt{\sum_{i=1}^{d} y_i^2}}$
- Hamming Distance: For binary vectors
- Jaccard Distance: For set similarity
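The dense-vector metrics are easy to reproduce with NumPy, which is handy for sanity-checking distances returned by a search. A minimal sketch; the helper names are illustrative and not part of any Milvus API:

```python
import numpy as np

def l2_distance(x, y):
    # Euclidean distance: sqrt(sum_i (x_i - y_i)^2); smaller is closer
    return float(np.sqrt(np.sum((x - y) ** 2)))

def inner_product(x, y):
    # Similarity score: larger is closer
    return float(np.dot(x, y))

def cosine_similarity(x, y):
    # Inner product of the unit-normalized vectors
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([1.0, 0.0, 2.0])
y = np.array([1.0, 0.0, 2.0])
print(l2_distance(x, y))        # 0.0
print(inner_product(x, y))      # 5.0
print(cosine_similarity(x, y))  # 1.0 (up to floating-point error)
```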
Index Optimization
Milvus optimizes search with:
$$Q = \text{Search}(q, k, \text{filter})$$
Where:
- $Q$ = result set
- $q$ = query vector
- $k$ = number of nearest neighbors
- $\text{filter}$ = scalar filtering condition
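A brute-force reference implementation of this operator makes the filter-then-rank semantics concrete. This is plain NumPy with hypothetical helper names, not the Milvus API, and it is only practical for small datasets:

```python
import numpy as np

def brute_force_search(vectors, scalars, q, k, filter_fn=None):
    """Reference Q = Search(q, k, filter): exact k-NN with an optional scalar filter."""
    # Apply the scalar filter first, then rank the survivors by L2 distance to q.
    idx = np.arange(len(vectors))
    if filter_fn is not None:
        idx = np.array([i for i in idx if filter_fn(scalars[i])], dtype=int)
    dists = np.linalg.norm(vectors[idx] - q, axis=1)
    order = np.argsort(dists)[:k]
    return list(zip(idx[order].tolist(), dists[order].tolist()))

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
scalars = [{"price": 10}, {"price": 80}, {"price": 90}, {"price": 5}]
q = np.array([0.0, 0.0])
result = brute_force_search(vectors, scalars, q, k=2,
                            filter_fn=lambda s: s["price"] > 50)
print(result)  # -> [(1, 1.0), (2, 2.0)]
```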
Applications
AI Systems
- Recommendation Systems: Personalized recommendations
- Semantic Search: Content-based search
- Image Search: Visual similarity search
- Video Analysis: Content-based video retrieval
- Audio Search: Sound similarity search
Enterprise Applications
- Customer 360: Holistic customer view
- Product Search: Visual and semantic product search
- Document Retrieval: Enterprise search
- Knowledge Management: Organizational knowledge bases
- Fraud Detection: Anomaly detection
Industry-Specific
- E-commerce: Product recommendations
- Healthcare: Medical image analysis
- Finance: Risk assessment
- Media: Content discovery
- Gaming: Player matching
Implementation
Basic Usage
```python
from pymilvus import (
    connections,
    utility,
    FieldSchema, CollectionSchema, DataType,
    Collection
)
import numpy as np

# Connect to Milvus
connections.connect("default", host="localhost", port="19530")

# Define collection schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=50),
    FieldSchema(name="price", dtype=DataType.FLOAT),
    FieldSchema(name="rating", dtype=DataType.FLOAT)
]
schema = CollectionSchema(fields, description="Example collection")

# Create collection (dropping any stale copy first)
collection_name = "example_collection"
if utility.has_collection(collection_name):
    utility.drop_collection(collection_name)
collection = Collection(collection_name, schema)

# Create index
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 100}
}
collection.create_index("embedding", index_params)

# Generate sample data as rows; row-based insertion (list of dicts)
# requires a recent pymilvus — older versions expect column-based lists
num_entities = 10000
entities = []
for i in range(num_entities):
    entities.append({
        "id": i,
        "embedding": np.random.rand(128).tolist(),
        "category": f"category_{i % 10}",
        "price": float(i % 100),
        "rating": float(i % 5 + 1)
    })

# Insert data
insert_result = collection.insert(entities)
print(f"Inserted {len(insert_result.primary_keys)} entities")

# Load the collection into memory before searching
collection.load()

# Search
query_vector = np.random.rand(128).tolist()
search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 10}
}
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=5,
    output_fields=["id", "category", "price", "rating"]
)

# Process results
print("Search results:")
for hits in results:
    for hit in hits:
        print(f"ID: {hit.entity.get('id')}, Distance: {hit.distance:.4f}")
        print(f"Category: {hit.entity.get('category')}, Price: {hit.entity.get('price')}")
        print(f"Rating: {hit.entity.get('rating')}")
        print()
```
Advanced Features
```python
# Partitioning
partition_name = "summer_collection"
if not utility.has_partition(collection_name, partition_name):
    collection.create_partition(partition_name)

# Insert into partition
summer_entities = []
for i in range(1000):
    summer_entities.append({
        "id": num_entities + i,
        "embedding": np.random.rand(128).tolist(),
        "category": f"summer_{i % 5}",
        "price": float(i % 50 + 50),
        "rating": float(i % 5 + 1)
    })
collection.insert(summer_entities, partition_name=partition_name)

# Search with scalar filtering
filter_expr = "category == 'summer_2' and price > 70 and rating >= 4"
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=5,
    expr=filter_expr,
    output_fields=["id", "category", "price", "rating"]
)

# Batch search with multiple query vectors in one request
query_vectors = [np.random.rand(128).tolist() for _ in range(3)]
results = collection.search(
    data=query_vectors,
    anns_field="embedding",
    param=search_params,
    limit=3,
    expr=filter_expr
)

# Time Travel - search data as of a past timestamp
# (deprecated and removed in recent Milvus versions; shown for older 2.x)
import time
past_timestamp = utility.mkts_from_unixtime(time.time() - 3600)  # 1 hour ago
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param=search_params,
    limit=5,
    travel_timestamp=past_timestamp
)
```
Distributed Deployment
```yaml
# Kubernetes deployment example (conceptual) using the Milvus Operator CRD.
# This would be defined in YAML files for actual deployment.
# milvus-cluster.yaml
apiVersion: milvus.io/v1alpha1
kind: Milvus
metadata:
  name: milvus-cluster
spec:
  mode: cluster
  dependencies:
    etcd:
      inCluster:
        values:
          replicaCount: 3
    storage:
      inCluster:
        values:
          replicaCount: 4
  components:
    queryNode:
      replicas: 3
    dataNode:
      replicas: 3
    indexNode:
      replicas: 2
    proxy:
      replicas: 2
  config:
    log:
      level: info
    common:
      retentionDuration: 4320
```
Performance Optimization
Index Selection Guide
| Dataset Size | Dimensionality | Accuracy Requirement | Recommended Index | Parameters |
|---|---|---|---|---|
| Small (<10K) | Low (<32) | High | FLAT | - |
| Medium (10K-1M) | Medium (32-128) | Medium-High | IVF_FLAT | nlist=100-500 |
| Large (1M-100M) | High (128-512) | Medium | IVF_PQ | nlist=1000, m=8-16 |
| Very Large (>100M) | Very High (>512) | Low-Medium | HNSW | M=16-32, efConstruction=200 |
| Billions | Any | Any | IVF_PQ or DISKANN | Sharded across multiple query nodes |
Query Optimization
- Index Parameters: Tune nlist, nprobe, M, efConstruction
- Search Parameters: Optimize nprobe, ef (for HNSW)
- Filtering: Use selective filters
- Batch Size: Optimize batch size for search/insert
- Partitioning: Use partitions for large collections
- Caching: Cache frequent queries
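The `nlist`/`nprobe` trade-off can be seen in a toy inverted-file index. This is a self-contained sketch of the IVF idea in plain NumPy, not Milvus's actual implementation: probing more lists raises recall at the cost of scanning more candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_ivf(data, nlist, iters=5):
    """Toy IVF index: k-means centroids plus one inverted list per centroid."""
    centroids = data[rng.choice(len(data), nlist, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid, then recompute centroids
        assign = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
        for c in range(nlist):
            members = data[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    lists = {c: np.where(assign == c)[0] for c in range(nlist)}
    return centroids, lists

def ivf_search(data, centroids, lists, q, k, nprobe):
    # Probe only the nprobe closest lists, then rank those candidates exactly
    order = np.argsort(np.linalg.norm(centroids - q, axis=1))[:nprobe]
    cand = np.concatenate([lists[c] for c in order])
    d = np.linalg.norm(data[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]]

data = rng.random((2000, 16))
q = rng.random(16)
k, nlist = 10, 16
truth = set(np.argsort(np.linalg.norm(data - q, axis=1))[:k].tolist())
centroids, lists = build_ivf(data, nlist)
for nprobe in (1, 4, 16):
    found = set(ivf_search(data, centroids, lists, q, k, nprobe).tolist())
    print(f"nprobe={nprobe}: recall={len(found & truth) / k:.2f}")
```

With `nprobe` equal to `nlist`, every list is probed and the search degenerates to exact brute force, which is why recall reaches 1.0 there.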
Benchmarking
```python
import time
import numpy as np
from pymilvus import Collection

def benchmark_search(collection, query_vectors, k, expr=None, iterations=10):
    """Benchmark search performance."""
    search_params = {
        "metric_type": "L2",
        "params": {"nprobe": 10}
    }
    # Warm-up so cold caches don't skew the measurement
    for _ in range(3):
        collection.search(
            data=query_vectors[:1],
            anns_field="embedding",
            param=search_params,
            limit=k,
            expr=expr
        )
    # Benchmark
    start_time = time.time()
    for _ in range(iterations):
        collection.search(
            data=query_vectors,
            anns_field="embedding",
            param=search_params,
            limit=k,
            expr=expr
        )
    total_time = time.time() - start_time
    avg_time = total_time / iterations
    qps = len(query_vectors) / avg_time
    print(f"Benchmark results (k={k}):")
    print(f"  Average time: {avg_time:.4f}s")
    print(f"  Queries per second: {qps:.2f}")
    print(f"  Latency per query: {avg_time / len(query_vectors) * 1000:.2f}ms")
    return {
        "avg_time": avg_time,
        "qps": qps,
        "latency_ms": avg_time / len(query_vectors) * 1000
    }

# Benchmark different configurations
query_vectors = [np.random.rand(128).tolist() for _ in range(100)]

# Benchmark FLAT index
collection = Collection("flat_collection")
collection.load()
flat_results = benchmark_search(collection, query_vectors, 5)

# Benchmark IVF_FLAT index
collection = Collection("ivf_flat_collection")
collection.load()
ivf_results = benchmark_search(collection, query_vectors, 5)

# Benchmark with filtering
filtered_results = benchmark_search(collection, query_vectors, 5, expr="price > 50")

# Compare results
print("\nComparison:")
print(f"FLAT: {flat_results['qps']:.2f} QPS, {flat_results['latency_ms']:.2f}ms latency")
print(f"IVF_FLAT: {ivf_results['qps']:.2f} QPS, {ivf_results['latency_ms']:.2f}ms latency")
print(f"IVF_FLAT (filtered): {filtered_results['qps']:.2f} QPS, {filtered_results['latency_ms']:.2f}ms latency")
```
Challenges
Technical Challenges
- Distributed Coordination: Managing distributed components
- Data Consistency: Ensuring consistency across nodes
- Index Construction: Time-consuming for large datasets
- Query Routing: Efficient query distribution
- Resource Management: Balancing resources across components
Practical Challenges
- Deployment Complexity: Setting up distributed clusters
- Monitoring: Tracking performance in distributed systems
- Scaling: Managing growth and resource allocation
- Upgrade Management: Handling version upgrades
- Data Migration: Moving data between clusters
Operational Challenges
- Resource Planning: Estimating resource requirements
- Cost Management: Optimizing infrastructure costs
- Disaster Recovery: Ensuring data durability
- Security: Managing access controls
- Compliance: Meeting regulatory requirements
Research and Advancements
Key Papers
- "Milvus: A Purpose-Built Vector Data Management System" (2021)
- Introduced Milvus architecture
- Distributed vector search
- "Towards Billion-Scale Similarity Search" (2020)
- Scalable vector search techniques
- Foundation for Milvus scaling
- "Approximate Nearest Neighbor Search in High Dimensions" (2018)
- ANN algorithms in Milvus
- Performance optimization
Emerging Research Directions
- Adaptive Indexing: Indexes that adapt to data distribution
- Multi-Modal Search: Search across different data types
- Privacy-Preserving Search: Secure similarity search
- Explainable Search: Interpretable search results
- Real-Time Indexing: Instant index updates
- Edge Deployment: Local vector search
- Federated Search: Search across multiple clusters
- AutoML Integration: Automated machine learning pipelines
Best Practices
Design
- Collection Planning: Design collections for specific use cases
- Schema Design: Plan schema carefully
- Partition Strategy: Use partitions for large collections
- Index Selection: Choose appropriate index type
- Dimension Selection: Match vector dimensions to use case
Implementation
- Start Small: Begin with standalone deployment
- Iterative Development: Build incrementally
- Monitor Performance: Track query latency and throughput
- Optimize Indexes: Tune index parameters
- Use Partitions: Partition large collections
Production
- Scale Gradually: Monitor and scale as needed
- Implement Monitoring: Track system health
- Plan for Disaster Recovery: Implement backup strategies
- Secure Access: Implement access controls
- Optimize Queries: Tune query performance
Maintenance
- Update Regularly: Keep Milvus updated
- Monitor Indexes: Track index health
- Optimize Schema: Refine schema as requirements evolve
- Backup Data: Regularly backup important data
- Document Configuration: Document system configuration