MLflow
Open source platform for managing the end-to-end machine learning lifecycle.
What is MLflow?
MLflow is an open-source platform designed to manage the complete machine learning lifecycle. It provides tools for experiment tracking, model management, model deployment, and a central model registry. MLflow enables data scientists and machine learning engineers to track experiments, package code into reproducible runs, share and collaborate on models, and deploy models to production environments.
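A minimal sketch of the core tracking workflow, assuming MLflow is installed locally (for example, pip install mlflow) and the default local tracking store is used:

import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)  # hyperparameters used for this run
    mlflow.log_metric("accuracy", 0.93)      # results produced by this run
# Afterwards, inspect the recorded run with: mlflow ui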
Key Concepts
MLflow Architecture
graph TD
A[MLflow] --> B[Tracking]
A --> C[Projects]
A --> D[Models]
A --> E[Model Registry]
A --> F[UI]
A --> G[API]
A --> H[Plugins]
B --> B1[Experiment Tracking]
B --> B2[Parameter Logging]
B --> B3[Metric Logging]
B --> B4[Artifact Storage]
B --> B5[Run Management]
C --> C1[Reproducible Runs]
C --> C2[Environment Management]
C --> C3[Dependency Management]
C --> C4[Entry Points]
C --> C5[Container Support]
D --> D1[Model Packaging]
D --> D2[Model Format]
D --> D3[Flavor System]
D --> D4[Deployment Tools]
D --> D5[Model Serving]
E --> E1[Model Versioning]
E --> E2[Model Stages]
E --> E3[Annotations]
E --> E4[Access Control]
E --> E5[Model Lineage]
F --> F1[Web Interface]
F --> F2[Visualization]
F --> F3[Comparison Tools]
F --> F4[Search Functionality]
G --> G1[REST API]
G --> G2[Python API]
G --> G3[R API]
G --> G4[Java API]
H --> H1[Custom Backends]
H --> H2[Custom Flavors]
H --> H3[Custom Stores]
style A fill:#FF6B6B,stroke:#333
style B fill:#4ECDC4,stroke:#333
style C fill:#45B7D1,stroke:#333
style D fill:#FFA07A,stroke:#333
style E fill:#98D8C8,stroke:#333
style F fill:#F7DC6F,stroke:#333
style G fill:#BB8FCE,stroke:#333
style H fill:#85C1E9,stroke:#333
Core Components
- Tracking: Record and query experiments including code, data, config, and results
- Projects: Package ML code in a reusable, reproducible format
- Models: Manage and deploy models from various ML libraries
- Model Registry: Centralized model store with versioning and stage transitions
- UI: Web-based interface for visualizing and comparing experiments
- API: Programmatic access to MLflow functionality
- Plugins: Extend MLflow with custom functionality
- Artifact Storage: Store and retrieve files associated with runs
- Backend Stores: Database backends for tracking data
- Flavors: Library-specific model serialization formats (illustrated in the sketch below)
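As a rough sketch of how Tracking, Models, and Flavors fit together (assuming scikit-learn is installed alongside MLflow and a local tracking store is used), the same logged model can be loaded through its native flavor or the generic pyfunc flavor:

import mlflow
import mlflow.sklearn
import mlflow.pyfunc
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=500).fit(X, y)

with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model")  # saved with both sklearn and pyfunc flavors

model_uri = f"runs:/{run.info.run_id}/model"
native = mlflow.sklearn.load_model(model_uri)   # full scikit-learn API (predict_proba, coef_, ...)
generic = mlflow.pyfunc.load_model(model_uri)   # uniform predict() interface, convenient for serving
print(generic.predict(X[:5]))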
Applications
Machine Learning Workflows
- Experiment Tracking: Log parameters, metrics, and artifacts (see the autologging sketch after this list)
- Reproducibility: Package code and environments for reproducible runs
- Model Management: Standardize model packaging and deployment
- Collaboration: Share experiments and models across teams
- Model Deployment: Deploy models to various serving platforms
- Hyperparameter Tuning: Track and compare hyperparameter optimization
- Model Versioning: Manage multiple versions of models
- Model Monitoring: Track model performance in production
- CI/CD for ML: Integrate ML workflows with continuous integration
- MLOps: Implement machine learning operations practices
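For many of these workflows, autologging is the quickest entry point. The hedged sketch below uses mlflow.autolog, which captures parameters, metrics, and the fitted model for supported libraries such as scikit-learn (exact coverage depends on the installed MLflow version):

import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

mlflow.autolog()  # enable automatic logging for supported frameworks

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train, y_train)   # parameters and training metrics are logged automatically
    clf.score(X_test, y_test)   # evaluation metrics may also be captured, depending on version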
Industry Applications
- Healthcare: Clinical model development and deployment
- Finance: Risk modeling and fraud detection
- Retail: Demand forecasting and recommendation systems
- Manufacturing: Predictive maintenance and quality control
- Automotive: Autonomous vehicle model development
- Telecommunications: Network optimization and customer churn prediction
- Energy: Energy demand forecasting and grid optimization
- Agriculture: Crop yield prediction and precision farming
- Marketing: Customer segmentation and campaign optimization
- Technology: AI product development and deployment
Implementation
Basic MLflow Example
# Basic MLflow example
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
print("Basic MLflow Example...")
# 1. Set up MLflow tracking
print("\n1. Setting up MLflow Tracking...")
# Set the tracking URI (local file system in this case)
mlflow.set_tracking_uri("file:///tmp/mlruns")
# Create or get an experiment
experiment_name = "Basic Logistic Regression"
try:
experiment_id = mlflow.create_experiment(experiment_name)
except Exception:  # experiment already exists
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id
print(f"Experiment ID: {experiment_id}")
print(f"Experiment Name: {experiment_name}")
# 2. Start a run
print("\n2. Starting a Run...")
with mlflow.start_run(experiment_id=experiment_id) as run:
print(f"Run ID: {run.info.run_id}")
print(f"Experiment ID: {run.info.experiment_id}")
# 3. Prepare data
print("\n3. Preparing Data...")
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Log dataset parameters
mlflow.log_param("n_samples", len(X))
mlflow.log_param("n_features", X.shape[1])
mlflow.log_param("n_classes", len(np.unique(y)))
mlflow.log_param("test_size", 0.2)
# 4. Train a model
print("\n4. Training Model...")
model = LogisticRegression(max_iter=1000, random_state=42)
# Log model parameters
mlflow.log_param("model_type", "LogisticRegression")
mlflow.log_param("max_iter", 1000)
mlflow.log_param("random_state", 42)
# Train the model
model.fit(X_train, y_train)
# 5. Evaluate the model
print("\n5. Evaluating Model...")
y_pred = model.predict(X_test)
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("precision", precision)
mlflow.log_metric("recall", recall)
mlflow.log_metric("f1_score", f1)
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
# 6. Log artifacts
print("\n6. Logging Artifacts...")
# Create and log a feature importance plot
feature_importance = np.abs(model.coef_[0])
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.title("Feature Importance")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.savefig("feature_importance.png")
plt.close()
mlflow.log_artifact("feature_importance.png")
# Create and log a confusion matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.savefig("confusion_matrix.png")
plt.close()
mlflow.log_artifact("confusion_matrix.png")
# 7. Log the model
print("\n7. Logging Model...")
mlflow.sklearn.log_model(model, "model")
# Log additional model information
mlflow.set_tag("framework", "scikit-learn")
mlflow.set_tag("model_type", "classifier")
mlflow.set_tag("dataset", "synthetic")
print("Run completed successfully!")
# 8. View the results
print("\n8. Viewing Results...")
print("To view the results, run the MLflow UI:")
print("mlflow ui --backend-store-uri file:///tmp/mlruns")
print("Then open http://localhost:5000 in your browser")
# 9. Programmatic access to runs
print("\n9. Programmatic Access to Runs...")
# Get the experiment
experiment = mlflow.get_experiment(experiment_id)
print(f"Experiment Name: {experiment.name}")
print(f"Experiment ID: {experiment.experiment_id}")
print(f"Artifact Location: {experiment.artifact_location}")
print(f"Lifecycle Stage: {experiment.lifecycle_stage}")
# Search for runs
runs = mlflow.search_runs(experiment_ids=[experiment_id])
print(f"\nFound {len(runs)} runs:")
print(runs[['run_id', 'metrics.accuracy', 'metrics.f1_score', 'params.model_type']])
# Get the best run by accuracy
best_run = runs.sort_values("metrics.accuracy", ascending=False).iloc[0]
print(f"\nBest Run ID: {best_run.run_id}")
print(f"Best Accuracy: {best_run['metrics.accuracy']:.4f}")
print(f"Best F1 Score: {best_run['metrics.f1_score']:.4f}")
Experiment Tracking Example
# Experiment tracking example with MLflow
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt
import seaborn as sns
print("\nExperiment Tracking Example...")
# Set up MLflow
mlflow.set_tracking_uri("file:///tmp/mlruns")
# Create or get an experiment
experiment_name = "Model Comparison"
try:
experiment_id = mlflow.create_experiment(experiment_name)
except Exception:  # experiment already exists
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id
print(f"Experiment ID: {experiment_id}")
# Prepare data
X, y = make_classification(
n_samples=2000,
n_features=15,
n_informative=10,
n_redundant=2,
n_classes=2,
random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define models to compare
models = {
"Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
"Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
"SVM": SVC(probability=True, random_state=42)
}
# Define preprocessing
preprocessor = StandardScaler()
# Run experiments
for model_name, model in models.items():
print(f"\nTraining {model_name}...")
with mlflow.start_run(experiment_id=experiment_id):
# Log model name
mlflow.log_param("model", model_name)
# Create pipeline
pipeline = Pipeline([
('preprocessor', preprocessor),
('classifier', model)
])
# Train model
pipeline.fit(X_train, y_train)
# Make predictions
y_pred = pipeline.predict(X_test)
y_proba = pipeline.predict_proba(X_test)[:, 1]
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_proba)
# Log metrics
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("precision", precision)
mlflow.log_metric("recall", recall)
mlflow.log_metric("f1_score", f1)
mlflow.log_metric("roc_auc", roc_auc)
# Log parameters
if hasattr(model, 'get_params'):
params = model.get_params()
for param_name, param_value in params.items():
mlflow.log_param(param_name, param_value)
# Log preprocessing
mlflow.log_param("preprocessing", "StandardScaler")
# Log model
mlflow.sklearn.log_model(pipeline, "model")
# Log feature importance if available
if hasattr(model, 'feature_importances_'):
feature_importances = model.feature_importances_
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importances)), feature_importances)
plt.title(f"Feature Importance - {model_name}")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.savefig(f"feature_importance_{model_name.replace(' ', '_')}.png")
plt.close()
mlflow.log_artifact(f"feature_importance_{model_name.replace(' ', '_')}.png")
# Log confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title(f"Confusion Matrix - {model_name}")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.savefig(f"confusion_matrix_{model_name.replace(' ', '_')}.png")
plt.close()
mlflow.log_artifact(f"confusion_matrix_{model_name.replace(' ', '_')}.png")
print(f" Accuracy: {accuracy:.4f}")
print(f" Precision: {precision:.4f}")
print(f" Recall: {recall:.4f}")
print(f" F1 Score: {f1:.4f}")
print(f" ROC AUC: {roc_auc:.4f}")
# Compare models
print("\nComparing Models...")
runs = mlflow.search_runs(experiment_ids=[experiment_id])
print("\nModel Comparison:")
print(runs[['run_id', 'params.model', 'metrics.accuracy', 'metrics.f1_score', 'metrics.roc_auc']])
# Find the best model by accuracy
best_run = runs.sort_values("metrics.accuracy", ascending=False).iloc[0]
print(f"\nBest Model: {best_run['params.model']}")
print(f"Best Accuracy: {best_run['metrics.accuracy']:.4f}")
print(f"Best F1 Score: {best_run['metrics.f1_score']:.4f}")
print(f"Best ROC AUC: {best_run['metrics.roc_auc']:.4f}")
# Visualize comparison
plt.figure(figsize=(12, 6))
sns.barplot(
x='params.model',
y='metrics.accuracy',
data=runs,
palette='viridis'
)
plt.title("Model Accuracy Comparison")
plt.ylabel("Accuracy")
plt.ylim(0, 1)
plt.savefig("model_comparison_accuracy.png")
plt.close()
plt.figure(figsize=(12, 6))
sns.barplot(
x='params.model',
y='metrics.f1_score',
data=runs,
palette='viridis'
)
plt.title("Model F1 Score Comparison")
plt.ylabel("F1 Score")
plt.ylim(0, 1)
plt.savefig("model_comparison_f1.png")
plt.close()
# Log comparison artifacts
with mlflow.start_run(experiment_id=experiment_id):
mlflow.log_artifact("model_comparison_accuracy.png")
mlflow.log_artifact("model_comparison_f1.png")
mlflow.set_tag("comparison", "true")
mlflow.log_param("compared_models", list(models.keys()))
Hyperparameter Tuning Example
# Hyperparameter tuning example with MLflow
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score, f1_score
from sklearn.datasets import make_classification
from scipy.stats import randint, uniform
import matplotlib.pyplot as plt
import seaborn as sns
print("\nHyperparameter Tuning Example...")
# Set up MLflow
mlflow.set_tracking_uri("file:///tmp/mlruns")
# Create or get an experiment
experiment_name = "Hyperparameter Tuning"
try:
experiment_id = mlflow.create_experiment(experiment_name)
except Exception:  # experiment already exists
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id
print(f"Experiment ID: {experiment_id}")
# Prepare data
X, y = make_classification(
n_samples=2000,
n_features=15,
n_informative=10,
n_redundant=2,
n_classes=2,
random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the model
model = RandomForestClassifier(random_state=42)
# 1. Grid Search
print("\n1. Performing Grid Search...")
param_grid = {
'n_estimators': [50, 100, 200],
'max_depth': [None, 10, 20, 30],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4]
}
with mlflow.start_run(experiment_id=experiment_id, run_name="Grid Search"):
# Log search method
mlflow.log_param("search_method", "grid_search")
mlflow.log_param("param_grid", str(param_grid))
# Perform grid search
grid_search = GridSearchCV(
estimator=model,
param_grid=param_grid,
cv=5,
scoring='accuracy',
n_jobs=-1,
verbose=1
)
grid_search.fit(X_train, y_train)
# Log best parameters and score
mlflow.log_params(grid_search.best_params_)
mlflow.log_metric("best_cv_score", grid_search.best_score_)
# Evaluate on test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)
mlflow.log_metric("test_accuracy", test_accuracy)
mlflow.log_metric("test_f1_score", test_f1)
# Log model
mlflow.sklearn.log_model(best_model, "model")
# Log feature importance
feature_importances = best_model.feature_importances_
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importances)), feature_importances)
plt.title("Feature Importance - Grid Search")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.savefig("feature_importance_grid_search.png")
plt.close()
mlflow.log_artifact("feature_importance_grid_search.png")
print(f"Grid Search Best Parameters: {grid_search.best_params_}")
print(f"Grid Search Best CV Score: {grid_search.best_score_:.4f}")
print(f"Grid Search Test Accuracy: {test_accuracy:.4f}")
print(f"Grid Search Test F1 Score: {test_f1:.4f}")
# 2. Random Search
print("\n2. Performing Random Search...")
param_dist = {
'n_estimators': randint(50, 200),
'max_depth': [None] + list(range(5, 30)),
'min_samples_split': randint(2, 20),
'min_samples_leaf': randint(1, 10),
'max_features': uniform(0.1, 0.9)
}
with mlflow.start_run(experiment_id=experiment_id, run_name="Random Search"):
# Log search method
mlflow.log_param("search_method", "random_search")
mlflow.log_param("n_iter", 50)
mlflow.log_param("param_dist", str(param_dist))
# Perform random search
random_search = RandomizedSearchCV(
estimator=model,
param_distributions=param_dist,
n_iter=50,
cv=5,
scoring='accuracy',
n_jobs=-1,
random_state=42,
verbose=1
)
random_search.fit(X_train, y_train)
# Log best parameters and score
mlflow.log_params(random_search.best_params_)
mlflow.log_metric("best_cv_score", random_search.best_score_)
# Evaluate on test set
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)
mlflow.log_metric("test_accuracy", test_accuracy)
mlflow.log_metric("test_f1_score", test_f1)
# Log model
mlflow.sklearn.log_model(best_model, "model")
# Log feature importance
feature_importances = best_model.feature_importances_
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importances)), feature_importances)
plt.title("Feature Importance - Random Search")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.savefig("feature_importance_random_search.png")
plt.close()
mlflow.log_artifact("feature_importance_random_search.png")
print(f"Random Search Best Parameters: {random_search.best_params_}")
print(f"Random Search Best CV Score: {random_search.best_score_:.4f}")
print(f"Random Search Test Accuracy: {test_accuracy:.4f}")
print(f"Random Search Test F1 Score: {test_f1:.4f}")
# 3. Bayesian Optimization (using hyperopt)
print("\n3. Performing Bayesian Optimization...")
try:
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from hyperopt.pyll import scope
# Define search space
space = {
'n_estimators': scope.int(hp.quniform('n_estimators', 50, 200, 1)),
'max_depth': scope.int(hp.quniform('max_depth', 3, 30, 1)),
'min_samples_split': scope.int(hp.quniform('min_samples_split', 2, 20, 1)),
'min_samples_leaf': scope.int(hp.quniform('min_samples_leaf', 1, 10, 1)),
'max_features': hp.uniform('max_features', 0.1, 0.9),
'criterion': hp.choice('criterion', ['gini', 'entropy'])
}
# Objective function
def objective(params):
with mlflow.start_run(experiment_id=experiment_id, nested=True):
# Convert params to appropriate types
params = {
'n_estimators': int(params['n_estimators']),
'max_depth': int(params['max_depth']) if params['max_depth'] > 0 else None,
'min_samples_split': int(params['min_samples_split']),
'min_samples_leaf': int(params['min_samples_leaf']),
'max_features': params['max_features'],
'criterion': params['criterion']
}
# Log parameters
mlflow.log_params(params)
# Create and train model
model = RandomForestClassifier(
random_state=42,
**params
)
# Cross-validation
from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
# Log metrics
mean_cv_score = np.mean(cv_scores)
std_cv_score = np.std(cv_scores)
mlflow.log_metric("mean_cv_accuracy", mean_cv_score)
mlflow.log_metric("std_cv_accuracy", std_cv_score)
# Train on full training set
model.fit(X_train, y_train)
# Evaluate on test set
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred)
mlflow.log_metric("test_accuracy", test_accuracy)
mlflow.log_metric("test_f1_score", test_f1)
# Log model
mlflow.sklearn.log_model(model, "model")
return {'loss': -mean_cv_score, 'status': STATUS_OK}
with mlflow.start_run(experiment_id=experiment_id, run_name="Bayesian Optimization"):
# Log search method
mlflow.log_param("search_method", "bayesian_optimization")
mlflow.log_param("max_evals", 50)
# Perform optimization
trials = Trials()
best = fmin(
fn=objective,
space=space,
algo=tpe.suggest,
max_evals=50,
trials=trials,
rstate=np.random.default_rng(42)  # recent hyperopt versions expect a NumPy Generator here
)
# Log best parameters
mlflow.log_params(best)
# Get the best trial
best_trial = sorted(trials.results, key=lambda x: x['loss'])[0]
mlflow.log_metric("best_cv_score", -best_trial['loss'])
print(f"Bayesian Optimization Best Parameters: {best}")
print(f"Bayesian Optimization Best CV Score: {-best_trial['loss']:.4f}")
except ImportError:
print("Hyperopt not available, skipping Bayesian Optimization")
# Compare all tuning methods
print("\nComparing Tuning Methods...")
runs = mlflow.search_runs(experiment_ids=[experiment_id])
tuning_runs = runs[runs['tags.mlflow.runName'].isin(["Grid Search", "Random Search", "Bayesian Optimization"])]
print("\nHyperparameter Tuning Comparison:")
print(tuning_runs[['run_id', 'tags.mlflow.runName', 'metrics.best_cv_score', 'metrics.test_accuracy', 'metrics.test_f1_score']])
# Visualize comparison
plt.figure(figsize=(12, 6))
sns.barplot(
x='tags.mlflow.runName',
y='metrics.test_accuracy',
data=tuning_runs,
palette='viridis'
)
plt.title("Hyperparameter Tuning - Test Accuracy Comparison")
plt.ylabel("Test Accuracy")
plt.ylim(0, 1)
plt.savefig("tuning_comparison_accuracy.png")
plt.close()
plt.figure(figsize=(12, 6))
sns.barplot(
x='tags.mlflow.runName',
y='metrics.test_f1_score',
data=tuning_runs,
palette='viridis'
)
plt.title("Hyperparameter Tuning - Test F1 Score Comparison")
plt.ylabel("Test F1 Score")
plt.ylim(0, 1)
plt.savefig("tuning_comparison_f1.png")
plt.close()
# Log comparison artifacts
with mlflow.start_run(experiment_id=experiment_id, run_name="Tuning Comparison"):
mlflow.log_artifact("tuning_comparison_accuracy.png")
mlflow.log_artifact("tuning_comparison_f1.png")
mlflow.set_tag("comparison", "true")
mlflow.log_param("tuning_methods", ["Grid Search", "Random Search", "Bayesian Optimization"])
Model Registry Example
# Model registry example with MLflow
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.datasets import make_classification
import time
print("\nModel Registry Example...")
# Set up MLflow
mlflow.set_tracking_uri("file:///tmp/mlruns")
# Create or get an experiment
experiment_name = "Model Registry Demo"
try:
experiment_id = mlflow.create_experiment(experiment_name)
except Exception:  # experiment already exists
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id
print(f"Experiment ID: {experiment_id}")
# Prepare data
X, y = make_classification(
n_samples=2000,
n_features=15,
n_informative=10,
n_redundant=2,
n_classes=2,
random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 1. Train and register multiple model versions
print("\n1. Training and Registering Model Versions...")
# Train and register version 1
with mlflow.start_run(experiment_id=experiment_id, run_name="Version 1"):
model = RandomForestClassifier(
n_estimators=100,
max_depth=10,
random_state=42
)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics and model
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
mlflow.log_param("version", "1.0")
mlflow.log_param("description", "Initial model version")
# Log the model, then register it from the run's artifact URI
mlflow.sklearn.log_model(model, "model")
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_name = "ClassificationModel"
model_version = mlflow.register_model(model_uri, model_name)
print(f"Registered model version: {model_version.version}")
print(f"Model accuracy: {accuracy:.4f}")
print(f"Model F1 score: {f1:.4f}")
# Train and register version 2 (improved)
with mlflow.start_run(experiment_id=experiment_id, run_name="Version 2"):
model = RandomForestClassifier(
n_estimators=150,
max_depth=15,
min_samples_split=5,
random_state=42
)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics and model
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
mlflow.log_param("version", "2.0")
mlflow.log_param("description", "Improved model with better hyperparameters")
# Log the model, then register it
mlflow.sklearn.log_model(model, "model")
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_version = mlflow.register_model(model_uri, model_name)
print(f"Registered model version: {model_version.version}")
print(f"Model accuracy: {accuracy:.4f}")
print(f"Model F1 score: {f1:.4f}")
# Train and register version 3 (with feature engineering)
with mlflow.start_run(experiment_id=experiment_id, run_name="Version 3"):
# Add some feature engineering
X_train_eng = np.column_stack([X_train, X_train[:, :5] ** 2]) # Add squared features
X_test_eng = np.column_stack([X_test, X_test[:, :5] ** 2])
model = RandomForestClassifier(
n_estimators=200,
max_depth=20,
min_samples_leaf=2,
random_state=42
)
model.fit(X_train_eng, y_train)
# Evaluate
y_pred = model.predict(X_test_eng)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics and model
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
mlflow.log_param("version", "3.0")
mlflow.log_param("description", "Model with feature engineering")
mlflow.log_param("feature_engineering", "Added squared features for first 5 features")
# Log the model, then register it
mlflow.sklearn.log_model(model, "model")
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_version = mlflow.register_model(model_uri, model_name)
print(f"Registered model version: {model_version.version}")
print(f"Model accuracy: {accuracy:.4f}")
print(f"Model F1 score: {f1:.4f}")
# 2. List registered models
print("\n2. Listing Registered Models...")
client = mlflow.tracking.MlflowClient()
# List all registered models
registered_models = client.search_registered_models()
print(f"Found {len(registered_models)} registered models:")
for model in registered_models:
print(f"\nModel Name: {model.name}")
print(f"Creation Timestamp: {model.creation_timestamp}")
print(f"Last Updated Timestamp: {model.last_updated_timestamp}")
print(f"Description: {model.description}")
# List model versions
print("Model Versions:")
for version in model.latest_versions:
print(f" Version {version.version}:")
print(f" Run ID: {version.run_id}")
print(f" Current Stage: {version.current_stage}")
print(f" Source: {version.source}")
print(f" Creation Timestamp: {version.creation_timestamp}")
# 3. Transition model stages
print("\n3. Transitioning Model Stages...")
# Get the latest versions
model_versions = client.search_model_versions(f"name='{model_name}'")
print(f"Found {len(model_versions)} versions of model '{model_name}':")
for version in sorted(model_versions, key=lambda x: x.version):
print(f"Version {version.version}: {version.current_stage}")
# Transition version 1 to Staging
print("\nTransitioning Version 1 to Staging...")
client.transition_model_version_stage(
name=model_name,
version=1,
stage="Staging",
archive_existing_versions=False
)
# Transition version 2 to Production
print("Transitioning Version 2 to Production...")
client.transition_model_version_stage(
name=model_name,
version=2,
stage="Production",
archive_existing_versions=True # Archive current production version
)
# Transition version 3 to None (remove from stages)
print("Transitioning Version 3 to None...")
client.transition_model_version_stage(
name=model_name,
version=3,
stage="None"
)
# Verify stage transitions
print("\nVerifying Stage Transitions...")
model_versions = client.search_model_versions(f"name='{model_name}'")
for version in sorted(model_versions, key=lambda x: x.version):
print(f"Version {version.version}: {version.current_stage}")
# 4. Load and use a production model
print("\n4. Loading and Using Production Model...")
# Get the production model
production_models = client.get_latest_versions(model_name, stages=["Production"])
if production_models:
production_model = production_models[0]
print(f"Production Model: Version {production_model.version}")
# Load the model
model_uri = f"models:/{model_name}/{production_model.current_stage}"
loaded_model = mlflow.sklearn.load_model(model_uri)
# Make predictions
y_pred = loaded_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"Production Model Accuracy: {accuracy:.4f}")
print(f"Production Model F1 Score: {f1:.4f}")
else:
print("No production model found")
# 5. Add model descriptions and tags
print("\n5. Adding Model Descriptions and Tags...")
# Add description to the model
client.update_registered_model(
name=model_name,
description="Random Forest classifier for binary classification tasks"
)
# Add tags to model versions
for version in model_versions:
if version.version == "1":
client.set_model_version_tag(
name=model_name,
version=version.version,
key="purpose",
value="baseline"
)
elif version.version == "2":
client.set_model_version_tag(
name=model_name,
version=version.version,
key="purpose",
value="production"
)
client.set_model_version_tag(
name=model_name,
version=version.version,
key="optimization",
value="hyperparameter_tuning"
)
elif version.version == "3":
client.set_model_version_tag(
name=model_name,
version=version.version,
key="purpose",
value="experiment"
)
client.set_model_version_tag(
name=model_name,
version=version.version,
key="feature_engineering",
value="true"
)
# Verify tags
print("\nVerifying Model Tags...")
for version in model_versions:
mv = client.get_model_version(name=model_name, version=version.version)
print(f"Version {version.version} Tags: {mv.tags}")
# 6. Model comparison
print("\n6. Comparing Model Versions...")
# Get all versions with their metrics
version_metrics = []
for version in model_versions:
run = client.get_run(version.run_id)
metrics = run.data.metrics
version_metrics.append({
"version": version.version,
"stage": version.current_stage,
"accuracy": metrics.get("accuracy", 0),
"f1_score": metrics.get("f1_score", 0),
"run_id": version.run_id
})
# Create comparison DataFrame
import pandas as pd
comparison_df = pd.DataFrame(version_metrics)
print("\nModel Version Comparison:")
print(comparison_df[["version", "stage", "accuracy", "f1_score"]])
# Find the best model by accuracy
best_model = comparison_df.sort_values("accuracy", ascending=False).iloc[0]
print(f"\nBest Model: Version {best_model['version']}")
print(f"Best Accuracy: {best_model['accuracy']:.4f}")
print(f"Best F1 Score: {best_model['f1_score']:.4f}")
print(f"Current Stage: {best_model['stage']}")
# 7. Model serving
print("\n7. Model Serving...")
# Serve the production model
print("To serve the production model, you can use:")
print(f"mlflow models serve -m models:/{model_name}/Production -p 1234")
print("\nThen you can send requests to the model:")
print("curl -X POST -H \"Content-Type:application/json\" --data '{\"inputs\":[[...]]}' http://localhost:1234/invocations")
# Example of how to prepare input for the API
sample_input = X_test[:1].tolist()
print(f"\nSample input for API: {sample_input}")
# 8. Model monitoring setup
print("\n8. Model Monitoring Setup...")
# Add monitoring tags to the production model
if production_models:
production_model = production_models[0]
client.set_model_version_tag(
name=model_name,
version=production_model.version,
key="monitoring",
value="enabled"
)
client.set_model_version_tag(
name=model_name,
version=production_model.version,
key="monitoring.metrics",
value="accuracy,f1_score,latency"
)
print(f"Added monitoring tags to production model version {production_model.version}")
# 9. Model deprecation
print("\n9. Model Deprecation...")
# Create a new version that will replace the current production model
with mlflow.start_run(experiment_id=experiment_id, run_name="Version 4"):
# Train an even better model
model = RandomForestClassifier(
n_estimators=250,
max_depth=25,
min_samples_split=4,
min_samples_leaf=2,
max_features=0.8,
random_state=42
)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics and model
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
mlflow.log_param("version", "4.0")
mlflow.log_param("description", "Improved model with optimized hyperparameters")
# Log the model, then register it
mlflow.sklearn.log_model(model, "model")
model_uri = f"runs:/{mlflow.active_run().info.run_id}/model"
model_version = mlflow.register_model(model_uri, model_name)
print(f"Registered model version: {model_version.version}")
print(f"Model accuracy: {accuracy:.4f}")
print(f"Model F1 score: {f1:.4f}")
# Transition the new model to Staging for testing
client.transition_model_version_stage(
name=model_name,
version=4,
stage="Staging"
)
print("\nModel version 4 has been registered and transitioned to Staging")
print("After testing, it can be promoted to Production and the old production model can be archived")
MLflow Projects Example
# MLflow Projects example
import mlflow
import mlflow.projects
import os
import tempfile
import shutil
print("\nMLflow Projects Example...")
# 1. Create a simple MLflow project
print("\n1. Creating an MLflow Project...")
# Create a temporary directory for the project
project_dir = tempfile.mkdtemp()
print(f"Created project directory: {project_dir}")
# Create MLproject file
mlproject_content = """
name: Example Project
conda_env: conda.yaml
entry_points:
main:
parameters:
alpha: {type: float, default: 0.5}
l1_ratio: {type: float, default: 0.1}
max_iter: {type: int, default: 100}
command: "python train.py --alpha {alpha} --l1_ratio {l1_ratio} --max_iter {max_iter}"
evaluate:
parameters:
model_uri: {type: string}
command: "python evaluate.py --model_uri {model_uri}"
"""
with open(os.path.join(project_dir, "MLproject"), "w") as f:
f.write(mlproject_content)
# Create conda environment file
conda_content = """
name: mlflow-example
channels:
- conda-forge
dependencies:
- python=3.8
- pip
- pip:
- mlflow
- scikit-learn
- numpy
- pandas
"""
with open(os.path.join(project_dir, "conda.yaml"), "w") as f:
f.write(conda_content)
# Create training script
train_script = """
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import argparse
def main():
# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("--alpha", type=float, default=0.5)
parser.add_argument("--l1_ratio", type=float, default=0.1)
parser.add_argument("--max_iter", type=int, default=100)
args = parser.parse_args()
# Prepare data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Start MLflow run
with mlflow.start_run():
# Log parameters
mlflow.log_param("alpha", args.alpha)
mlflow.log_param("l1_ratio", args.l1_ratio)
mlflow.log_param("max_iter", args.max_iter)
# Train model
model = ElasticNet(
alpha=args.alpha,
l1_ratio=args.l1_ratio,
max_iter=args.max_iter,
random_state=42
)
model.fit(X_train, y_train)
# Evaluate model
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)
# Log metrics
mlflow.log_metric("train_r2_score", train_score)
mlflow.log_metric("test_r2_score", test_score)
# Log model
mlflow.sklearn.log_model(model, "model")
print(f"Training completed with test R2 score: {test_score:.4f}")
if __name__ == "__main__":
main()
"""
with open(os.path.join(project_dir, "train.py"), "w") as f:
f.write(train_script)
# Create evaluation script
evaluate_script = """
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import argparse
def main():
# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("--model_uri", type=str, required=True)
args = parser.parse_args()
# Prepare data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
_, X_test, _, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Load model
model = mlflow.sklearn.load_model(args.model_uri)
# Evaluate model
test_score = model.score(X_test, y_test)
# Start MLflow run
with mlflow.start_run():
mlflow.log_param("model_uri", args.model_uri)
mlflow.log_metric("test_r2_score", test_score)
print(f"Evaluation completed with test R2 score: {test_score:.4f}")
if __name__ == "__main__":
main()
"""
with open(os.path.join(project_dir, "evaluate.py"), "w") as f:
f.write(evaluate_script)
print("MLflow project created with:")
print(" • MLproject file")
print(" • conda.yaml environment specification")
print(" • train.py entry point")
print(" • evaluate.py entry point")
# 2. Run the project locally
print("\n2. Running the Project Locally...")
# Set tracking URI
mlflow.set_tracking_uri("file:///tmp/mlruns")
# Run the training entry point
print("\nRunning training...")
submitted_run = mlflow.projects.run(
uri=project_dir,
entry_point="main",
parameters={
"alpha": 0.3,
"l1_ratio": 0.2,
"max_iter": 200
},
experiment_name="MLflow Projects Example"
)
print(f"Run submitted with ID: {submitted_run.run_id}")
# Wait for the run to complete
print("Waiting for run to complete...")
submitted_run.wait()
# Get the run details
run = mlflow.get_run(submitted_run.run_id)
print(f"Run status: {run.info.status}")
print(f"Run metrics: {run.data.metrics}")
# 3. Run the evaluation entry point
print("\n3. Running Evaluation...")
# Get the model URI from the training run
model_uri = f"runs:/{submitted_run.run_id}/model"
# Run the evaluation
evaluate_run = mlflow.projects.run(
uri=project_dir,
entry_point="evaluate",
parameters={
"model_uri": model_uri
},
experiment_name="MLflow Projects Example"
)
print(f"Evaluation run submitted with ID: {evaluate_run.run_id}")
evaluate_run.wait()
# Get the evaluation run details
evaluate_run_details = mlflow.get_run(evaluate_run.run_id)
print(f"Evaluation run status: {evaluate_run_details.info.status}")
print(f"Evaluation run metrics: {evaluate_run_details.data.metrics}")
# 4. Run the project with different parameters
print("\n4. Running Project with Different Parameters...")
# Run with different hyperparameters
run2 = mlflow.projects.run(
uri=project_dir,
entry_point="main",
parameters={
"alpha": 0.7,
"l1_ratio": 0.3,
"max_iter": 150
},
experiment_name="MLflow Projects Example"
)
print(f"Second run submitted with ID: {run2.run_id}")
run2.wait()
# Compare runs
runs = mlflow.search_runs(experiment_ids=[run.info.experiment_id])
print("\nRun Comparison:")
print(runs[['run_id', 'params.alpha', 'params.l1_ratio', 'params.max_iter', 'metrics.test_r2_score']])
# 5. Run the project from GitHub
print("\n5. Running Project from GitHub...")
print("To run this project from GitHub, you would use:")
print("mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=0.5")
print("\nOr for a specific version:")
print("mlflow run https://github.com/mlflow/mlflow-example.git#examples/sklearn_elasticnet_wine -P alpha=0.5")
# 6. Clean up
print("\n6. Cleaning Up...")
shutil.rmtree(project_dir)
print(f"Removed project directory: {project_dir}")
MLflow Models Example
# MLflow Models example
import mlflow
import mlflow.sklearn
import mlflow.pyfunc
import mlflow.pytorch
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt
print("\nMLflow Models Example...")
# Set up MLflow
mlflow.set_tracking_uri("file:///tmp/mlruns")
# Create or get an experiment
experiment_name = "MLflow Models Demo"
try:
experiment_id = mlflow.create_experiment(experiment_name)
except Exception:  # experiment already exists
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id
print(f"Experiment ID: {experiment_id}")
# Prepare data
X, y = make_classification(
n_samples=2000,
n_features=10,
n_classes=2,
random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 1. Scikit-learn model
print("\n1. Scikit-learn Model...")
with mlflow.start_run(experiment_id=experiment_id, run_name="Scikit-learn Model"):
# Train model
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)
# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
# Log model using scikit-learn flavor
mlflow.sklearn.log_model(model, "sklearn_model")
# Note: mlflow.sklearn.log_model also records a python_function (pyfunc) flavor,
# so the same artifact can be loaded with mlflow.pyfunc.load_model for generic serving.
print(f"Scikit-learn model logged with accuracy: {accuracy:.4f}")
# 2. PyTorch model
print("\n2. PyTorch Model...")
# Define a simple PyTorch model
class SimpleNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(SimpleNN, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size)
self.relu = nn.ReLU()
self.fc2 = nn.Linear(hidden_size, output_size)
def forward(self, x):
x = self.fc1(x)
x = self.relu(x)
x = self.fc2(x)
return torch.sigmoid(x)
# Prepare data for PyTorch
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train).view(-1, 1)
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.FloatTensor(y_test).view(-1, 1)
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
with mlflow.start_run(experiment_id=experiment_id, run_name="PyTorch Model"):
# Initialize model
input_size = X_train.shape[1]
hidden_size = 64
output_size = 1
model = SimpleNN(input_size, hidden_size, output_size)
# Define loss and optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Log parameters
mlflow.log_param("input_size", input_size)
mlflow.log_param("hidden_size", hidden_size)
mlflow.log_param("output_size", output_size)
mlflow.log_param("learning_rate", 0.001)
mlflow.log_param("batch_size", 32)
mlflow.log_param("epochs", 50)
# Train model
model.train()
for epoch in range(50):
for batch_X, batch_y in train_loader:
optimizer.zero_grad()
outputs = model(batch_X)
loss = criterion(outputs, batch_y)
loss.backward()
optimizer.step()
# Log training loss
mlflow.log_metric("train_loss", loss.item(), step=epoch)
# Evaluate
model.eval()
with torch.no_grad():
outputs = model(X_test_tensor)
predicted = (outputs > 0.5).float().numpy().ravel()  # flatten to 1-D for sklearn metrics
accuracy = accuracy_score(y_test, predicted)
f1 = f1_score(y_test, predicted)
# Log metrics
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
# Log model using PyTorch flavor
mlflow.pytorch.log_model(model, "pytorch_model")
# Log model using pyfunc flavor
class PyTorchWrapper(mlflow.pyfunc.PythonModel):
def __init__(self, model):
self.model = model
def predict(self, context, model_input):
input_tensor = torch.FloatTensor(model_input.values if isinstance(model_input, pd.DataFrame) else model_input)
with torch.no_grad():
outputs = self.model(input_tensor)
return (outputs.numpy() > 0.5).astype(int)
mlflow.pyfunc.log_model(
artifact_path="pyfunc_pytorch_model",
python_model=PyTorchWrapper(model)
)
print(f"PyTorch model logged with accuracy: {accuracy:.4f}")
# 3. Custom PyFunc model
print("\n3. Custom PyFunc Model...")
class CustomModel(mlflow.pyfunc.PythonModel):
def __init__(self):
self.model = LogisticRegression(max_iter=1000, random_state=42)
def load_context(self, context):
# Load any artifacts if needed
pass
def predict(self, context, model_input):
# Convert input to numpy array if it's a DataFrame
if isinstance(model_input, pd.DataFrame):
model_input = model_input.values
# Make predictions
return self.model.predict_proba(model_input)[:, 1]
with mlflow.start_run(experiment_id=experiment_id, run_name="Custom PyFunc Model"):
# Train the underlying model
model = CustomModel()
model.model.fit(X_train, y_train)
# Evaluate
y_pred_proba = model.model.predict_proba(X_test)[:, 1]
y_pred = (y_pred_proba > 0.5).astype(int)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
# Log the custom model
mlflow.pyfunc.log_model(
artifact_path="custom_model",
python_model=model
)
print(f"Custom PyFunc model logged with accuracy: {accuracy:.4f}")
# 4. Model flavors comparison
print("\n4. Model Flavors Comparison...")
print("MLflow supports multiple model flavors:")
print("1. Framework-specific flavors (sklearn, pytorch, tensorflow, etc.)")
print(" - Preserve the original model format")
print(" - Enable framework-specific functionality")
print(" - Best for development and framework-specific deployment")
print("2. PyFunc flavor")
print(" - Standardized interface for inference")
print(" - Framework-agnostic")
print(" - Best for production deployment and serving")
print("3. Custom flavors")
print(" - Extend MLflow with custom model types")
print(" - Support for specialized use cases")
# 5. Loading and using models
print("\n5. Loading and Using Models...")
# Get the latest runs
runs = mlflow.search_runs(experiment_ids=[experiment_id])
print("Available runs:")
for _, run in runs.iterrows():
print(f"Run ID: {run.run_id}, Run Name: {run['tags.mlflow.runName']}")
# Load scikit-learn model
print("\nLoading Scikit-learn Model...")
sklearn_run = runs[runs['tags.mlflow.runName'] == "Scikit-learn Model"].iloc[0]
sklearn_model_uri = f"runs:/{sklearn_run.run_id}/sklearn_model"
sklearn_model = mlflow.sklearn.load_model(sklearn_model_uri)
# Make predictions
sklearn_pred = sklearn_model.predict(X_test)
sklearn_accuracy = accuracy_score(y_test, sklearn_pred)
print(f"Scikit-learn model accuracy: {sklearn_accuracy:.4f}")
# Load PyTorch model
print("\nLoading PyTorch Model...")
pytorch_run = runs[runs['tags.mlflow.runName'] == "PyTorch Model"].iloc[0]
pytorch_model_uri = f"runs:/{pytorch_run.run_id}/pytorch_model"
pytorch_model = mlflow.pytorch.load_model(pytorch_model_uri)
# Make predictions
X_test_tensor = torch.FloatTensor(X_test)
with torch.no_grad():
pytorch_pred = (pytorch_model(X_test_tensor) > 0.5).float().numpy().ravel()  # flatten to 1-D for sklearn metrics
pytorch_accuracy = accuracy_score(y_test, pytorch_pred)
print(f"PyTorch model accuracy: {pytorch_accuracy:.4f}")
# Load PyFunc model
print("\nLoading PyFunc Model...")
pyfunc_run = runs[runs['tags.mlflow.runName'] == "Custom PyFunc Model"].iloc[0]
pyfunc_model_uri = f"runs:/{pyfunc_run.run_id}/custom_model"
pyfunc_model = mlflow.pyfunc.load_model(pyfunc_model_uri)
# Make predictions
pyfunc_pred = pyfunc_model.predict(X_test)
pyfunc_pred_class = (pyfunc_pred > 0.5).astype(int)
pyfunc_accuracy = accuracy_score(y_test, pyfunc_pred_class)
print(f"PyFunc model accuracy: {pyfunc_accuracy:.4f}")
# 6. Model serving
print("\n6. Model Serving...")
print("To serve a model, you can use the MLflow CLI:")
print(f"mlflow models serve -m {pyfunc_model_uri} -p 1234")
print("\nOnce served, you can send requests to the model:")
print("curl -X POST -H \"Content-Type:application/json\" --data '{\"inputs\":[[...]]}' http://localhost:1234/invocations")
# Example of how to prepare input for the API
sample_input = X_test[:3].tolist()
print(f"\nSample input for API: {sample_input}")
# 7. Model deployment options
print("\n7. Model Deployment Options...")
deployment_options = [
{
"name": "Local Serving",
"command": "mlflow models serve -m <model_uri> -p <port>",
"use_case": "Development, testing, local deployment",
"pros": ["Easy to set up", "Good for development", "No infrastructure needed"],
"cons": ["Not scalable", "No load balancing", "Manual management"]
},
{
"name": "Docker Container",
"command": "mlflow models build-docker -m <model_uri> -n <image_name>",
"use_case": "Containerized deployment, cloud environments",
"pros": ["Portable", "Scalable", "Consistent environment"],
"cons": ["Requires Docker", "Container management overhead"]
},
{
"name": "SageMaker",
"command": "mlflow sagemaker deploy -m <model_uri> --app-name <app_name>",
"use_case": "AWS production deployment",
"pros": ["Managed service", "Auto-scaling", "Integration with AWS ecosystem"],
"cons": ["AWS-specific", "Cost considerations"]
},
{
"name": "Azure ML",
"command": "mlflow azureml deploy -m <model_uri> --workspace-name <ws_name>",
"use_case": "Azure production deployment",
"pros": ["Managed service", "Integration with Azure ecosystem", "Enterprise features"],
"cons": ["Azure-specific", "Complex setup"]
},
{
"name": "Kubernetes",
"command": "mlflow models build-docker -m <model_uri> -n <image_name> && kubectl apply -f k8s_deployment.yaml",
"use_case": "Large-scale production deployment",
"pros": ["Highly scalable", "Fault-tolerant", "Resource efficient"],
"cons": ["Complex setup", "Requires Kubernetes expertise"]
},
{
"name": "Databricks",
"command": "mlflow models deploy -m <model_uri> --target databricks",
"use_case": "Databricks environment deployment",
"pros": ["Tight integration with Databricks", "Managed service", "Collaborative environment"],
"cons": ["Databricks-specific", "Cost considerations"]
}
]
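# Note: these CLI commands are illustrative and vary across MLflow versions; newer
# releases route most targets through the deployments CLI, e.g.
# "mlflow deployments create -t <target> -m <model_uri> --name <deployment_name>".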
print("Model Deployment Options:")
for option in deployment_options:
print(f"\n{option['name']}:")
print(f" Command: {option['command']}")
print(f" Use Case: {option['use_case']}")
print(f" Pros: {', '.join(option['pros'])}")
print(f" Cons: {', '.join(option['cons'])}")
# 8. Model signature and input example
print("\n8. Model Signature and Input Example...")
# Define a model signature
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec
input_schema = Schema([
TensorSpec(np.dtype(np.float32), (-1, 10), name="input_features")
])
output_schema = Schema([
TensorSpec(np.dtype(np.float32), (-1,), name="predicted_probability")
])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)
print("Model Signature:")
print(f"Inputs: {signature.inputs}")
print(f"Outputs: {signature.outputs}")
# Log a model with signature
with mlflow.start_run(experiment_id=experiment_id, run_name="Model with Signature"):
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)
# Log model with signature
mlflow.sklearn.log_model(
sk_model=model,
artifact_path="model_with_signature",
signature=signature
)
print("Model logged with signature")
# 9. Model explainability
print("\n9. Model Explainability...")
# Log a model with SHAP explainability
try:
import shap
with mlflow.start_run(experiment_id=experiment_id, run_name="Model with Explainability"):
model = LogisticRegression(max_iter=1000, random_state=42)
model.fit(X_train, y_train)
# Create SHAP explainer
explainer = shap.LinearExplainer(model, X_train)
shap_values = explainer.shap_values(X_test)
# Log model
mlflow.sklearn.log_model(model, "model")
# Log SHAP values as artifact
plt.figure()
shap.summary_plot(shap_values, X_test, feature_names=[f"feature_{i}" for i in range(X_test.shape[1])], show=False)
plt.savefig("shap_summary.png")
plt.close()
mlflow.log_artifact("shap_summary.png")
# Log feature importance
feature_importance = np.abs(model.coef_[0])
plt.figure(figsize=(10, 6))
plt.bar(range(len(feature_importance)), feature_importance)
plt.title("Feature Importance")
plt.xlabel("Feature Index")
plt.ylabel("Importance")
plt.savefig("feature_importance.png")
plt.close()
mlflow.log_artifact("feature_importance.png")
print("Model logged with explainability artifacts")
except ImportError:
print("SHAP not available, skipping explainability example")
Performance Optimization
MLflow Performance Techniques
| Technique | Description | Use Case |
|---|---|---|
| Efficient Tracking | Optimize experiment tracking for large-scale runs | High-volume experimentation |
| Artifact Storage | Use efficient storage backends for artifacts | Large models and datasets |
| Parallel Runs | Run multiple experiments in parallel | Hyperparameter tuning |
| Caching | Cache intermediate results and models | Repeated experiments |
| Streaming Metrics | Stream metrics during training | Long-running training jobs |
| Model Optimization | Optimize models for inference | Production deployment |
| Batch Inference | Process multiple inputs simultaneously | High-throughput applications |
| Hardware Acceleration | Leverage GPUs/TPUs for training | Deep learning models |
| Distributed Training | Scale training across multiple machines | Large-scale models |
| Model Compression | Reduce model size for deployment | Edge devices |
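As a small illustration of the Efficient Tracking row, the hedged sketch below batches parameters and metrics into fewer tracking calls; it assumes a local tracking store, and per-request batch limits depend on the tracking server:

import time
import mlflow
from mlflow.tracking import MlflowClient
from mlflow.entities import Metric

client = MlflowClient()
with mlflow.start_run() as run:
    # One call each instead of many individual log_param/log_metric calls
    mlflow.log_params({"learning_rate": 0.01, "batch_size": 64})
    mlflow.log_metrics({"loss": 0.42, "accuracy": 0.91}, step=0)

    # For high-frequency metrics, log_batch groups many values into a single request
    now_ms = int(time.time() * 1000)
    history = [Metric(key="loss", value=0.5 - 0.001 * i, timestamp=now_ms, step=i) for i in range(200)]
    client.log_batch(run.info.run_id, metrics=history)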
Performance Comparison Example
# Performance comparison example with MLflow
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.datasets import make_classification
import time
import matplotlib.pyplot as plt
print("\nPerformance Comparison Example...")
# Set up MLflow
mlflow.set_tracking_uri("file:///tmp/mlruns")
# Create or get an experiment
experiment_name = "Performance Comparison"
try:
experiment_id = mlflow.create_experiment(experiment_name)
except Exception:  # experiment already exists
experiment = mlflow.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id
print(f"Experiment ID: {experiment_id}")
# Prepare data
X, y = make_classification(
n_samples=5000,
n_features=20,
n_informative=15,
n_redundant=2,
n_classes=2,
random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define models to compare
models = {
"Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
"Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
"Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
"SVM": SVC(probability=True, random_state=42)
}
# 1. Training time comparison
print("\n1. Training Time Comparison...")
training_times = {}
inference_times = {}
accuracies = {}
f1_scores = {}
for model_name, model in models.items():
print(f"\nTraining {model_name}...")
with mlflow.start_run(experiment_id=experiment_id, run_name=f"{model_name} - Training Time"):
# Log model name
mlflow.log_param("model", model_name)
# Measure training time
start_time = time.time()
model.fit(X_train, y_train)
training_time = time.time() - start_time
# Measure inference time
start_time = time.time()
y_pred = model.predict(X_test)
inference_time = time.time() - start_time
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Log metrics
mlflow.log_metric("training_time", training_time)
mlflow.log_metric("inference_time", inference_time)
mlflow.log_metric("accuracy", accuracy)
mlflow.log_metric("f1_score", f1)
# Store results
training_times[model_name] = training_time
inference_times[model_name] = inference_time
accuracies[model_name] = accuracy
f1_scores[model_name] = f1
# Log model
mlflow.sklearn.log_model(model, "model")
print(f" Training time: {training_time:.4f} seconds")
print(f" Inference time: {inference_time:.4f} seconds")
print(f" Accuracy: {accuracy:.4f}")
print(f" F1 Score: {f1:.4f}")
# 2. Plot training time comparison
plt.figure(figsize=(12, 6))
plt.bar(training_times.keys(), training_times.values())
plt.title("Training Time Comparison")
plt.ylabel("Time (seconds)")
plt.xticks(rotation=45)
plt.savefig("training_time_comparison.png")
plt.close()
# 3. Plot inference time comparison
plt.figure(figsize=(12, 6))
plt.bar(inference_times.keys(), inference_times.values())
plt.title("Inference Time Comparison")
plt.ylabel("Time (seconds)")
plt.xticks(rotation=45)
plt.savefig("inference_time_comparison.png")
plt.close()
# 4. Plot accuracy comparison
plt.figure(figsize=(12, 6))
plt.bar(accuracies.keys(), accuracies.values())
plt.title("Accuracy Comparison")
plt.ylabel("Accuracy")
plt.ylim(0, 1)
plt.xticks(rotation=45)
plt.savefig("accuracy_comparison.png")
plt.close()
# 5. Plot F1 score comparison
plt.figure(figsize=(12, 6))
plt.bar(f1_scores.keys(), f1_scores.values())
plt.title("F1 Score Comparison")
plt.ylabel("F1 Score")
plt.ylim(0, 1)
plt.xticks(rotation=45)
plt.savefig("f1_score_comparison.png")
plt.close()
# Log comparison artifacts
with mlflow.start_run(experiment_id=experiment_id, run_name="Performance Comparison Summary"):
mlflow.log_artifact("training_time_comparison.png")
mlflow.log_artifact("inference_time_comparison.png")
mlflow.log_artifact("accuracy_comparison.png")
mlflow.log_artifact("f1_score_comparison.png")
# Log comparison metrics
for model_name in models.keys():
mlflow.log_metric(f"{model_name}_training_time", training_times[model_name])
mlflow.log_metric(f"{model_name}_inference_time", inference_times[model_name])
mlflow.log_metric(f"{model_name}_accuracy", accuracies[model_name])
mlflow.log_metric(f"{model_name}_f1_score", f1_scores[model_name])
# Find the best model by different criteria
best_accuracy_model = max(accuracies.items(), key=lambda x: x[1])
best_f1_model = max(f1_scores.items(), key=lambda x: x[1])
fastest_training_model = min(training_times.items(), key=lambda x: x[1])
fastest_inference_model = min(inference_times.items(), key=lambda x: x[1])
mlflow.log_metric("best_accuracy", best_accuracy_model[1])
mlflow.log_param("best_accuracy_model", best_accuracy_model[0])
mlflow.log_metric("best_f1_score", best_f1_model[1])
mlflow.log_param("best_f1_model", best_f1_model[0])
mlflow.log_metric("fastest_training_time", fastest_training_model[1])
mlflow.log_param("fastest_training_model", fastest_training_model[0])
mlflow.log_metric("fastest_inference_time", fastest_inference_model[1])
mlflow.log_param("fastest_inference_model", fastest_inference_model[0])
print("\nPerformance Comparison Summary:")
print(f"Best Accuracy: {best_accuracy_model[0]} ({best_accuracy_model[1]:.4f})")
print(f"Best F1 Score: {best_f1_model[0]} ({best_f1_model[1]:.4f})")
print(f"Fastest Training: {fastest_training_model[0]} ({fastest_training_model[1]:.4f}s)")
print(f"Fastest Inference: {fastest_inference_model[0]} ({fastest_inference_model[1]:.4f}s)")
# 6. Scalability testing
print("\n6. Scalability Testing...")
# Test with different dataset sizes
dataset_sizes = [1000, 5000, 10000, 20000, 50000]
scalability_results = {model_name: [] for model_name in models.keys()}
for size in dataset_sizes:
print(f"\nTesting with dataset size: {size}")
# Generate data
X, y = make_classification(
n_samples=size,
n_features=20,
n_informative=15,
n_redundant=2,
n_classes=2,
random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
for model_name, model in models.items():
print(f" Testing {model_name}...")
with mlflow.start_run(experiment_id=experiment_id, run_name=f"{model_name} - Scalability {size}"):
# Log parameters
mlflow.log_param("model", model_name)
mlflow.log_param("dataset_size", size)
# Measure training time
start_time = time.time()
model.fit(X_train, y_train)
training_time = time.time() - start_time
# Measure inference time
start_time = time.time()
y_pred = model.predict(X_test)
inference_time = time.time() - start_time
# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
# Log metrics
mlflow.log_metric("training_time", training_time)
mlflow.log_metric("inference_time", inference_time)
mlflow.log_metric("accuracy", accuracy)
# Store results
scalability_results[model_name].append({
"dataset_size": size,
"training_time": training_time,
"inference_time": inference_time,
"accuracy": accuracy
})
print(f" Training time: {training_time:.4f}s")
print(f" Inference time: {inference_time:.4f}s")
print(f" Accuracy: {accuracy:.4f}")
# Plot scalability results
for model_name, results in scalability_results.items():
sizes = [r["dataset_size"] for r in results]
training_times = [r["training_time"] for r in results]
inference_times = [r["inference_time"] for r in results]
accuracies = [r["accuracy"] for r in results]
plt.figure(figsize=(15, 5))
plt.subplot(1, 3, 1)
plt.plot(sizes, training_times, marker='o')
plt.title(f"{model_name} - Training Time")
plt.xlabel("Dataset Size")
plt.ylabel("Time (seconds)")
plt.xscale('log')
plt.grid(True)
plt.subplot(1, 3, 2)
plt.plot(sizes, inference_times, marker='o')
plt.title(f"{model_name} - Inference Time")
plt.xlabel("Dataset Size")
plt.ylabel("Time (seconds)")
plt.xscale('log')
plt.grid(True)
plt.subplot(1, 3, 3)
plt.plot(sizes, accuracies, marker='o')
plt.title(f"{model_name} - Accuracy")
plt.xlabel("Dataset Size")
plt.ylabel("Accuracy")
plt.ylim(0, 1)
plt.xscale('log')
plt.grid(True)
plt.tight_layout()
plt.savefig(f"scalability_{model_name.replace(' ', '_')}.png")
plt.close()
# Log scalability results
with mlflow.start_run(experiment_id=experiment_id, run_name=f"{model_name} - Scalability Summary"):
mlflow.log_artifact(f"scalability_{model_name.replace(' ', '_')}.png")
mlflow.log_param("model", model_name)
for i, result in enumerate(results):
mlflow.log_metric(f"training_time_{sizes[i]}", result["training_time"])
mlflow.log_metric(f"inference_time_{sizes[i]}", result["inference_time"])
mlflow.log_metric(f"accuracy_{sizes[i]}", result["accuracy"])
# 7. Memory usage comparison
print("\n7. Memory Usage Comparison...")
# Function to estimate memory usage
def estimate_memory_usage(model, X_train):
"""Estimate memory usage of a model"""
import sys
# Get model size
model_size = sys.getsizeof(model)
# Get parameter size
param_size = 0
if hasattr(model, 'coef_'):
param_size += model.coef_.nbytes
if hasattr(model, 'intercept_'):
param_size += model.intercept_.nbytes
if hasattr(model, 'feature_importances_'):
param_size += model.feature_importances_.nbytes
if hasattr(model, 'n_features_in_'):
param_size += 8 # Approximate size for model attributes
# Estimate memory for training data
data_size = X_train.nbytes
# Total memory estimate
total_memory = model_size + param_size + data_size
return {
"model_size": model_size / (1024 * 1024), # MB
"param_size": param_size / (1024 * 1024), # MB
"data_size": data_size / (1024 * 1024), # MB
"total_memory": total_memory / (1024 * 1024) # MB
}
# Compare memory usage (X_train here is the training split of the last, largest scalability dataset)
print("Memory Usage Comparison (MB):")
print(f"{'Model':<20} {'Model Size':<12} {'Param Size':<12} {'Data Size':<12} {'Total':<10}")
print("-" * 65)
for model_name, model in models.items():
memory = estimate_memory_usage(model, X_train)
print(f"{model_name:<20} {memory['model_size']:.2f} {memory['param_size']:.2f} {memory['data_size']:.2f} {memory['total_memory']:.2f}")
# Log memory usage
with mlflow.start_run(experiment_id=experiment_id, run_name=f"{model_name} - Memory Usage"):
mlflow.log_param("model", model_name)
mlflow.log_metric("model_size_mb", memory['model_size'])
mlflow.log_metric("param_size_mb", memory['param_size'])
mlflow.log_metric("data_size_mb", memory['data_size'])
mlflow.log_metric("total_memory_mb", memory['total_memory'])
# 8. Batch size optimization
print("\n8. Batch Size Optimization...")
# Test different batch sizes for inference
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
batch_results = {model_name: [] for model_name in models.keys()}
for batch_size in batch_sizes:
print(f"\nTesting batch size: {batch_size}")
# Create batches
n_batches = len(X_test) // batch_size
if len(X_test) % batch_size != 0:
n_batches += 1
for model_name, model in models.items():
print(f" Testing {model_name}...")
with mlflow.start_run(experiment_id=experiment_id, run_name=f"{model_name} - Batch {batch_size}"):
# Log parameters
mlflow.log_param("model", model_name)
mlflow.log_param("batch_size", batch_size)
# Measure inference time
start_time = time.time()
for i in range(n_batches):
start_idx = i * batch_size
end_idx = min((i + 1) * batch_size, len(X_test))
batch = X_test[start_idx:end_idx]
model.predict(batch)
inference_time = time.time() - start_time
# Calculate throughput
throughput = len(X_test) / inference_time
# Log metrics
mlflow.log_metric("inference_time", inference_time)
mlflow.log_metric("throughput", throughput)
# Store results
batch_results[model_name].append({
"batch_size": batch_size,
"inference_time": inference_time,
"throughput": throughput
})
print(f" Inference time: {inference_time:.4f}s")
print(f" Throughput: {throughput:.2f} samples/s")
# Plot batch size optimization results
for model_name, results in batch_results.items():
batch_sizes = [r["batch_size"] for r in results]
inference_times = [r["inference_time"] for r in results]
throughputs = [r["throughput"] for r in results]
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(batch_sizes, inference_times, marker='o')
plt.title(f"{model_name} - Inference Time vs Batch Size")
plt.xlabel("Batch Size")
plt.ylabel("Time (seconds)")
plt.xscale('log', base=2)
plt.grid(True)
plt.subplot(1, 2, 2)
plt.plot(batch_sizes, throughputs, marker='o')
plt.title(f"{model_name} - Throughput vs Batch Size")
plt.xlabel("Batch Size")
plt.ylabel("Throughput (samples/second)")
plt.xscale('log', base=2)
plt.grid(True)
plt.tight_layout()
plt.savefig(f"batch_optimization_{model_name.replace(' ', '_')}.png")
plt.close()
# Log batch optimization results
with mlflow.start_run(experiment_id=experiment_id, run_name=f"{model_name} - Batch Optimization Summary"):
mlflow.log_artifact(f"batch_optimization_{model_name.replace(' ', '_')}.png")
mlflow.log_param("model", model_name)
for i, result in enumerate(results):
mlflow.log_metric(f"inference_time_{batch_sizes[i]}", result["inference_time"])
mlflow.log_metric(f"throughput_{batch_sizes[i]}", result["throughput"])
# Find optimal batch size
optimal_idx = np.argmax(throughputs)
optimal_batch_size = batch_sizes[optimal_idx]
optimal_throughput = throughputs[optimal_idx]
mlflow.log_metric("optimal_throughput", optimal_throughput)
mlflow.log_param("optimal_batch_size", optimal_batch_size)
print(f"\n{model_name} Optimal Batch Size: {optimal_batch_size}")
print(f"{model_name} Optimal Throughput: {optimal_throughput:.2f} samples/s")
Challenges
Conceptual Challenges
- Experiment Management: Organizing and tracking large numbers of experiments
- Reproducibility: Ensuring consistent results across different environments
- Model Versioning: Managing multiple versions of models
- Collaboration: Enabling team collaboration on ML projects
- Model Deployment: Transitioning models from development to production
- Monitoring: Tracking model performance in production
- Data Management: Handling large datasets and data versions
- Scalability: Managing ML workflows at scale
Practical Challenges
- Tracking Overhead: Performance impact of experiment tracking (batched logging helps; see the sketch after this list)
- Storage Management: Handling large volumes of artifacts
- Integration: Integrating with existing ML workflows
- Security: Securing sensitive data and models
- Access Control: Managing permissions for team members
- Model Serving: Efficiently serving models in production
- Dependency Management: Handling complex dependency requirements
- Environment Consistency: Ensuring consistent environments across stages
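One practical answer to the tracking-overhead point above is to batch log calls: MLflow's plural APIs (mlflow.log_params and mlflow.log_metrics) send a whole dictionary in a single request instead of one request per value. A minimal sketch follows; the experiment name and the logged values are illustrative, not taken from the benchmark above.
import mlflow
mlflow.set_experiment("overhead-demo")  # illustrative experiment name
params = {"lr": 0.01, "n_estimators": 200, "max_depth": 8}
metrics = {"accuracy": 0.91, "f1_score": 0.89, "auc": 0.95}
with mlflow.start_run(run_name="batched-logging"):
    # One request per dictionary instead of one per key
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
    # For learning curves, log with an explicit step so the UI can plot them over time
    for epoch in range(5):
        mlflow.log_metrics({"train_loss": 1.0 / (epoch + 1)}, step=epoch)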
Technical Challenges
- Distributed Training: Scaling training across multiple machines
- Model Optimization: Optimizing models for production
- Hardware Acceleration: Leveraging GPUs/TPUs effectively
- Memory Management: Handling large models and datasets
- Real-time Inference: Serving predictions at low latency for online applications
- Model Explainability: Providing interpretable model predictions
- Model Monitoring: Detecting model drift and performance degradation
- CI/CD for ML: Implementing continuous integration for ML models
Research and Advancements
Key Developments
- "MLflow: An Open Source Platform for the Machine Learning Lifecycle" (Zaharia et al., 2018)
- Introduced MLflow platform
- Presented core components (Tracking, Projects, Models)
- Demonstrated end-to-end ML lifecycle management
- "Accelerating the Machine Learning Lifecycle with MLflow" (2019)
- Presented MLflow Model Registry
- Demonstrated collaboration features
- Showed integration with production systems
- "Reproducible Machine Learning with MLflow" (2020)
- Presented reproducibility features
- Demonstrated environment management
- Showed experiment tracking capabilities
- "MLflow: A Platform for Managing the Machine Learning Lifecycle at Scale" (2021)
- Presented scalability improvements
- Demonstrated distributed training support
- Showed enterprise features
- "MLflow and MLOps: Best Practices for Production Machine Learning" (2022)
- Presented MLOps integration
- Demonstrated CI/CD for ML
- Showed model monitoring and management
Emerging Research Directions
- Automated ML Lifecycle: AI-driven ML lifecycle management
- Federated Learning: Privacy-preserving distributed learning
- Explainable ML: Interpretable machine learning workflows
- Green ML: Energy-efficient machine learning
- Edge ML: ML deployment on edge devices
- Neurosymbolic ML: Combining neural networks with symbolic reasoning
- Quantum ML: Quantum computing for machine learning
- Automated Experimentation: AI-driven experiment design
- Model Governance: Comprehensive model management and compliance
- ML Security: Secure machine learning workflows
Best Practices
Experiment Tracking
- Organize Experiments: Use meaningful experiment names and tags
- Log Comprehensive Metrics: Track all relevant metrics and parameters
- Use Tags: Tag runs for easy filtering and organization
- Log Artifacts: Store relevant files and visualizations
- Document Runs: Add descriptions to runs and experiments
- Use Nested Runs: Organize related runs hierarchically (see the sketch after this list)
- Track Data: Log dataset versions and characteristics
- Monitor Training: Stream metrics during training
- Compare Runs: Use MLflow's comparison tools
- Share Results: Collaborate with team members
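Several of the items above (meaningful names, tags, nested runs, artifacts) fit into one small pattern. The sketch below is illustrative: the experiment name, tag values, and depth sweep are assumptions; the MLflow calls themselves (set_experiment, set_tags, nested start_run, log_param, log_metric, log_artifact) are the point.
import mlflow
mlflow.set_experiment("churn-model-search")  # illustrative experiment name
# A parent run groups a hyperparameter sweep; nested child runs hold each trial
with mlflow.start_run(run_name="rf-depth-sweep"):
    mlflow.set_tags({"team": "growth", "dataset_version": "2024-05", "purpose": "exploration"})
    for max_depth in (4, 8, 16):
        with mlflow.start_run(run_name=f"rf-depth-{max_depth}", nested=True):
            mlflow.log_param("max_depth", max_depth)
            # Train and evaluate a real model here; a placeholder metric keeps the sketch runnable
            mlflow.log_metric("f1_score", 0.80 + max_depth / 100)
    # Attach a short note to the parent run as an artifact
    with open("sweep_notes.txt", "w") as f:
        f.write("Baseline random-forest depth sweep on the 2024-05 data snapshot.\n")
    mlflow.log_artifact("sweep_notes.txt")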
Model Development
- Start Simple: Begin with baseline models
- Iterate Quickly: Use MLflow to track rapid iterations and query them afterwards (see the search sketch after this list)
- Validate Thoroughly: Test models on multiple datasets
- Monitor Performance: Track key metrics over time
- Document Models: Add comprehensive model documentation
- Version Control: Use model versioning consistently
- Test Edge Cases: Evaluate models on edge cases
- Validate Fairness: Check for bias and fairness
- Optimize Gradually: Improve models incrementally
- Collaborate: Share models and results with team members
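Tracking rapid iterations pays off when the runs can be queried later. mlflow.search_runs returns a pandas DataFrame with params.*, metrics.*, and tags.* columns, which makes "which configuration did best so far?" a one-liner. The experiment name and filter threshold below are illustrative.
import mlflow
exp = mlflow.get_experiment_by_name("churn-model-search")  # illustrative name
runs = mlflow.search_runs(
    experiment_ids=[exp.experiment_id],
    filter_string="metrics.f1_score > 0.8",
    order_by=["metrics.f1_score DESC"],
    max_results=10,
)
# One row per run; param values come back as strings
print(runs[["run_id", "params.max_depth", "metrics.f1_score"]])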
Model Deployment
- Start Small: Begin with limited deployments
- Monitor Closely: Track model performance in production
- Implement Rollback: Have rollback mechanisms in place
- Use Staging: Test models in staging before production (see the registry sketch after this list)
- Monitor Drift: Track data and concept drift
- Implement A/B Testing: Compare new models with production
- Optimize Performance: Tune models for production performance
- Secure Models: Implement proper security measures
- Document Deployment: Maintain deployment documentation
- Plan for Updates: Implement model update strategies
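A common way to implement the staging advice above is through the Model Registry: register a logged model, move the new version to the Staging stage, and have downstream code load it by stage URI so promotion needs no code change. The run ID and model name below are placeholders, and recent MLflow releases favor model version aliases over fixed stages, so check the API of the version you run.
import mlflow
from mlflow.tracking import MlflowClient
run_id = "abc123"  # placeholder: ID of a run that logged a model under the "model" artifact path
result = mlflow.register_model(f"runs:/{run_id}/model", "churn-classifier")
client = MlflowClient()
client.transition_model_version_stage(
    name="churn-classifier",
    version=result.version,
    stage="Staging",
)
# Consumers load by stage, so promoting a new version requires no downstream code change
model = mlflow.pyfunc.load_model("models:/churn-classifier/Staging")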
MLOps
- Automate Workflows: Implement CI/CD for ML (a minimal validation gate is sketched after this list)
- Monitor Continuously: Track models in production
- Implement Governance: Establish model governance policies
- Ensure Reproducibility: Maintain reproducible workflows
- Manage Data: Implement data versioning and lineage
- Collaborate: Enable team collaboration
- Document Processes: Maintain comprehensive documentation
- Implement Testing: Test models thoroughly
- Plan for Failure: Implement fallback mechanisms
- Optimize Costs: Monitor and optimize resource usage
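As a concrete example of automating workflows, a CI job can gate promotion on a holdout metric: load the candidate from the registry, score it, and fail the pipeline if it regresses. Everything below (model name, stage, the synthetic holdout set, the 0.85 threshold) is an assumption for illustration; a real pipeline would load a versioned validation dataset.
import sys
import mlflow
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
model = mlflow.pyfunc.load_model("models:/churn-classifier/Staging")  # illustrative registry URI
# Stand-in holdout set; replace with the project's versioned validation data
X_val, y_val = make_classification(n_samples=2000, n_features=20, random_state=0)
score = f1_score(y_val, model.predict(X_val))
print(f"Holdout F1: {score:.4f}")
THRESHOLD = 0.85  # assumed acceptance bar
if score < THRESHOLD:
    sys.exit(f"Model gate failed: F1 {score:.4f} < {THRESHOLD}")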
External Resources
- MLflow Official Website
- MLflow GitHub Repository
- MLflow Documentation
- MLflow Tutorials
- MLflow Quickstart
- MLflow Tracking API
- MLflow Projects
- MLflow Models
- MLflow Model Registry
- MLflow UI Guide
- MLflow REST API
- MLflow Python API
- MLflow R API
- MLflow Java API
- MLflow CLI
- MLflow Plugins
- MLflow Authentication
- MLflow Deployment
- MLflow Docker
- MLflow SageMaker
- MLflow Azure ML
- MLflow Kubernetes
- MLflow Databricks
- MLflow Community
- MLflow Issue Tracker
- MLflow Release Notes
- MLflow Examples
- MLflow Best Practices
- MLflow Security
- MLflow Scaling