Random Search
What is Random Search?
Random Search is a hyperparameter optimization technique that samples parameter combinations at random from predefined distributions, rather than exhaustively evaluating every possible combination as grid search does. Because the sampling budget is fixed in advance, random search is a practical alternative to exhaustive methods and often finds good configurations with significantly fewer evaluations.
Key Characteristics
- Random Sampling: Selects parameter combinations randomly
- Probabilistic: Results may vary between runs
- Efficient: Finds good parameters with fewer evaluations
- Flexible: Can sample from continuous and discrete distributions
- Scalable: Works well with high-dimensional parameter spaces
- Parallelizable: Can leverage distributed computing
How Random Search Works
- Define Parameter Distributions: Specify distributions for each hyperparameter
- Random Sampling: Generate random parameter combinations
- Model Training: Train model with each sampled combination
- Performance Evaluation: Assess performance using cross-validation
- Optimal Selection: Choose combination with best performance
- Final Training: Train final model with optimal parameters
Random Search Process Diagram
Parameter Distributions
│
├── Param1: Uniform(0.001, 0.1)
├── Param2: Int(1, 100)
└── Param3: Categorical(['rbf', 'linear'])
│
▼
Random Sampling (n_iter=10)
│
├── (0.023, 42, 'rbf')
├── (0.005, 17, 'linear')
├── (0.087, 89, 'rbf')
├── (0.012, 5, 'linear')
├── (0.045, 73, 'rbf')
├── (0.008, 24, 'linear')
├── (0.062, 58, 'rbf')
├── (0.031, 91, 'linear')
├── (0.074, 12, 'rbf')
└── (0.019, 67, 'linear')
│
▼
Model Training & Evaluation
│
▼
Optimal Parameter Selection
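The whole process can be written as a short loop. Below is a minimal from-scratch sketch that mirrors the diagram's three distributions and its n_iter=10 budget; the SVC model, dataset, and parameter-to-argument mapping are illustrative assumptions, not part of the diagram itself.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
rng = np.random.default_rng(42)
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
best_score, best_params = -np.inf, None
for _ in range(10):  # n_iter = 10, as in the diagram
    params = {
        'gamma': rng.uniform(0.001, 0.1),              # Param1: Uniform(0.001, 0.1)
        'C': float(rng.integers(1, 101)),              # Param2: Int(1, 100)
        'kernel': str(rng.choice(['rbf', 'linear'])),  # Param3: Categorical
    }
    # Train and evaluate with 5-fold cross-validation, keep the best so far
    score = cross_val_score(SVC(**params), X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params
print(f"Best parameters: {best_params}, CV accuracy: {best_score:.3f}")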
Mathematical Foundations
Probability of Finding Good Parameters
If well-performing configurations occupy a fraction $p$ of the search space, the probability that at least one of $n$ independent random samples lands in that region is:
$$ P(\text{find good value}) = 1 - (1 - p)^n $$
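For instance, if good configurations occupy 5% of the space ($p = 0.05$), 60 random draws already succeed with probability above 95%. A quick numeric check (the values of p and n are illustrative):
p, n = 0.05, 60  # assumed fraction of good configurations and sampling budget
prob = 1 - (1 - p) ** n
print(f"P(at least one good sample) = {prob:.3f}")  # 0.954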
Efficiency Comparison
Random search typically reaches a good configuration in far fewer evaluations than grid search requires:
$$ n_{\text{random}} \ll \prod_{i=1}^{d} n_i $$
where $d$ is the number of hyperparameters and $n_i$ is the number of grid values in dimension $i$.
Expected Performance
The expected performance of random search:
$$ \mathbb{E}[\theta] = \int \theta(\lambda)\, p(\lambda)\, d\lambda $$
where $\theta(\lambda)$ is the performance for parameters $\lambda$ and $p(\lambda)$ is the sampling distribution.
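In practice this expectation can be estimated by Monte Carlo: draw configurations from $p(\lambda)$ and average their scores. A sketch with an assumed toy one-dimensional objective $\theta(\lambda)$ that peaks at $\lambda = 10^{-2}$:
import numpy as np
from scipy.stats import loguniform
def theta(lam):
    # Assumed toy objective: performance as a function of a single parameter
    return 1.0 - (np.log10(lam) + 2) ** 2 / 10  # maximum at lam = 1e-2
samples = loguniform(1e-4, 1e-1).rvs(10000, random_state=0)  # draws from p(lambda)
print(f"Monte Carlo estimate of E[theta]: {theta(samples).mean():.3f}")  # analytic value is 0.9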
Random Search vs Grid Search
| Aspect | Random Search | Grid Search |
|---|---|---|
| Search Strategy | Random sampling | Exhaustive evaluation |
| Computational Cost | Lower | Higher |
| Parameter Space | Continuous and discrete distributions | Fixed discrete values only |
| Efficiency | Finds good values faster | Requires more evaluations |
| Reproducibility | Varies between runs | Deterministic |
| Implementation | Simple | Simple |
| Best For | High-dimensional spaces | Low-dimensional spaces |
| Flexibility | High (any distribution) | Low (fixed values) |
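The cost difference is easy to quantify: grid search must fit every cell of its grid, while random search fits exactly n_iter candidates regardless of dimensionality. A sketch with illustrative grid sizes:
from scipy.stats import loguniform
# Discretized grid: every combination must be evaluated
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],        # 5 values
    'gamma': [1e-4, 1e-3, 1e-2, 1e-1],   # 4 values
    'kernel': ['rbf', 'linear'],         # 2 values
}
n_grid_fits = 5 * 4 * 2  # 40 candidates
# Continuous distributions: the budget is set directly by n_iter
param_dist = {
    'C': loguniform(1e-2, 1e2),
    'gamma': loguniform(1e-4, 1e-1),
    'kernel': ['rbf', 'linear'],
}
n_random_fits = 15  # chosen independently of the number of parameters
print(f"{n_grid_fits} grid candidates vs {n_random_fits} random candidates per CV fold")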
Random Search Implementation
Python Example with Scikit-Learn
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import randint, uniform
# Create synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
# Define model
model = RandomForestClassifier(random_state=42)
# Define parameter distributions
param_dist = {
    'n_estimators': randint(50, 200),
    'max_depth': [None] + list(randint(5, 50).rvs(10, random_state=42)),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': ['sqrt', 'log2', None],
    'bootstrap': [True, False],
    'criterion': ['gini', 'entropy']
}
# Create random search
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=50,           # Number of parameter settings sampled
    cv=5,                # 5-fold cross-validation
    scoring='accuracy',
    n_jobs=-1,           # Use all available cores
    random_state=42,
    verbose=1
)
# Execute random search
random_search.fit(X, y)
# Results
print(f"Best parameters: {random_search.best_params_}")
print(f"Best cross-validation score: {random_search.best_score_:.4f}")
print(f"Best estimator: {random_search.best_estimator_}")
Random Search with Continuous Parameters
from scipy.stats import loguniform, randint, uniform
from sklearn.ensemble import GradientBoostingClassifier
param_dist = {
    'learning_rate': loguniform(1e-4, 1e-1),
    'n_estimators': randint(50, 500),
    'max_depth': randint(3, 30),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'subsample': uniform(0.6, 0.4)  # Uniform between 0.6 and 1.0
}
random_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=100,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    random_state=42
)
random_search.fit(X, y)
Advantages of Random Search
- Efficiency: Finds good parameters with fewer evaluations
- Scalability: Works well with high-dimensional parameter spaces
- Flexibility: Can sample from any probability distribution
- Continuous Parameters: Handles continuous parameter ranges naturally
- Parallelization: Easily parallelizable across multiple cores
- Resource Efficiency: Better use of computational resources
- Practical Performance: Often finds better parameters than grid search for the same computational budget
Challenges in Random Search
- Non-Deterministic: Results vary between runs
- Suboptimal Coverage: May miss optimal regions
- Parameter Dependencies: Doesn't account for parameter interactions
- Sampling Bias: Distribution choice affects results
- Evaluation Variance: Performance estimates can be noisy
- Early Stopping: Needs careful implementation
- Reproducibility: Requires fixed random seeds
Best Practices
- Use Appropriate Distributions: Choose distributions that match parameter characteristics
- Logarithmic Scales: Use log-uniform sampling for parameters with wide ranges, such as learning rates (see the sketch after this list)
- Sufficient Iterations: Set n_iter based on computational budget
- Fixed Random Seeds: Ensure reproducibility
- Combine with Grid Search: Use random search for exploration, grid for refinement
- Monitor Progress: Track performance across iterations
- Visualize Results: Plot parameter distributions and performance
- Consider Parameter Importance: Focus on most impactful parameters
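To illustrate the logarithmic-scale advice, compare how uniform and log-uniform sampling cover a learning-rate range spanning three orders of magnitude (the range is illustrative):
import numpy as np
from scipy.stats import loguniform, uniform
lin = uniform(1e-4, 1e-1 - 1e-4).rvs(10000, random_state=42)  # uniform on [1e-4, 1e-1]
log = loguniform(1e-4, 1e-1).rvs(10000, random_state=42)
# Fraction of samples in the two lowest decades of the range (below 1e-2)
print(f"uniform:    {np.mean(lin < 1e-2):.1%}")  # ~10%: the upper decade dominates
print(f"loguniform: {np.mean(log < 1e-2):.1%}")  # ~67%: decades are covered evenly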
Random Search Strategies
Adaptive Random Search
- Initial Sampling: Sample parameters from broad distributions
- Performance Analysis: Identify promising parameter regions
- Distribution Update: Narrow distributions around promising regions
- Iterative Refinement: Repeat with updated distributions
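A minimal from-scratch sketch of adaptive narrowing, assuming a toy one-dimensional objective that peaks at a learning rate of 1e-2; the narrowing rule and round count are illustrative choices, not a fixed recipe.
import numpy as np
rng = np.random.default_rng(42)
def score(lr):
    return -(np.log10(lr) + 2) ** 2  # assumed objective, best at lr = 1e-2
low, high = -5.0, 0.0  # search over log10(lr) in [1e-5, 1]
for rnd in range(3):
    samples = rng.uniform(low, high, size=20)  # sample from the current range
    best = samples[np.argmax(score(10 ** samples))]
    width = (high - low) / 4                   # narrow around the best sample
    low, high = best - width, best + width
    print(f"round {rnd}: best lr = {10 ** best:.2e}, next range = [{low:.2f}, {high:.2f}]")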
Successive Halving with Random Search
- Initial Population: Sample many parameter combinations
- Resource Allocation: Train all models with limited resources
- Elimination: Discard poor-performing combinations
- Resource Increase: Allocate more resources to remaining combinations
- Iteration: Repeat until best combination found
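scikit-learn ships this scheme as HalvingRandomSearchCV (still marked experimental, hence the enabling import). A minimal usage sketch, reusing X and y from the earlier examples; the distributions and resource settings are illustrative:
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint
halving = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={'max_depth': randint(3, 20), 'min_samples_split': randint(2, 20)},
    resource='n_estimators',  # the resource grown for surviving candidates
    max_resources=200,
    factor=3,                 # keep roughly the top third each round
    random_state=42
)
halving.fit(X, y)
print(halving.best_params_)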
Hybrid Random-Grid Search
- Initial Exploration: Use random search to identify promising regions
- Grid Refinement: Apply grid search in promising areas
- Final Tuning: Use finer random search for final optimization
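A sketch of the two-stage pattern, reusing X and y from above; the ranges and grid step sizes are illustrative assumptions:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from scipy.stats import randint
# Stage 1: broad random exploration of the space
coarse = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    {'max_depth': randint(2, 40), 'n_estimators': randint(50, 400)},
    n_iter=20, cv=3, random_state=42
).fit(X, y)
# Stage 2: fine grid centered on the best random result
d = coarse.best_params_['max_depth']
n = coarse.best_params_['n_estimators']
fine = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {'max_depth': [max(2, d - 2), d, d + 2],
     'n_estimators': [max(50, n - 50), n, n + 50]},
    cv=3
).fit(X, y)
print(fine.best_params_)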
Random Search in Practice
Parameter Distribution Design
- Continuous Parameters: Use uniform, loguniform, or normal distributions
- Discrete Parameters: Use randint for integer ranges
- Categorical Parameters: Supply a list of options; RandomizedSearchCV samples it uniformly
- Logarithmic Scales: For parameters like learning rates, regularization
- Bounded Ranges: Set reasonable upper and lower limits
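The guidelines above map directly onto scipy distributions and plain lists. A hypothetical example dictionary (the parameter names and ranges are illustrative, not tied to any specific model):
from scipy.stats import loguniform, norm, randint, uniform
param_dist = {
    'learning_rate': loguniform(1e-4, 1e-1),  # wide-range continuous: log scale
    'dropout': uniform(0.0, 0.5),             # bounded continuous on [0.0, 0.5]
    'momentum': norm(loc=0.9, scale=0.02),    # continuous, centered on a default (clip if needed)
    'max_depth': randint(3, 15),              # discrete integer range
    'activation': ['relu', 'tanh'],           # categorical: list sampled uniformly
}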
Evaluation and Selection
- Cross-Validation: Use k-fold cross-validation for reliable estimates
- Multiple Metrics: Consider primary and secondary evaluation metrics
- Statistical Testing: Compare performance across parameter sets
- Model Ensembles: Consider combining top-performing models
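After fitting, the full cv_results_ table supports these comparisons. A short sketch, assuming a fitted search object named random_search as in the examples above:
import pandas as pd
# Inspect every sampled configuration, not just the single best one
results = pd.DataFrame(random_search.cv_results_)
top = results.sort_values('rank_test_score').head(5)
print(top[['params', 'mean_test_score', 'std_test_score']])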
Resource Management
- Computational Budget: Set limits on time or iterations
- Distributed Computing: Use parallel processing frameworks
- Early Stopping: Implement early stopping for training
- Checkpointing: Save intermediate results for recovery
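For checkpointing, the fitted search object can be persisted with joblib so completed work survives interruptions (the filename is illustrative):
import joblib
# Save the fitted search, then restore it in a later session
joblib.dump(random_search, 'random_search_checkpoint.joblib')
restored = joblib.load('random_search_checkpoint.joblib')
print(restored.best_params_)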
Random Search for Different Algorithms
Neural Networks
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Legacy wrapper, removed in recent TensorFlow releases; the maintained
# replacement is scikeras.wrappers.KerasClassifier
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform
def create_model(learning_rate=0.01, init='glorot_uniform'):
    model = Sequential()
    model.add(Dense(12, input_dim=20, activation='relu', kernel_initializer=init))
    model.add(Dense(8, activation='relu', kernel_initializer=init))
    model.add(Dense(1, activation='sigmoid'))
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
model = KerasClassifier(build_fn=create_model, verbose=0)
param_dist = {
    'learning_rate': loguniform(1e-4, 1e-1),
    'init': ['glorot_uniform', 'he_normal', 'lecun_normal'],
    'batch_size': [16, 32, 64, 128],
    'epochs': [50, 100, 200]
}
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=20,
    cv=3,
    scoring='accuracy',
    n_jobs=1,  # Keras models do not pickle cleanly, so avoid multiprocessing
    random_state=42
)
random_search.fit(X, y)
Support Vector Machines
from sklearn.svm import SVC
from scipy.stats import loguniform, randint
param_dist = {
    'C': loguniform(1e-2, 1e2),
    'gamma': loguniform(1e-4, 1e1),
    'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
    'degree': randint(2, 5),  # Only used by the polynomial kernel
    'class_weight': [None, 'balanced']
}
random_search = RandomizedSearchCV(
    SVC(random_state=42),
    param_distributions=param_dist,
    n_iter=50,
    cv=5,
    scoring='f1_macro',
    n_jobs=-1,
    random_state=42
)
random_search.fit(X, y)
Gradient Boosting Machines
from sklearn.ensemble import GradientBoostingClassifier
from scipy.stats import loguniform, randint, uniform
param_dist = {
    'n_estimators': randint(50, 500),
    'learning_rate': loguniform(1e-3, 1e-1),
    'max_depth': randint(3, 10),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 10),
    'max_features': ['sqrt', 'log2', None],
    'subsample': uniform(0.6, 0.4),
    'loss': ['log_loss', 'exponential']
}
random_search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=100,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    random_state=42
)
random_search.fit(X, y)
Future Directions
- Adaptive Random Search: Dynamic distribution adjustment
- Neural Random Search: Deep learning for parameter sampling
- Bayesian Random Search: Combining Bayesian optimization with random sampling
- Automated Distribution Selection: AI-driven distribution design
- Distributed Random Search: Large-scale distributed implementations
- Explainable Random Search: Interpretable parameter selection
- Multi-Objective Random Search: Balancing multiple performance metrics