sklearn-genetic-opt
sklearn-genetic-opt adds evolutionary optimization tools to the
scikit-learn workflow. It can tune hyperparameters with
GASearchCV and select feature subsets with
GAFeatureSelectionCV using algorithms powered by
DEAP.
The project is useful when a search space is mixed, irregular, expensive, or
not well served by an exhaustive grid. It follows familiar scikit-learn
patterns: define an estimator, define a search space, call fit, inspect
best_params_ or support_, and use the fitted object for prediction.
Highlights
- GASearchCV for hyperparameter search across classification, regression, and supported outlier-detection estimators.
- GAFeatureSelectionCV for wrapper-based feature selection with cross-validation.
- Search spaces for integer, continuous, and categorical parameters.
- Grouped configuration objects for readable advanced setups: EvolutionConfig, PopulationConfig, RuntimeConfig, and OptimizationConfig.
- Smart initial populations with PopulationConfig(initializer="smart"), including warm-start seeds, estimator defaults, Latin-hypercube numeric coverage, stratified categorical coverage, and duplicate avoidance.
- Adaptive mutation and crossover schedules.
- Optional local search, diversity control, random immigrants, and fitness sharing to improve exploration, avoid premature convergence, and refine good solutions.
- Parallel candidate evaluation with n_jobs and parallel_backend.
- Evaluation caching, optimizer telemetry through history, and fit-cost counters through fit_stats_.
- Callbacks for early stopping, progress reporting, checkpoints, TensorBoard, and custom logic.
- Plotting helpers plus MLflow 3 logging support.
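The Latin-hypercube numeric coverage mentioned for the smart initializer can be pictured with a small standalone sketch. This is not the library's implementation, only an illustration of the general technique using the standard library: each numeric range is cut into as many equal-width strata as there are individuals, and every stratum receives exactly one draw. The function name latin_hypercube and the example bounds are assumptions made for this illustration.

```python
import random


def latin_hypercube(bounds, n_samples, seed=None):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum of this dimension.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so strata from different dimensions are paired at random.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-dimension columns into one tuple per sample.
    return list(zip(*columns))


# Hypothetical numeric ranges, chosen only for illustration.
initial_population = latin_hypercube([(0.0, 0.03), (2.0, 14.0)], n_samples=5, seed=0)
for point in initial_population:
    print(point)
```

Compared with independent uniform draws, this spreads a small initial population across each numeric range instead of leaving gaps or clusters by chance.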
Installation
Install the core package with pip:
pip install sklearn-genetic-opt
Or with conda from the conda-forge channel:
conda install -c conda-forge sklearn-genetic-opt
Install optional plotting, MLflow, and TensorBoard integrations with pip:
pip install "sklearn-genetic-opt[all]"
The conda package ships only the core dependencies. To use the optional integrations in a conda environment, install the extras you need alongside it, for example:
conda install -c conda-forge sklearn-genetic-opt seaborn mlflow
Requirements
Core requirements:
Python (>= 3.12)
scikit-learn (>= 1.9.0)
NumPy (>= 2.4.6)
DEAP (>= 1.4.4)
tqdm (>= 4.68.3)
Optional extras:
Seaborn (>= 0.13.2) for plots
MLflow (>= 3.14.0) for experiment logging
TensorFlow (>= 2.21.0) and TensorBoard (>= 2.20.0, < 2.21.0) for TensorBoard logging on Python < 3.14
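To compare an existing environment against the lists above, a short check with the standard library's importlib.metadata may help. This is a sketch, not part of the package: the helper name installed_versions is made up for this example, and the distribution names are assumed to match the PyPI names of the projects listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the packages listed above.
CORE_PACKAGES = ["sklearn-genetic-opt", "scikit-learn", "numpy", "deap", "tqdm"]
OPTIONAL_PACKAGES = ["seaborn", "mlflow", "tensorflow", "tensorboard"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(CORE_PACKAGES + OPTIONAL_PACKAGES).items():
    print(f"{name:22s} {ver or 'not installed'}")
```

The script only reports what is installed; comparing the printed versions with the minimums above is left to the reader.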
Quick Start
This example tunes a RandomForestClassifier across six hyperparameters on
the breast cancer dataset. The space mixes four integers, one continuous
value, and one categorical. This is the kind of search where a genetic
algorithm's ability to recombine good partial solutions can give it an edge
over independent random sampling.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score

from sklearn_genetic import EvolutionConfig, GASearchCV, PopulationConfig, RuntimeConfig
from sklearn_genetic.space import Categorical, Continuous, Integer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

param_grid = {
    "n_estimators": Integer(50, 250),
    "max_depth": Integer(2, 14),
    "min_samples_split": Integer(2, 12),
    "min_samples_leaf": Integer(1, 8),
    "max_features": Categorical(["sqrt", "log2", None]),
    "ccp_alpha": Continuous(0.0, 0.03),
}

search = GASearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring="roc_auc",
    evolution_config=EvolutionConfig(population_size=20, generations=12),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True),
)
search.fit(X_train, y_train)

print(search.best_params_)
print("CV score:", round(search.best_score_, 4))

y_prob = search.predict_proba(X_test)[:, 1]
print("Holdout ROC-AUC:", round(roc_auc_score(y_test, y_prob), 4))

# Evaluation cost breakdown
print(search.fit_stats_)
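For readers who want intuition for what a genetic search does with a mixed space like the one above, the following toy loop may help. It is a standalone sketch using only the standard library and is not how GASearchCV or DEAP are implemented: the SPACE dictionary, the toy_score function (a stand-in for a cross-validated score), and the evolve helper are all invented for this illustration. It shows tournament selection, uniform crossover, mutation, elitism, and a simple evaluation cache.

```python
import random

# Hypothetical mixed search space with the three parameter kinds used above.
SPACE = {
    "depth": ("int", 2, 14),
    "alpha": ("float", 0.0, 0.03),
    "features": ("cat", ["sqrt", "log2", None]),
}


def sample(spec, rng):
    """Draw one value for a parameter according to its kind."""
    kind = spec[0]
    if kind == "int":
        return rng.randint(spec[1], spec[2])
    if kind == "float":
        return rng.uniform(spec[1], spec[2])
    return rng.choice(spec[1])


def toy_score(ind):
    # Stand-in for a cross-validated score; best at depth=8, alpha=0.01, "sqrt".
    return (
        -abs(ind["depth"] - 8) / 12
        - abs(ind["alpha"] - 0.01) / 0.03
        + (0.5 if ind["features"] == "sqrt" else 0.0)
    )


def evolve(pop_size=20, generations=12, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    cache = {}  # evaluation cache: identical candidates are scored only once

    def fitness(ind):
        key = tuple(ind[name] for name in SPACE)
        if key not in cache:
            cache[key] = toy_score(ind)
        return cache[key]

    pop = [{n: sample(s, rng) for n, s in SPACE.items()} for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the current best individual survives unchanged.
        children = [max(pop, key=fitness)]
        while len(children) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            # Uniform crossover: each parameter is copied from either parent.
            child = {n: rng.choice((a[n], b[n])) for n in SPACE}
            # Mutation: occasionally resample a parameter from its full range.
            for n, s in SPACE.items():
                if rng.random() < mutation_rate:
                    child[n] = sample(s, rng)
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best), len(cache)


best, score, n_evaluations = evolve()
print(best, round(score, 4), n_evaluations)
```

The crossover step is where good partial solutions get recombined: a child can inherit a well-chosen depth from one parent and a well-chosen categorical value from the other, which independent random sampling cannot do.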
Recommended Next Steps
- Not sure if GA search is the right tool? Start with When to Use sklearn-genetic-opt for a comparison guide and decision table.
- New to the library? How to Use sklearn-genetic-opt walks through the full workflow from data loading to prediction.
- Tuning a scikit-learn Pipeline? See Pipeline Tuning with GASearchCV for the step__param naming convention and a worked regression example.
- Read Advanced Optimizer Control for local search, diversity control, fitness sharing, and optimizer telemetry when the default settings are not enough.
- Something not working? See Troubleshooting for common errors, slow-search diagnosis, and how to read fit_stats_.
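The step__param naming convention mentioned above joins a pipeline step name and a parameter name with a double underscore. As a rough illustration of how such flat keys map back to steps, here is a small standalone helper. The function split_step_params and the step names "scaler" and "model" are hypothetical; in practice the step names are whatever was passed to the Pipeline.

```python
def split_step_params(params):
    """Group flat "step__param" keys by pipeline step name."""
    grouped = {}
    for key, value in params.items():
        # Split on the first double underscore only, so nested names such as
        # "model__base__depth" keep "base__depth" as the parameter part.
        step, sep, name = key.partition("__")
        if not sep or not step or not name:
            raise ValueError(f"expected 'step__param', got {key!r}")
        grouped.setdefault(step, {})[name] = value
    return grouped


# Hypothetical best parameters for a two-step pipeline.
best = {"scaler__with_mean": True, "model__alpha": 0.3, "model__max_iter": 500}
print(split_step_params(best))
# {'scaler': {'with_mean': True}, 'model': {'alpha': 0.3, 'max_iter': 500}}
```

A search space for a pipeline uses the same flat keys, so a key without the step prefix is a common source of "invalid parameter" errors.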
User Guide / Tutorials:
- When to Use sklearn-genetic-opt
- How to Use sklearn-genetic-opt
- Understanding the Evaluation Process
- Pipeline Tuning with GASearchCV
- Multi-Metric Hyperparameter Search
- Using Callbacks
- Custom Callbacks
- Using Adapters
- Advanced Optimizer Control
- Integrating with MLflow
- Outlier Detection Support
- Reproducibility
- Troubleshooting
Release Notes
- Release Notes
- What’s new in 0.13.0
- What’s new in 0.12.0
- What’s new in 0.11.1
- What’s new in 0.11.0
- What’s new in 0.10.1
- What’s new in 0.10.0
- What’s new in 0.9.0
- What’s new in 0.8.1
- What’s new in 0.8.0
- What’s new in 0.7.0
- What’s new in 0.6.1
- What’s new in 0.6.0
- What’s new in 0.5.0
- What’s new in 0.4.1
- What’s new in 0.4
- What’s new in 0.3
- What’s new in 0.2
- What’s new in 0.1