sklearn-genetic-opt

sklearn-genetic-opt adds evolutionary optimization tools to the scikit-learn workflow. It can tune hyperparameters with GASearchCV and select feature subsets with GAFeatureSelectionCV using algorithms powered by DEAP.

The project is useful when a search space is mixed, irregular, expensive, or not well served by an exhaustive grid. It follows familiar scikit-learn patterns: define an estimator, define a search space, call fit, inspect best_params_ or support_, and use the fitted object for prediction.

Highlights

GASearchCV for hyperparameter search across classification, regression, and supported outlier-detection estimators.
GAFeatureSelectionCV for wrapper-based feature selection with cross-validation.
Search spaces for integer, continuous, and categorical parameters.
Grouped configuration objects for readable advanced setups: EvolutionConfig, PopulationConfig, RuntimeConfig, and OptimizationConfig.
Smart initial populations with PopulationConfig(initializer="smart"), including warm-start seeds, estimator defaults, Latin-hypercube numeric coverage, stratified categorical coverage, and duplicate avoidance.
Adaptive mutation and crossover schedules.
Optional local search, diversity control, random immigrants, and fitness sharing to improve exploration, avoid premature convergence, and refine good solutions.
Parallel candidate evaluation with n_jobs and parallel_backend.
Evaluation caching, optimizer telemetry through history, and fit-cost counters through fit_stats_.
Callbacks for early stopping, progress reporting, checkpoints, TensorBoard, and custom logic.
Plotting helpers plus MLflow 3 logging support.

Installation

Install the core package with pip:

pip install sklearn-genetic-opt

Or with conda from the conda-forge channel:

conda install -c conda-forge sklearn-genetic-opt

Install optional plotting, MLflow, and TensorBoard integrations with pip:

pip install sklearn-genetic-opt[all]

The conda package ships only the core dependencies. To use the optional integrations in a conda environment, install the extras you need alongside it, for example:

conda install -c conda-forge sklearn-genetic-opt seaborn mlflow

Requirements

Core requirements:

Python (>= 3.12)
scikit-learn (>= 1.9.0)
NumPy (>= 2.4.6)
DEAP (>= 1.4.4)
tqdm (>= 4.68.3)

Optional extras:

Seaborn (>= 0.13.2) for plots
MLflow (>= 3.14.0) for experiment logging
TensorFlow (>= 2.21.0) and TensorBoard (>= 2.20.0, < 2.21.0) for TensorBoard logging on Python < 3.14

Quick Start

This example tunes a RandomForestClassifier across six hyperparameters on the breast cancer dataset. With six mixed parameters — integers, floats, and a categorical — this is exactly the kind of search where GA’s ability to recombine good partial solutions gives it an edge over independent random sampling.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score

from sklearn_genetic import EvolutionConfig, GASearchCV, PopulationConfig, RuntimeConfig
from sklearn_genetic.space import Categorical, Continuous, Integer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

param_grid = {
    "n_estimators": Integer(50, 250),
    "max_depth": Integer(2, 14),
    "min_samples_split": Integer(2, 12),
    "min_samples_leaf": Integer(1, 8),
    "max_features": Categorical(["sqrt", "log2", None]),
    "ccp_alpha": Continuous(0.0, 0.03),
}

search = GASearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring="roc_auc",
    evolution_config=EvolutionConfig(population_size=20, generations=12),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True),
)

search.fit(X_train, y_train)

print(search.best_params_)
print("CV score:", round(search.best_score_, 4))

y_prob = search.predict_proba(X_test)[:, 1]
print("Holdout ROC-AUC:", round(roc_auc_score(y_test, y_prob), 4))

# Evaluation cost breakdown
print(search.fit_stats_)

Recommended Next Steps

Not sure if GA search is the right tool? Start with When to Use sklearn-genetic-opt for a comparison guide and decision table.
New to the library? How to Use sklearn-genetic-opt walks through the full workflow from data loading to prediction.
Tuning a scikit-learn Pipeline? See Pipeline Tuning with GASearchCV for the step__param naming convention and a worked regression example.
Read Advanced Optimizer Control for local search, diversity control, fitness sharing, and optimizer telemetry when the default settings are not enough.
Something not working? See Troubleshooting for common errors, slow-search diagnosis, and how to read fit_stats_.

User Guide / Tutorials:

Jupyter notebooks examples:

Release Notes

Release Notes

External References: