Comparing GASearchCV With sklearn Search Methods

This notebook compares GASearchCV with RandomizedSearchCV and GridSearchCV on the same classification problem. The goal is not to declare one method universally best; it is to show how to compare solution quality, search cost, and runtime fairly.

Problem Setup

We use the breast cancer binary classification dataset and a scaled logistic-regression pipeline. The search space includes continuous and categorical choices, which makes it a good small example for comparing search methods.

[1]:
import time
import warnings

import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Continuous

warnings.filterwarnings("ignore", category=UserWarning)

RANDOM_STATE = 42
[2]:
data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.30,
    stratify=y,
    random_state=RANDOM_STATE,
)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)

Shared Model and Metrics

Each method receives the same pipeline, the same train/test split, the same 3-fold stratified CV splitter, and the same roc_auc scoring. We report both the best cross-validation score and metrics on the 30% holdout set.

[3]:
def make_model():
    return Pipeline(
        [
            ("scaler", StandardScaler()),
            (
                "logistic",
                LogisticRegression(
                    solver="liblinear",
                    max_iter=500,
                    random_state=RANDOM_STATE,
                ),
            ),
        ]
    )


def evaluate_classifier(estimator):
    """Score a fitted search object (refit on the training set) on the holdout set."""
    predictions = estimator.predict(X_test)
    probabilities = estimator.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "balanced_accuracy": balanced_accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
        "roc_auc": roc_auc_score(y_test, probabilities),
    }


def summarize_search(name, estimator, fit_seconds):
    """Collect search cost and solution quality for one fitted search object."""
    cv_results = getattr(estimator, "cv_results_", {})
    evaluated_candidates = len(cv_results.get("params", []))
    row = {
        "method": name,
        "fit_seconds": fit_seconds,
        "evaluated_candidates": evaluated_candidates,
        # An estimate only: it may overcount when a search caches repeated candidates.
        "estimated_cv_evaluations": evaluated_candidates * cv.get_n_splits(),
        "best_cv_score": getattr(estimator, "best_score_", None),
    }
    row.update(evaluate_classifier(estimator))
    return row

Run RandomizedSearchCV

Random search samples a fixed number of candidates (here n_iter=16) from the given parameter distributions. It is often a strong baseline for continuous spaces.

[4]:
randomized_search = RandomizedSearchCV(
    estimator=make_model(),
    param_distributions={
        "logistic__C": loguniform(1e-3, 30.0),
        "logistic__class_weight": [None, "balanced"],
    },
    n_iter=16,
    scoring="roc_auc",
    cv=cv,
    n_jobs=-1,
    random_state=RANDOM_STATE,
    refit=True,
)

started_at = time.perf_counter()
randomized_search.fit(X_train, y_train)
randomized_seconds = time.perf_counter() - started_at
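
The sampling idea behind the random search above can be sketched without scikit-learn. This is a minimal illustration, not the code RandomizedSearchCV runs internally: the helper name `sample_loguniform` and the use of Python's `random` module are assumptions made for the example. A log-uniform draw is uniform in log(C), so each order of magnitude of C receives about the same share of the samples.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


# Illustrative seed; this does not reproduce scikit-learn's random stream.
rng = random.Random(42)

# Sixteen independent candidates, mirroring n_iter=16 in the cell above.
candidates = [
    {
        "logistic__C": sample_loguniform(1e-3, 30.0, rng),
        "logistic__class_weight": rng.choice([None, "balanced"]),
    }
    for _ in range(16)
]

# For this range, about two thirds of the probability mass lies below C = 1.
below_one = sum(c["logistic__C"] < 1.0 for c in candidates)
print(len(candidates), "candidates,", below_one, "with C < 1")
```

Because the draws are independent, the budget is set by n_iter alone and does not grow with the number of hyperparameters.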

Run GridSearchCV

Grid search is deterministic and easy to reason about. It becomes expensive quickly because every additional dimension multiplies the candidate count. The grid here is sized to match the random-search budget of 16 candidates.

[5]:
grid_search = GridSearchCV(
    estimator=make_model(),
    param_grid={
        "logistic__C": np.geomspace(1e-3, 30.0, num=8),
        "logistic__class_weight": [None, "balanced"],
    },
    scoring="roc_auc",
    cv=cv,
    n_jobs=-1,
    refit=True,
)

started_at = time.perf_counter()
grid_search.fit(X_train, y_train)
grid_seconds = time.perf_counter() - started_at
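
The multiplicative cost of a grid can be checked with a few lines of arithmetic. The sketch below rebuilds the grid above in plain Python; `geomspace` is a hypothetical stand-in for `np.geomspace`, and `hypothetical_param` is an invented third dimension used only to show how the count grows.

```python
import itertools
import math


def geomspace(start, stop, num):
    """Pure-Python stand-in for np.geomspace: num points evenly spaced on a log scale."""
    step = (math.log(stop) - math.log(start)) / (num - 1)
    return [math.exp(math.log(start) + i * step) for i in range(num)]


grid = {
    "logistic__C": geomspace(1e-3, 30.0, 8),
    "logistic__class_weight": [None, "balanced"],
}

# Grid search enumerates the Cartesian product of all value lists.
candidates = list(itertools.product(*grid.values()))
n_splits = 3  # same fold count as the StratifiedKFold splitter above
cv_fits = len(candidates) * n_splits
print(len(candidates), cv_fits)

# One extra three-valued dimension (invented for illustration) triples the count.
grid["hypothetical_param"] = ["a", "b", "c"]
expanded = math.prod(len(values) for values in grid.values())
print(expanded)
```

With 8 values of C and 2 class-weight options, this gives 16 candidates and 48 cross-validation fits, which matches the counts reported for GridSearchCV in the comparison table.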

Run GASearchCV

The GA version uses the same parameter region with sklearn-genetic-opt spaces and enables optimizer controls that are useful in mixed search spaces.
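
Two of those controls are the adaptive crossover and mutation probabilities set in the configuration below. As a rough sketch, assuming the decay formulas given in the sklearn-genetic-opt adapter documentation, the schedules can be written as plain functions. The names `exponential_decay` and `inverse_decay` are illustrative helpers, not library functions, and the sketch does not model how the library maps generations to the step `t` or whether other controls adjust the values further.

```python
import math


def exponential_decay(initial, end, rate, t):
    """Assumed ExponentialAdapter rule: (initial - end) * exp(-rate * t) + end."""
    return (initial - end) * math.exp(-rate * t) + end


def inverse_decay(initial, end, rate, t):
    """Assumed InverseAdapter rule: (initial - end) / (1 + rate * t) + end."""
    return (initial - end) / (1 + rate * t) + end


# Same settings as the crossover and mutation adapters in the next cell.
schedule = [
    (
        t,
        round(exponential_decay(0.8, 0.4, 0.15, t), 3),
        round(inverse_decay(0.25, 0.08, 0.25, t), 3),
    )
    for t in range(5)
]

for t, crossover, mutation in schedule:
    print(f"t={t}  crossover={crossover}  mutation={mutation}")
```

Both probabilities start at `initial_value` and approach `end_value`, so the search explores broadly at first and makes smaller changes later. Under this assumed formula the mutation values for t = 1, 2, 3 are 0.216, 0.193, and 0.177, which resemble the `mut` column printed in the training log.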

[6]:
ga_search = GASearchCV(
    estimator=make_model(),
    param_grid={
        "logistic__C": Continuous(1e-3, 30.0, distribution="log-uniform"),
        "logistic__class_weight": Categorical([None, "balanced"]),
    },
    scoring="roc_auc",
    cv=cv,
    evolution_config=EvolutionConfig(
        population_size=10,
        generations=8,
        crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),
        mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),
        tournament_size=3,
        elitism=True,
        keep_top_k=3,
    ),
    population_config=PopulationConfig(
        initializer="smart",
        warm_start_configs=[{"logistic__C": 1.0, "logistic__class_weight": None}],
    ),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True, verbose=True),
    optimization_config=OptimizationConfig(
        local_search=True,
        local_search_top_k=2,
        local_search_steps=1,
        diversity_control=True,
        random_immigrants_fraction=0.10,
        fitness_sharing=True,
    ),
)

callbacks = [
    DeltaThreshold(threshold=0.0005, generations=5, metric="fitness_best"),
    ConsecutiveStopping(generations=7, metric="fitness_best"),
    TimerStopping(total_seconds=90),
]

started_at = time.perf_counter()
ga_search.fit(X_train, y_train, callbacks=callbacks)
ga_seconds = time.perf_counter() - started_at

 gen evals           avg          best     div  unique  stag     mut   sel             events
---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------
   0    10       0.99336       0.99452   0.556   1.000     0       -     - -
   1    20       0.99354       0.99452   0.389   0.800     1   0.200     3 dup=12,share
   2    20       0.99375       0.99452   0.389   0.700     2   0.216     3 dup=15,share
   3    20       0.99331       0.99452   0.500   0.900     3   0.193     3 dup=14,share
   4    20       0.99337       0.99452   0.389   0.800     4   0.177     3 dup=14,share
INFO: DeltaThreshold callback met its criteria
INFO: Stopping the algorithm
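
The early stop in the log above can be reasoned about with a small sketch. It assumes that DeltaThreshold stops once the spread (maximum minus minimum) of the monitored metric over the last `generations` generations is at most `threshold`. The function `delta_threshold_met` is a hypothetical re-implementation for illustration, not the library's code, and the input values are the `best` column copied from the log.

```python
def delta_threshold_met(values, threshold, generations):
    """Return True when the last `generations` values span at most `threshold`."""
    if len(values) < generations:
        return False  # not enough history to fill the window yet
    window = values[-generations:]
    return max(window) - min(window) <= threshold


# "best" column of the training log, generations 0 through 4.
best_by_generation = [0.99452, 0.99452, 0.99452, 0.99452, 0.99452]

stopped_at = None
for gen in range(len(best_by_generation)):
    seen = best_by_generation[: gen + 1]
    if delta_threshold_met(seen, threshold=0.0005, generations=5):
        stopped_at = gen
        break

print("stopped after generation", stopped_at)
```

Under that assumption the window first fills at generation 4, and because the best score never moved, the criterion is met immediately. That is consistent with the run ending after 5 of the 8 configured generations.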

Compare Results

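Candidate budgets are not identical: in this run the GA evaluated 92 candidates against 16 for each sklearn search, so the table includes evaluated candidates and estimated CV evaluations. All three methods reach the same holdout metrics here, so the differences lie in search cost. Read fit_seconds alongside the evaluation counts. With n_jobs=-1, wall-clock time also depends on parallel worker startup, which may explain why RandomizedSearchCV, the first search to run, took far longer than GridSearchCV for the same 48 estimated evaluations.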

[7]:
comparison = pd.DataFrame(
    [
        summarize_search("RandomizedSearchCV", randomized_search, randomized_seconds),
        summarize_search("GridSearchCV", grid_search, grid_seconds),
        summarize_search("GASearchCV", ga_search, ga_seconds),
    ]
).sort_values("roc_auc", ascending=False)

comparison
[7]:
               method  fit_seconds  evaluated_candidates  estimated_cv_evaluations  best_cv_score  accuracy  balanced_accuracy        f1   roc_auc
0  RandomizedSearchCV    21.342722                    16                        48       0.994902  0.982456           0.979702  0.986047  0.996641
1        GridSearchCV     0.335785                    16                        48       0.994745  0.982456           0.979702  0.986047  0.996641
2          GASearchCV    11.507412                    92                       276       0.994904  0.982456           0.979702  0.986047  0.996641
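
One way to put runtime in context is to divide it by the number of estimated CV evaluations. The sketch below does this with the values copied from the table above. It is a rough normalization only: the timings also include the final refit and any parallel startup overhead, so small differences should not be over-interpreted.

```python
rows = [
    # (method, fit_seconds, estimated_cv_evaluations) copied from the table above
    ("RandomizedSearchCV", 21.342722, 48),
    ("GridSearchCV", 0.335785, 48),
    ("GASearchCV", 11.507412, 276),
]

# Wall-clock seconds spent per estimated cross-validation evaluation.
seconds_per_evaluation = {
    method: seconds / evaluations for method, seconds, evaluations in rows
}

for method, value in seconds_per_evaluation.items():
    print(f"{method:<20} {value:.4f} s per estimated CV evaluation")
```

In the notebook itself, the same ratio could be computed directly as `comparison["fit_seconds"] / comparison["estimated_cv_evaluations"]`.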

Read GA-Specific Telemetry

The sklearn searchers expose cv_results_. GASearchCV also exposes fit_stats_ and history, which help explain search behavior.

[8]:
ga_search.fit_stats_
[8]:
{'evaluated_candidates': 92,
 'unique_candidates': 87,
 'cross_validate_calls': 87,
 'cache_hits': 5,
 'duplicate_candidates': 0,
 'skipped_invalid_candidates': 0,
 'population_parallel_batches': 6,
 'population_serial_batches': 0,
 'random_immigrants': 0,
 'local_refinement_candidates': 2}
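
These counters also show how much work the cache saved. The sketch below uses the values printed above and assumes that each `cross_validate_calls` entry corresponds to one fit per CV fold. That reading of the counter is an assumption, not something the output itself confirms.

```python
fit_stats = {
    # Values copied from ga_search.fit_stats_ above.
    "evaluated_candidates": 92,
    "unique_candidates": 87,
    "cross_validate_calls": 87,
    "cache_hits": 5,
}
n_splits = 3  # folds in the StratifiedKFold splitter

# Share of evaluated candidates that were answered from the cache.
cache_hit_rate = fit_stats["cache_hits"] / fit_stats["evaluated_candidates"]

# Naive estimate (every candidate refit) versus fits after caching.
estimated_fits = fit_stats["evaluated_candidates"] * n_splits
fits_after_cache = fit_stats["cross_validate_calls"] * n_splits

print(f"cache hit rate: {cache_hit_rate:.1%}")
print(f"estimated fits: {estimated_fits}, fits after cache: {fits_after_cache}")
```

Here 92 evaluated candidates minus 5 cache hits leaves 87 cross-validation calls, so the 276 estimated CV evaluations in the comparison table are closer to 261 actual fits under this assumption.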
[9]:
history = pd.DataFrame(ga_search.history)
history[[
    "gen",
    "fitness",
    "fitness_max",
    "unique_individual_ratio",
    "genotype_diversity",
    "stagnation_generations",
]].tail()
[9]:
   gen   fitness  fitness_max  unique_individual_ratio  genotype_diversity  stagnation_generations
0    0  0.993359     0.994519                      1.0            0.555556                       0
1    1  0.993536     0.994502                      0.8            0.388889                       1
2    2  0.993755     0.993880                      0.7            0.388889                       2
3    3  0.993312     0.994198                      0.9            0.500000                       3
4    4  0.993955     0.994656                      0.9            0.444444                       0

Practical Notes

  • Compare methods using both quality metrics and search cost.

  • RandomizedSearchCV is a strong baseline for continuous spaces.

  • GridSearchCV is useful when the grid is small and deliberately chosen.

  • GASearchCV becomes more attractive as the space gets mixed, conditional, rugged, or expensive enough that smarter exploration matters.

  • For repeatable conclusions, run several seeds or use the repository benchmark script: python benchmarks/benchmark_search_methods.py --runs 3.
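
The advice about several seeds can be sketched as a small aggregation helper. Everything here is illustrative: `summarize_runs` is a hypothetical helper, and `fake_search` is a stand-in that returns made-up scores so the example runs on its own. In practice `run_once` would fit one search with the given seed and return a holdout metric.

```python
import random
import statistics


def summarize_runs(run_once, seeds):
    """Run `run_once(seed)` for each seed and summarize the returned scores."""
    scores = [run_once(seed) for seed in seeds]
    return {
        "runs": len(scores),
        "mean": statistics.mean(scores),
        # The sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


def fake_search(seed):
    """Stand-in for a real search; returns a made-up score, not a measured result."""
    return 0.99 + random.Random(seed).uniform(0.0, 0.005)


summary = summarize_runs(fake_search, seeds=[0, 1, 2])
print(summary)
```

Reporting the spread alongside the mean shows whether a difference between methods is larger than the seed-to-seed variation.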