Comparing GASearchCV With sklearn Search Methods

This notebook compares GASearchCV with RandomizedSearchCV and GridSearchCV on the same classification problem. The goal is not to declare one method universally best; it is to show how to compare solution quality, search cost, and runtime fairly.

Problem Setup

We use the breast cancer binary classification dataset and a scaled logistic-regression pipeline. The search space mixes one continuous choice (the regularization strength C, searched on a log scale) with one categorical choice (class_weight), which makes it a good small example for comparing search methods.

[1]:

import time
import warnings

import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Continuous

warnings.filterwarnings("ignore", category=UserWarning)

RANDOM_STATE = 42

[2]:

data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.30,
    stratify=y,
    random_state=RANDOM_STATE,
)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)

Shared Model and Metrics

Each method receives the same pipeline, the same train/test split, the same cross-validation splitter, and the same roc_auc scoring. We report both the best cross-validation score and metrics on the holdout split.

[3]:

def make_model():
    return Pipeline(
        [
            ("scaler", StandardScaler()),
            (
                "logistic",
                LogisticRegression(
                    solver="liblinear",
                    max_iter=500,
                    random_state=RANDOM_STATE,
                ),
            ),
        ]
    )


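def evaluate_classifier(estimator):
    """Score a fitted search object on the holdout split (X_test, y_test)."""
    predictions = estimator.predict(X_test)
    # Probability of the positive class, needed for ROC AUC.
    probabilities = estimator.predict_proba(X_test)[:, 1]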
    return {
        "accuracy": accuracy_score(y_test, predictions),
        "balanced_accuracy": balanced_accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
        "roc_auc": roc_auc_score(y_test, probabilities),
    }


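def summarize_search(name, estimator, fit_seconds):
    """Build one comparison row: search cost, best CV score, and holdout metrics."""
    cv_results = getattr(estimator, "cv_results_", {})
    # cv_results_["params"] holds one entry per evaluated candidate.
    evaluated_candidates = len(cv_results.get("params", []))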
    row = {
        "method": name,
        "fit_seconds": fit_seconds,
        "evaluated_candidates": evaluated_candidates,
        "estimated_cv_evaluations": evaluated_candidates * cv.get_n_splits(),
        "best_cv_score": getattr(estimator, "best_score_", None),
    }
    row.update(evaluate_classifier(estimator))
    return row
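
The estimated_cv_evaluations column is just the candidate count multiplied by the fold count. The sketch below spells out that arithmetic in plain Python. count_model_fits is a hypothetical helper (not part of scikit-learn), and the extra final fit assumes refit=True, as used in this notebook.

```python
def count_model_fits(n_candidates, n_splits, refit=True):
    """Return (cv_fits, total_fits) when every candidate is scored on every fold."""
    cv_fits = n_candidates * n_splits
    # With refit=True the best candidate is fitted once more on the full training set.
    total_fits = cv_fits + (1 if refit else 0)
    return cv_fits, total_fits


# 16 candidates and 3 folds, the budget used by the sklearn searchers here.
cv_fits, total_fits = count_model_fits(n_candidates=16, n_splits=3)
print(cv_fits, total_fits)  # 48 49
```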

Run RandomizedSearchCV

Random search samples a fixed number of candidates (n_iter=16 here), drawing C from a log-uniform distribution. It is often a strong baseline for continuous spaces.

[4]:

randomized_search = RandomizedSearchCV(
    estimator=make_model(),
    param_distributions={
        "logistic__C": loguniform(1e-3, 30.0),
        "logistic__class_weight": [None, "balanced"],
    },
    n_iter=16,
    scoring="roc_auc",
    cv=cv,
    n_jobs=-1,
    random_state=RANDOM_STATE,
    refit=True,
)

started_at = time.perf_counter()
randomized_search.fit(X_train, y_train)
randomized_seconds = time.perf_counter() - started_at
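
To see why a log-uniform distribution suits C, the sketch below draws samples by hand using only the standard library. It illustrates the idea behind loguniform, not scipy's implementation; sample_log_uniform is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_log_uniform(1e-3, 30.0, rng) for _ in range(10_000)]

# Every decade gets similar coverage, so small values of C are not starved.
below_one = sum(s < 1.0 for s in samples) / len(samples)
print(f"share of samples below C=1.0: {below_one:.2f}")
```

About two thirds of the draws fall below C=1.0, because three of the roughly four and a half decades between 1e-3 and 30 lie below 1. A plain uniform distribution over the same range would put only about 3% of its draws there.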

Run GridSearchCV

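Grid search is deterministic and easy to reason about. It becomes expensive quickly because every additional dimension multiplies the candidate count. Here, 8 values of C times 2 values of class_weight give 16 candidates, matching the random-search budget.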

[5]:

grid_search = GridSearchCV(
    estimator=make_model(),
    param_grid={
        "logistic__C": np.geomspace(1e-3, 30.0, num=8),
        "logistic__class_weight": [None, "balanced"],
    },
    scoring="roc_auc",
    cv=cv,
    n_jobs=-1,
    refit=True,
)

started_at = time.perf_counter()
grid_search.fit(X_train, y_train)
grid_seconds = time.perf_counter() - started_at
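
The multiplication of candidates can be checked without fitting anything. The sketch below rebuilds the grid with the standard library; geomspace is a small stand-in for np.geomspace, and the third dimension (logistic__penalty) is a hypothetical addition that is not part of this notebook's search.

```python
import itertools
import math


def geomspace(start, stop, num):
    """Return `num` values evenly spaced on a log scale, endpoints included."""
    ratio = (stop / start) ** (1.0 / (num - 1))
    return [start * ratio**i for i in range(num)]


grid = {
    "logistic__C": geomspace(1e-3, 30.0, 8),
    "logistic__class_weight": [None, "balanced"],
}
candidates = list(itertools.product(*grid.values()))
print(len(candidates))  # 8 * 2 = 16

# Hypothetical extra dimension: two more options double the candidate count.
grid["logistic__penalty"] = ["l1", "l2"]
print(math.prod(len(values) for values in grid.values()))  # 8 * 2 * 2 = 32
```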

Run GASearchCV

The GA version uses the same parameter region with sklearn-genetic-opt spaces and enables optimizer controls that are useful in mixed search spaces.
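
The configuration that follows passes ExponentialAdapter and InverseAdapter schedules for the crossover and mutation probabilities. As a rough illustration of what such schedules do, the sketch below implements exponential and inverse decay by hand. The formulas are an assumption based on the sklearn-genetic-opt documentation; the function names are hypothetical and the library code remains authoritative.

```python
import math


def exponential_decay(initial, end, rate, step):
    """Assumed form: (initial - end) * exp(-rate * step) + end."""
    return (initial - end) * math.exp(-rate * step) + end


def inverse_decay(initial, end, rate, step):
    """Assumed form: (initial - end) / (1 + rate * step) + end."""
    return (initial - end) / (1.0 + rate * step) + end


# Same initial_value, end_value and adaptive_rate as in the GA configuration.
for step in range(4):
    crossover = exponential_decay(0.8, 0.4, 0.15, step)
    mutation = inverse_decay(0.25, 0.08, 0.25, step)
    print(f"step={step} crossover={crossover:.3f} mutation={mutation:.3f}")
```

Under these assumed formulas the mutation probability at steps 1 to 3 rounds to 0.216, 0.193 and 0.177, which are the values in the mut column of the run log for generations 2 to 4. How steps map to generations, and any further adjustment the optimizer applies, is up to the library.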

[6]:

ga_search = GASearchCV(
    estimator=make_model(),
    param_grid={
        "logistic__C": Continuous(1e-3, 30.0, distribution="log-uniform"),
        "logistic__class_weight": Categorical([None, "balanced"]),
    },
    scoring="roc_auc",
    cv=cv,
    evolution_config=EvolutionConfig(
        population_size=10,
        generations=8,
        crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),
        mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),
        tournament_size=3,
        elitism=True,
        keep_top_k=3,
    ),
    population_config=PopulationConfig(
        initializer="smart",
        warm_start_configs=[{"logistic__C": 1.0, "logistic__class_weight": None}],
    ),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True, verbose=True),
    optimization_config=OptimizationConfig(
        local_search=True,
        local_search_top_k=2,
        local_search_steps=1,
        diversity_control=True,
        random_immigrants_fraction=0.10,
        fitness_sharing=True,
    ),
)

callbacks = [
    DeltaThreshold(threshold=0.0005, generations=5, metric="fitness_best"),
    ConsecutiveStopping(generations=7, metric="fitness_best"),
    TimerStopping(total_seconds=90),
]

started_at = time.perf_counter()
ga_search.fit(X_train, y_train, callbacks=callbacks)
ga_seconds = time.perf_counter() - started_at

 gen evals           avg          best     div  unique  stag     mut   sel             events
---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------
   0    10       0.99336       0.99452   0.556   1.000     0       -     - -
   1    20       0.99354       0.99452   0.389   0.800     1   0.200     3 dup=12,share
   2    20       0.99375       0.99452   0.389   0.700     2   0.216     3 dup=15,share
   3    20       0.99331       0.99452   0.500   0.900     3   0.193     3 dup=14,share
   4    20       0.99337       0.99452   0.389   0.800     4   0.177     3 dup=14,share
INFO: DeltaThreshold callback met its criteria
INFO: Stopping the algorithm
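
The log shows the run ending after generation 4 because DeltaThreshold fired. A minimal sketch of that kind of rule, assuming it stops once the spread (max minus min) of the tracked metric over the last N generations is at or below the threshold; delta_threshold_met is a hypothetical helper, not the library's implementation.

```python
def delta_threshold_met(best_scores, threshold, generations):
    """Return True when the last `generations` scores differ by at most `threshold`."""
    if len(best_scores) < generations:
        return False  # not enough history yet
    window = best_scores[-generations:]
    return max(window) - min(window) <= threshold


# "best" column from the log above, generations 0 to 4.
logged_best = [0.99452, 0.99452, 0.99452, 0.99452, 0.99452]
for gen in range(len(logged_best)):
    stop = delta_threshold_met(logged_best[: gen + 1], threshold=0.0005, generations=5)
    print(gen, stop)
```

With threshold=0.0005 and generations=5, the check first returns True at generation 4, which is consistent with where the logged run stopped.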

Compare Results

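Candidate budgets are not identical, so the table includes evaluated candidates and estimated CV evaluations: 16 candidates (48 CV fits) for each sklearn searcher versus 92 candidates for GASearchCV. Read the runtimes with that budget in mind, and treat single-run wall-clock times as noisy; the first parallel fit in a session, for example, may include one-time worker startup cost.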

[7]:

comparison = pd.DataFrame(
    [
        summarize_search("RandomizedSearchCV", randomized_search, randomized_seconds),
        summarize_search("GridSearchCV", grid_search, grid_seconds),
        summarize_search("GASearchCV", ga_search, ga_seconds),
    ]
).sort_values("roc_auc", ascending=False)

comparison

[7]:

	method	fit_seconds	evaluated_candidates	estimated_cv_evaluations	best_cv_score	accuracy	balanced_accuracy	f1	roc_auc
0	RandomizedSearchCV	21.342722	16	48	0.994902	0.982456	0.979702	0.986047	0.996641
1	GridSearchCV	0.335785	16	48	0.994745	0.982456	0.979702	0.986047	0.996641
2	GASearchCV	11.507412	92	276	0.994904	0.982456	0.979702	0.986047	0.996641
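
One simple way to put runtime in the context of budget is to divide fit time by the number of evaluated candidates. The sketch below does this with the values copied from the table above; seconds_per_candidate is an illustrative derived quantity, not something the searchers report.

```python
# Values copied from the comparison table above.
results = [
    {"method": "RandomizedSearchCV", "fit_seconds": 21.342722, "evaluated_candidates": 16},
    {"method": "GridSearchCV", "fit_seconds": 0.335785, "evaluated_candidates": 16},
    {"method": "GASearchCV", "fit_seconds": 11.507412, "evaluated_candidates": 92},
]

for row in results:
    row["seconds_per_candidate"] = row["fit_seconds"] / row["evaluated_candidates"]

# Cheapest per-candidate cost first.
ranked = sorted(results, key=lambda row: row["seconds_per_candidate"])
for row in ranked:
    print(f"{row['method']:<20} {row['seconds_per_candidate']:.4f} s per candidate")
```

This is still a single-run measurement, so it describes this machine and this session rather than the methods in general.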

Read GA-Specific Telemetry

The sklearn searchers expose cv_results_. GASearchCV also exposes fit_stats_ and history, which help explain search behavior.

[8]:

ga_search.fit_stats_

[8]:

{'evaluated_candidates': 92,
 'unique_candidates': 87,
 'cross_validate_calls': 87,
 'cache_hits': 5,
 'duplicate_candidates': 0,
 'skipped_invalid_candidates': 0,
 'population_parallel_batches': 6,
 'population_serial_batches': 0,
 'random_immigrants': 0,
 'local_refinement_candidates': 2}
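
These counters also refine the cost estimate from the comparison table. The sketch below uses the values shown above and assumes that each cross_validate call fits the model once per fold and that cache hits trigger no refitting.

```python
# Values copied from fit_stats_ above; n_splits matches the StratifiedKFold setup.
fit_stats = {
    "evaluated_candidates": 92,
    "unique_candidates": 87,
    "cross_validate_calls": 87,
    "cache_hits": 5,
}
n_splits = 3

cache_hit_rate = fit_stats["cache_hits"] / fit_stats["evaluated_candidates"]
estimated_fits = fit_stats["evaluated_candidates"] * n_splits  # table estimate: 276
actual_fits = fit_stats["cross_validate_calls"] * n_splits  # assumed real cost: 261

print(f"cache hit rate: {cache_hit_rate:.1%}")
print(f"estimated fits: {estimated_fits}, fits after caching: {actual_fits}")
```

Under that assumption the 276 in the table is an upper bound, and caching saved 15 model fits in this run.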

[9]:

history = pd.DataFrame(ga_search.history)
history[[
    "gen",
    "fitness",
    "fitness_max",
    "unique_individual_ratio",
    "genotype_diversity",
    "stagnation_generations",
]].tail()

[9]:

	gen	fitness	fitness_max	unique_individual_ratio	genotype_diversity	stagnation_generations
0	0	0.993359	0.994519	1.0	0.555556	0
1	1	0.993536	0.994502	0.8	0.388889	1
2	2	0.993755	0.993880	0.7	0.388889	2
3	3	0.993312	0.994198	0.9	0.500000	3
4	4	0.993955	0.994656	0.9	0.444444	0
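
The stagnation_generations column can be read as the number of generations since the best score last improved. The sketch below reproduces the column from the fitness_max values in the table above, assuming the counter resets whenever a generation's maximum exceeds the running best; stagnation_counts is a hypothetical helper, not the library's code.

```python
def stagnation_counts(generation_max):
    """Count generations since the running best last improved."""
    best = float("-inf")
    stalled = 0
    counts = []
    for value in generation_max:
        if value > best:
            best = value
            stalled = 0  # improvement resets the counter
        else:
            stalled += 1
        counts.append(stalled)
    return counts


# fitness_max column from the history table above.
fitness_max = [0.994519, 0.994502, 0.993880, 0.994198, 0.994656]
print(stagnation_counts(fitness_max))  # [0, 1, 2, 3, 0]
```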

Practical Notes

Compare methods using both quality metrics and search cost. In this run all three methods reach identical holdout metrics, so they differ only in cost.
RandomizedSearchCV is a strong baseline for continuous spaces.
GridSearchCV is useful when the grid is small and deliberately chosen.
GASearchCV becomes more attractive as the space gets mixed, conditional, rugged, or expensive enough that smarter exploration matters.
For repeatable conclusions, run several seeds or use the repository benchmark script: python benchmarks/benchmark_search_methods.py --runs 3.
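
A minimal sketch of the multi-seed idea, independent of the repository benchmark script (whose internals are not shown here). summarize_runs is a hypothetical helper, and fake_search is a stand-in that returns made-up scores; in practice it would fit a search object with the given seed and return its best_score_ or a holdout metric.

```python
import random
import statistics


def summarize_runs(run_once, seeds):
    """Call run_once(seed) for each seed and summarize the returned scores."""
    scores = [run_once(seed) for seed in seeds]
    return {
        "n_runs": len(scores),
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


def fake_search(seed):
    # Placeholder only: synthetic score, not a real search result.
    rng = random.Random(seed)
    return 0.99 + rng.uniform(0.0, 0.005)


summary = summarize_runs(fake_search, seeds=[0, 1, 2])
print(summary)
```

Reporting the mean together with the spread makes it easier to tell whether a gap between two methods is larger than seed-to-seed noise.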