{
"cells": [
{
"cell_type": "markdown",
"id": "e6a05a04",
"metadata": {},
"source": [
"# Comparing GASearchCV With sklearn Search Methods\n",
"\n",
"This notebook compares `GASearchCV` with `RandomizedSearchCV` and `GridSearchCV` on the same classification problem. The goal is not to declare one method universally best; it is to show how to compare solution quality, search cost, and runtime fairly.\n",
"\n",
"## Menu\n",
"\n",
"1. [Problem Setup](#problem-setup)\n",
"2. [Shared Model and Metrics](#shared-model-and-metrics)\n",
"3. [Run RandomizedSearchCV](#run-randomizedsearchcv)\n",
"4. [Run GridSearchCV](#run-gridsearchcv)\n",
"5. [Run GASearchCV](#run-gasearchcv)\n",
"6. [Compare Results](#compare-results)\n",
"7. [Read GA-Specific Telemetry](#read-ga-specific-telemetry)\n",
"8. [Practical Notes](#practical-notes)"
]
},
{
"cell_type": "markdown",
"id": "530cd97a",
"metadata": {},
"source": [
"## Problem Setup\n",
"\n",
"We use the breast cancer binary classification dataset and a scaled logistic-regression pipeline. The search space includes continuous and categorical choices, which makes it a good small example for comparing search methods."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7827ec84",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:52:47.341232Z",
"iopub.status.busy": "2026-06-21T20:52:47.340740Z",
"iopub.status.idle": "2026-06-21T20:53:25.353904Z",
"shell.execute_reply": "2026-06-21T20:53:25.352586Z"
}
},
"outputs": [],
"source": [
"import time\n",
"import warnings\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"from scipy.stats import loguniform\n",
"from sklearn.datasets import load_breast_cancer\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, roc_auc_score\n",
"from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, StratifiedKFold, train_test_split\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"from sklearn_genetic import (\n",
"    EvolutionConfig,\n",
"    GASearchCV,\n",
"    OptimizationConfig,\n",
"    PopulationConfig,\n",
"    RuntimeConfig,\n",
")\n",
"from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping\n",
"from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter\n",
"from sklearn_genetic.space import Categorical, Continuous\n",
"\n",
"warnings.filterwarnings(\"ignore\", category=UserWarning)\n",
"\n",
"RANDOM_STATE = 42"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fa83fb4e",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:25.358170Z",
"iopub.status.busy": "2026-06-21T20:53:25.357345Z",
"iopub.status.idle": "2026-06-21T20:53:25.490764Z",
"shell.execute_reply": "2026-06-21T20:53:25.489161Z"
}
},
"outputs": [],
"source": [
"data = load_breast_cancer(as_frame=True)\n",
"X = data.data\n",
"y = data.target\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X,\n",
"    y,\n",
"    test_size=0.30,\n",
"    stratify=y,\n",
"    random_state=RANDOM_STATE,\n",
")\n",
"cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)"
]
},
{
"cell_type": "markdown",
"id": "886fd073",
"metadata": {},
"source": [
"## Shared Model and Metrics\n",
"\n",
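"Each method tunes the same pipeline on the same train/test split, using the same `StratifiedKFold` splitter and `roc_auc` scoring. We report the best cross-validation score alongside holdout metrics. The `estimated_cv_evaluations` column is the number of evaluated candidates multiplied by the number of folds."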
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0eb11223",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:25.495443Z",
"iopub.status.busy": "2026-06-21T20:53:25.494884Z",
"iopub.status.idle": "2026-06-21T20:53:25.505646Z",
"shell.execute_reply": "2026-06-21T20:53:25.504130Z"
}
},
"outputs": [],
"source": [
"def make_model():\n",
"    return Pipeline(\n",
"        [\n",
"            (\"scaler\", StandardScaler()),\n",
"            (\n",
"                \"logistic\",\n",
"                LogisticRegression(\n",
"                    solver=\"liblinear\",\n",
"                    max_iter=500,\n",
"                    random_state=RANDOM_STATE,\n",
"                ),\n",
"            ),\n",
"        ]\n",
"    )\n",
"\n",
"\n",
"def evaluate_classifier(estimator):\n",
"    predictions = estimator.predict(X_test)\n",
"    probabilities = estimator.predict_proba(X_test)[:, 1]\n",
"    return {\n",
"        \"accuracy\": accuracy_score(y_test, predictions),\n",
"        \"balanced_accuracy\": balanced_accuracy_score(y_test, predictions),\n",
"        \"f1\": f1_score(y_test, predictions),\n",
"        \"roc_auc\": roc_auc_score(y_test, probabilities),\n",
"    }\n",
"\n",
"\n",
"def summarize_search(name, estimator, fit_seconds):\n",
"    cv_results = getattr(estimator, \"cv_results_\", {})\n",
"    evaluated_candidates = len(cv_results.get(\"params\", []))\n",
"    row = {\n",
"        \"method\": name,\n",
"        \"fit_seconds\": fit_seconds,\n",
"        \"evaluated_candidates\": evaluated_candidates,\n",
"        \"estimated_cv_evaluations\": evaluated_candidates * cv.get_n_splits(),\n",
"        \"best_cv_score\": getattr(estimator, \"best_score_\", None),\n",
"    }\n",
"    row.update(evaluate_classifier(estimator))\n",
"    return row"
]
},
{
"cell_type": "markdown",
"id": "1ad0b454",
"metadata": {},
"source": [
"## Run RandomizedSearchCV\n",
"\n",
"Random search samples a fixed number of candidates (`n_iter=16` here), drawing `C` from a log-uniform distribution. It is often a strong baseline for continuous spaces."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "02b20778",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:25.509464Z",
"iopub.status.busy": "2026-06-21T20:53:25.508980Z",
"iopub.status.idle": "2026-06-21T20:53:46.863958Z",
"shell.execute_reply": "2026-06-21T20:53:46.859548Z"
}
},
"outputs": [],
"source": [
"randomized_search = RandomizedSearchCV(\n",
"    estimator=make_model(),\n",
"    param_distributions={\n",
"        \"logistic__C\": loguniform(1e-3, 30.0),\n",
"        \"logistic__class_weight\": [None, \"balanced\"],\n",
"    },\n",
"    n_iter=16,\n",
"    scoring=\"roc_auc\",\n",
"    cv=cv,\n",
"    n_jobs=-1,\n",
"    random_state=RANDOM_STATE,\n",
"    refit=True,\n",
")\n",
"\n",
"started_at = time.perf_counter()\n",
"randomized_search.fit(X_train, y_train)\n",
"randomized_seconds = time.perf_counter() - started_at"
]
},
{
"cell_type": "markdown",
"id": "e48aab9e",
"metadata": {},
"source": [
"## Run GridSearchCV\n",
"\n",
"Grid search is deterministic and easy to reason about. It becomes expensive quickly because every additional dimension multiplies the candidate count. Here, 8 values of `C` times 2 class-weight options gives 16 candidates."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ec5731a1",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:46.871493Z",
"iopub.status.busy": "2026-06-21T20:53:46.870236Z",
"iopub.status.idle": "2026-06-21T20:53:47.219549Z",
"shell.execute_reply": "2026-06-21T20:53:47.217243Z"
}
},
"outputs": [],
"source": [
"grid_search = GridSearchCV(\n",
"    estimator=make_model(),\n",
"    param_grid={\n",
"        \"logistic__C\": np.geomspace(1e-3, 30.0, num=8),\n",
"        \"logistic__class_weight\": [None, \"balanced\"],\n",
"    },\n",
"    scoring=\"roc_auc\",\n",
"    cv=cv,\n",
"    n_jobs=-1,\n",
"    refit=True,\n",
")\n",
"\n",
"started_at = time.perf_counter()\n",
"grid_search.fit(X_train, y_train)\n",
"grid_seconds = time.perf_counter() - started_at"
]
},
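{
"cell_type": "markdown",
"id": "9c2f4b7e",
"metadata": {},
"source": [
"The candidate count of a full grid is the product of its dimension sizes. The snippet below is a minimal standalone sketch of that arithmetic using only the standard library; it is not part of the executed notebook. `count_candidates` is a hypothetical helper, and the `logistic__penalty` dimension is an illustrative assumption added only to show how the count grows.\n",
"\n",
"```python\n",
"from math import prod\n",
"\n",
"\n",
"def count_candidates(grid):\n",
"    # A full grid evaluates every combination, so the sizes multiply.\n",
"    return prod(len(values) for values in grid.values())\n",
"\n",
"\n",
"# Eight log-spaced C values between 1e-3 and 30, mirroring np.geomspace(1e-3, 30.0, num=8).\n",
"c_values = [1e-3 * (30.0 / 1e-3) ** (i / 7) for i in range(8)]\n",
"\n",
"grid = {\n",
"    'logistic__C': c_values,\n",
"    'logistic__class_weight': [None, 'balanced'],\n",
"}\n",
"# Hypothetical third dimension, not used elsewhere in this notebook.\n",
"extended_grid = {**grid, 'logistic__penalty': ['l1', 'l2']}\n",
"\n",
"N_SPLITS = 3  # matches the StratifiedKFold used in this notebook\n",
"for name, candidate_grid in [('current', grid), ('extended', extended_grid)]:\n",
"    n_candidates = count_candidates(candidate_grid)\n",
"    print(name, n_candidates, 'candidates,', n_candidates * N_SPLITS, 'CV fits')\n",
"```\n",
"\n",
"Adding a single two-valued dimension doubles the number of fits, which is why grids are usually kept small and deliberately chosen."
]
},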
{
"cell_type": "markdown",
"id": "4da5dc51",
"metadata": {},
"source": [
"## Run GASearchCV\n",
"\n",
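"The GA version searches the same parameter region, expressed with `sklearn-genetic-opt` spaces. It also enables optimizer controls that are useful in mixed search spaces: adaptive crossover and mutation schedules, elitism, a warm-start configuration, local search, diversity control, and fitness sharing. Three callbacks can stop the run early."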
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0e6f8a9b",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:47.225924Z",
"iopub.status.busy": "2026-06-21T20:53:47.225045Z",
"iopub.status.idle": "2026-06-21T20:53:58.748735Z",
"shell.execute_reply": "2026-06-21T20:53:58.745203Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" gen evals avg best div unique stag mut sel events\n",
"---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------\n",
" 0 10 0.99336 0.99452 0.556 1.000 0 - - - \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1 20 0.99354 0.99452 0.389 0.800 1 0.200 3 dup=12,share \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2 20 0.99375 0.99452 0.389 0.700 2 0.216 3 dup=15,share \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3 20 0.99331 0.99452 0.500 0.900 3 0.193 3 dup=14,share \n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
" 4 20 0.99337 0.99452 0.389 0.800 4 0.177 3 dup=14,share \n",
"INFO: DeltaThreshold callback met its criteria\n",
"INFO: Stopping the algorithm\n"
]
}
],
"source": [
"ga_search = GASearchCV(\n",
"    estimator=make_model(),\n",
"    param_grid={\n",
"        \"logistic__C\": Continuous(1e-3, 30.0, distribution=\"log-uniform\"),\n",
"        \"logistic__class_weight\": Categorical([None, \"balanced\"]),\n",
"    },\n",
"    scoring=\"roc_auc\",\n",
"    cv=cv,\n",
"    evolution_config=EvolutionConfig(\n",
"        population_size=10,\n",
"        generations=8,\n",
"        crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),\n",
"        mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),\n",
"        tournament_size=3,\n",
"        elitism=True,\n",
"        keep_top_k=3,\n",
"    ),\n",
"    population_config=PopulationConfig(\n",
"        initializer=\"smart\",\n",
"        warm_start_configs=[{\"logistic__C\": 1.0, \"logistic__class_weight\": None}],\n",
"    ),\n",
"    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend=\"auto\", use_cache=True, verbose=True),\n",
"    optimization_config=OptimizationConfig(\n",
"        local_search=True,\n",
"        local_search_top_k=2,\n",
"        local_search_steps=1,\n",
"        diversity_control=True,\n",
"        random_immigrants_fraction=0.10,\n",
"        fitness_sharing=True,\n",
"    ),\n",
")\n",
"\n",
"callbacks = [\n",
"    DeltaThreshold(threshold=0.0005, generations=5, metric=\"fitness_best\"),\n",
"    ConsecutiveStopping(generations=7, metric=\"fitness_best\"),\n",
"    TimerStopping(total_seconds=90),\n",
"]\n",
"\n",
"started_at = time.perf_counter()\n",
"ga_search.fit(X_train, y_train, callbacks=callbacks)\n",
"ga_seconds = time.perf_counter() - started_at\n"
]
},
{
"cell_type": "markdown",
"id": "736afa7e",
"metadata": {},
"source": [
"## Compare Results\n",
"\n",
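"The candidate budgets differ: in the recorded run the two scikit-learn searches each evaluate 16 candidates, while `GASearchCV` evaluates 92. The table therefore includes evaluated candidates and estimated CV evaluations; use this context when comparing runtime. All three methods reach identical holdout metrics here, so the differences lie in search cost. Wall-clock times are also sensitive to one-off costs such as parallel worker start-up, which likely explains why `RandomizedSearchCV`, the first search to run with `n_jobs=-1`, takes far longer than `GridSearchCV` for the same 48 evaluations."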
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3597347c",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:58.756735Z",
"iopub.status.busy": "2026-06-21T20:53:58.755327Z",
"iopub.status.idle": "2026-06-21T20:53:58.976426Z",
"shell.execute_reply": "2026-06-21T20:53:58.973452Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>method</th>\n",
"      <th>fit_seconds</th>\n",
"      <th>evaluated_candidates</th>\n",
"      <th>estimated_cv_evaluations</th>\n",
"      <th>best_cv_score</th>\n",
"      <th>accuracy</th>\n",
"      <th>balanced_accuracy</th>\n",
"      <th>f1</th>\n",
"      <th>roc_auc</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>RandomizedSearchCV</td>\n",
"      <td>21.342722</td>\n",
"      <td>16</td>\n",
"      <td>48</td>\n",
"      <td>0.994902</td>\n",
"      <td>0.982456</td>\n",
"      <td>0.979702</td>\n",
"      <td>0.986047</td>\n",
"      <td>0.996641</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>GridSearchCV</td>\n",
"      <td>0.335785</td>\n",
"      <td>16</td>\n",
"      <td>48</td>\n",
"      <td>0.994745</td>\n",
"      <td>0.982456</td>\n",
"      <td>0.979702</td>\n",
"      <td>0.986047</td>\n",
"      <td>0.996641</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>GASearchCV</td>\n",
"      <td>11.507412</td>\n",
"      <td>92</td>\n",
"      <td>276</td>\n",
"      <td>0.994904</td>\n",
"      <td>0.982456</td>\n",
"      <td>0.979702</td>\n",
"      <td>0.986047</td>\n",
"      <td>0.996641</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" method fit_seconds evaluated_candidates \\\n",
"0 RandomizedSearchCV 21.342722 16 \n",
"1 GridSearchCV 0.335785 16 \n",
"2 GASearchCV 11.507412 92 \n",
"\n",
" estimated_cv_evaluations best_cv_score accuracy balanced_accuracy \\\n",
"0 48 0.994902 0.982456 0.979702 \n",
"1 48 0.994745 0.982456 0.979702 \n",
"2 276 0.994904 0.982456 0.979702 \n",
"\n",
" f1 roc_auc \n",
"0 0.986047 0.996641 \n",
"1 0.986047 0.996641 \n",
"2 0.986047 0.996641 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"comparison = pd.DataFrame(\n",
"    [\n",
"        summarize_search(\"RandomizedSearchCV\", randomized_search, randomized_seconds),\n",
"        summarize_search(\"GridSearchCV\", grid_search, grid_seconds),\n",
"        summarize_search(\"GASearchCV\", ga_search, ga_seconds),\n",
"    ]\n",
").sort_values(\"roc_auc\", ascending=False)\n",
"\n",
"comparison"
]
},
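{
"cell_type": "markdown",
"id": "4e8a1d36",
"metadata": {},
"source": [
"One rough way to put runtimes on a comparable footing is to divide fit time by the number of CV evaluations. The snippet below is a minimal standalone sketch, not part of the executed notebook: the numbers are copied from the recorded table above, and `seconds_per_evaluation` is a hypothetical helper. The ratio ignores parallelism and one-off start-up costs, so treat it as a rough guide only.\n",
"\n",
"```python\n",
"# Values copied from the recorded comparison table in this notebook.\n",
"recorded_runs = [\n",
"    {'method': 'RandomizedSearchCV', 'fit_seconds': 21.342722, 'cv_evaluations': 48},\n",
"    {'method': 'GridSearchCV', 'fit_seconds': 0.335785, 'cv_evaluations': 48},\n",
"    {'method': 'GASearchCV', 'fit_seconds': 11.507412, 'cv_evaluations': 276},\n",
"]\n",
"\n",
"\n",
"def seconds_per_evaluation(run):\n",
"    # Guard against an empty search so the ratio is always defined.\n",
"    if run['cv_evaluations'] == 0:\n",
"        return float('nan')\n",
"    return run['fit_seconds'] / run['cv_evaluations']\n",
"\n",
"\n",
"for run in recorded_runs:\n",
"    print(run['method'], round(seconds_per_evaluation(run), 4))\n",
"```\n",
"\n",
"With the `comparison` DataFrame, the same ratio is `comparison['fit_seconds'] / comparison['estimated_cv_evaluations']`."
]
},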
{
"cell_type": "markdown",
"id": "76d9768d",
"metadata": {},
"source": [
"## Read GA-Specific Telemetry\n",
"\n",
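"The scikit-learn searchers expose `cv_results_`. `GASearchCV` also exposes `fit_stats_` and `history`, which help explain search behavior: `fit_stats_` counts evaluated candidates, actual `cross_validate` calls, and cache hits, while `history` records fitness and diversity per generation."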
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5b927c51",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:58.984294Z",
"iopub.status.busy": "2026-06-21T20:53:58.983283Z",
"iopub.status.idle": "2026-06-21T20:53:59.001766Z",
"shell.execute_reply": "2026-06-21T20:53:58.998538Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'evaluated_candidates': 92,\n",
" 'unique_candidates': 87,\n",
" 'cross_validate_calls': 87,\n",
" 'cache_hits': 5,\n",
" 'duplicate_candidates': 0,\n",
" 'skipped_invalid_candidates': 0,\n",
" 'population_parallel_batches': 6,\n",
" 'population_serial_batches': 0,\n",
" 'random_immigrants': 0,\n",
" 'local_refinement_candidates': 2}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ga_search.fit_stats_"
]
},
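{
"cell_type": "markdown",
"id": "d15b7a90",
"metadata": {},
"source": [
"These counters can be turned into a cache hit rate and an estimate of the model fits actually performed. The snippet below is a minimal standalone sketch, not part of the executed notebook: the counters are copied from the recorded output above, `cache_summary` is a hypothetical helper, and it assumes each `cross_validate` call fits the pipeline once per fold.\n",
"\n",
"```python\n",
"# Counters copied from the recorded fit_stats_ output in this notebook.\n",
"recorded_stats = {\n",
"    'evaluated_candidates': 92,\n",
"    'unique_candidates': 87,\n",
"    'cross_validate_calls': 87,\n",
"    'cache_hits': 5,\n",
"}\n",
"N_SPLITS = 3  # matches the StratifiedKFold used in this notebook\n",
"\n",
"\n",
"def cache_summary(stats, n_splits):\n",
"    evaluated = stats['evaluated_candidates']\n",
"    return {\n",
"        # Share of evaluated candidates that were answered from the cache.\n",
"        'cache_hit_rate': stats['cache_hits'] / evaluated if evaluated else 0.0,\n",
"        # Assumption: each cross_validate call fits the pipeline once per fold.\n",
"        'estimated_model_fits': stats['cross_validate_calls'] * n_splits,\n",
"    }\n",
"\n",
"\n",
"summary = cache_summary(recorded_stats, N_SPLITS)\n",
"print(summary)\n",
"```\n",
"\n",
"Under that assumption, the cached candidates mean the run needed fewer model fits than the `estimated_cv_evaluations` column suggests."
]
},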
{
"cell_type": "code",
"execution_count": 9,
"id": "8088b2d1",
"metadata": {
"execution": {
"iopub.execute_input": "2026-06-21T20:53:59.007642Z",
"iopub.status.busy": "2026-06-21T20:53:59.006881Z",
"iopub.status.idle": "2026-06-21T20:53:59.055246Z",
"shell.execute_reply": "2026-06-21T20:53:59.053145Z"
}
},
"outputs": [
{
"data": {
"text/html": [
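"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>gen</th>\n",
"      <th>fitness</th>\n",
"      <th>fitness_max</th>\n",
"      <th>unique_individual_ratio</th>\n",
"      <th>genotype_diversity</th>\n",
"      <th>stagnation_generations</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>0</td>\n",
"      <td>0.993359</td>\n",
"      <td>0.994519</td>\n",
"      <td>1.0</td>\n",
"      <td>0.555556</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>1</td>\n",
"      <td>0.993536</td>\n",
"      <td>0.994502</td>\n",
"      <td>0.8</td>\n",
"      <td>0.388889</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>2</td>\n",
"      <td>0.993755</td>\n",
"      <td>0.993880</td>\n",
"      <td>0.7</td>\n",
"      <td>0.388889</td>\n",
"      <td>2</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>3</td>\n",
"      <td>0.993312</td>\n",
"      <td>0.994198</td>\n",
"      <td>0.9</td>\n",
"      <td>0.500000</td>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>4</td>\n",
"      <td>0.993955</td>\n",
"      <td>0.994656</td>\n",
"      <td>0.9</td>\n",
"      <td>0.444444</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"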
],
"text/plain": [
" gen fitness fitness_max unique_individual_ratio genotype_diversity \\\n",
"0 0 0.993359 0.994519 1.0 0.555556 \n",
"1 1 0.993536 0.994502 0.8 0.388889 \n",
"2 2 0.993755 0.993880 0.7 0.388889 \n",
"3 3 0.993312 0.994198 0.9 0.500000 \n",
"4 4 0.993955 0.994656 0.9 0.444444 \n",
"\n",
" stagnation_generations \n",
"0 0 \n",
"1 1 \n",
"2 2 \n",
"3 3 \n",
"4 0 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"history = pd.DataFrame(ga_search.history)\n",
"history[[\n",
"    \"gen\",\n",
"    \"fitness\",\n",
"    \"fitness_max\",\n",
"    \"unique_individual_ratio\",\n",
"    \"genotype_diversity\",\n",
"    \"stagnation_generations\",\n",
"]].tail()"
]
},
{
"cell_type": "markdown",
"id": "55a965fb",
"metadata": {},
"source": [
"## Practical Notes\n",
"\n",
"- Compare methods using both quality metrics and search cost.\n",
"- `RandomizedSearchCV` is a strong baseline for continuous spaces.\n",
"- `GridSearchCV` is useful when the grid is small and deliberately chosen.\n",
"- `GASearchCV` becomes more attractive as the space gets mixed, conditional, rugged, or expensive enough that smarter exploration matters.\n",
"- For repeatable conclusions, run several seeds or use the repository benchmark script: `python benchmarks/benchmark_search_methods.py --runs 3`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.14"
}
},
"nbformat": 4,
"nbformat_minor": 5
}