Plotting the Search
This notebook walks through the plotting helpers in sklearn-genetic-opt. It shows the default fitness view, richer history and logbook views, and two search-space plot styles that make it easier to understand how the optimizer explored the parameter space.
[1]:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn_genetic import GASearchCV
from sklearn_genetic.plots import plot_fitness_evolution, plot_history, plot_search_space
from sklearn_genetic.space import Categorical, Continuous, Integer
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
search = GASearchCV(
    DecisionTreeRegressor(random_state=42),
    cv=2,
    scoring='r2',
    # A tiny population and few generations keep this demo fast
    population_size=4,
    generations=5,
    tournament_size=3,
    elitism=True,
    crossover_probability=0.9,
    mutation_probability=0.05,
    # Search space: one continuous, one categorical, and two integer hyperparameters
    param_grid={
        'ccp_alpha': Continuous(0, 1),
        'criterion': Categorical(['squared_error', 'absolute_error']),
        'max_depth': Integer(2, 20),
        'min_samples_split': Integer(2, 30),
    },
    criteria='max',  # maximize the R² score
    n_jobs=1,
)
search.fit(X_train, y_train)
gen evals avg best div unique stag mut sel events
---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------
0 4 0.14741 0.30181 0.833 1.000 0 - - -
1 8 0.20587 0.30181 0.583 1.000 1 0.050 3 -
2 8 0.25733 0.30181 0.583 0.750 2 0.050 3 dup=3
3 8 0.27799 0.30181 0.333 0.750 3 0.050 3 dup=3
4 8 0.26608 0.30181 0.250 0.500 4 0.050 3 dup=6
5 8 0.26157 0.30181 0.417 0.750 5 0.050 3 dup=6
[1]:
GASearchCV(crossover_probability=0.9, cv=2,
estimator=DecisionTreeRegressor(ccp_alpha=0.19121381309034374,
max_depth=2, min_samples_split=10,
random_state=42),
generations=5, mutation_probability=0.05, n_jobs=1,
param_grid={'ccp_alpha': <sklearn_genetic.space.space.Continuous object at 0x0000022C95420EC0>,
'criterion': <sklearn_genetic.space.space.Categorical object at 0x0000022C95421940>,
'max_depth': <sklearn_genetic.space.space.Integer object at 0x0000022C95421A90>,
'min_samples_split': <sklearn_genetic.space.space.Integer object at 0x0000022C954507D0>},
population_size=4, return_train_score=True, scoring='r2')
History and telemetry
The fitted estimator stores generation-level telemetry in its history attribute, one entry per generation. Besides the fitness statistics, the newer fields track the optimizer's adaptive behavior: mutation pressure, diversity control, duplicate handling, and local refinement.
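To build intuition for the two diversity columns, here is a minimal sketch that computes them for a toy population. It assumes unique_individual_ratio is the share of distinct individuals and that genotype_diversity is a mean pairwise Hamming distance over genes; both definitions are assumptions made for illustration, not the library's documented formulas, and the population values are made up.

```python
from itertools import combinations


def unique_individual_ratio(population: list[tuple]) -> float:
    """Share of individuals that are distinct (assumed definition)."""
    return len(set(population)) / len(population)


def genotype_diversity(population: list[tuple]) -> float:
    """Mean fraction of genes that differ over all pairs of individuals.

    One plausible definition (a normalized Hamming distance); the library's
    own metric may be computed differently.
    """
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    n_genes = len(population[0])
    # Count, for every pair, how many gene positions hold different values
    differing = sum(
        sum(a != b for a, b in zip(left, right)) for left, right in pairs
    )
    return differing / (len(pairs) * n_genes)


# Hypothetical 4-individual population; genes are ordered as
# (ccp_alpha, criterion, max_depth, min_samples_split). Values are made up.
population = [
    (0.19, "squared_error", 2, 10),
    (0.19, "squared_error", 2, 10),  # duplicate of the first individual
    (0.55, "absolute_error", 7, 10),
    (0.19, "absolute_error", 2, 25),
]
print(unique_individual_ratio(population))  # 0.75
print(genotype_diversity(population))  # 13 differing genes / (6 pairs * 4 genes)
```

With a population of 4, three distinct individuals give a ratio of 0.75, which matches the kind of values seen in the unique column of the verbose log.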
[2]:
history = pd.DataFrame(search.history)
telemetry_columns = [
'gen',
'fitness_best',
'fitness',
'fitness_max',
'unique_individual_ratio',
'genotype_diversity',
'mutation_probability',
'selection_pressure',
'random_immigrants',
'duplicate_replacements',
'local_refinements',
]
history[[column for column in telemetry_columns if column in history.columns]].tail()
[2]:
| | gen | fitness_best | fitness | fitness_max | unique_individual_ratio | genotype_diversity | mutation_probability | selection_pressure | random_immigrants | duplicate_replacements | local_refinements |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.301813 | 0.205867 | 0.301813 | 1.00 | 0.583333 | 0.05 | 3.0 | 0 | 0 | 0 |
| 2 | 2 | 0.301813 | 0.257327 | 0.301813 | 0.75 | 0.583333 | 0.05 | 3.0 | 0 | 3 | 0 |
| 3 | 3 | 0.301813 | 0.277993 | 0.301813 | 0.75 | 0.333333 | 0.05 | 3.0 | 0 | 3 | 0 |
| 4 | 4 | 0.301813 | 0.266084 | 0.301813 | 0.50 | 0.250000 | 0.05 | 3.0 | 0 | 6 | 0 |
| 5 | 5 | 0.301813 | 0.261574 | 0.301813 | 0.75 | 0.416667 | 0.05 | 3.0 | 0 | 6 | 0 |
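In the verbose log, fitness_best stays at 0.30181 for every generation while the stag column counts up from 0 to 5. A simple counter that resets whenever the best-so-far fitness improves reproduces that pattern. This is a hypothetical re-implementation for a maximization problem (criteria='max'); the library's exact rule, including any tolerance, is an assumption here.

```python
def stagnation_counter(best_so_far: list[float], tol: float = 0.0) -> list[int]:
    """Generations since the best fitness last improved by more than tol.

    Assumes a maximization problem; this rule is an illustrative guess,
    not the library's documented implementation.
    """
    counts = []
    streak = 0
    previous = None
    for value in best_so_far:
        if previous is not None and value <= previous + tol:
            streak += 1  # no meaningful improvement this generation
        else:
            streak = 0  # first generation, or a new best was found
        counts.append(streak)
        previous = value if previous is None else max(previous, value)
    return counts


# fitness_best for generations 0 to 5, taken from the verbose log
fitness_best = [0.30181] * 6
print(stagnation_counter(fitness_best))  # [0, 1, 2, 3, 4, 5]
```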
Fitness plots
plot_fitness_evolution accepts a list of metrics and an optional rolling window, which makes it easier to compare the best-so-far fitness (fitness_best) with the current population's average (fitness) and maximum (fitness_max).
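The effect of the window argument can be sketched as a trailing moving average. This sketch assumes the early generations are averaged over whatever values are available (similar to pandas Series.rolling(window, min_periods=1).mean()); how plot_fitness_evolution actually treats the first window - 1 points is an assumption, not something this notebook shows.

```python
def trailing_mean(values: list[float], window: int) -> list[float]:
    """Trailing rolling mean; early points average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Per-generation average fitness ("avg" in the verbose log, generations 0 to 5)
fitness_avg = [0.14741, 0.20587, 0.25733, 0.27799, 0.26608, 0.26157]
print(trailing_mean(fitness_avg, window=2))
```

A window of 2 halves the jump between consecutive generations, which is why the smoothed average curve looks flatter than the raw one.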
[3]:
plot_fitness_evolution(search)
plt.show()
[4]:
plot_fitness_evolution(
search,
metrics=['fitness_best', 'fitness', 'fitness_max'],
window=2,
kind='line',
title='Fitness comparison with smoothing',
)
plt.show()
History plots
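plot_history can plot any field recorded in history (one entry per generation) or in the logbook (candidate-level evaluation records, selected with source='logbook'). Use it to inspect fitness signals, diversity indicators, or optimizer-control events.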
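The difference between the two sources is granularity: history summarizes each generation, while logbook fields such as score, fit_time, and score_time describe individual candidate evaluations. The sketch below collapses candidate-level records into per-generation summaries. The record layout (a "gen" key next to "score" and "fit_time") and the values are hypothetical; they only illustrate the aggregation, not the real logbook structure.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical candidate-level records, one per evaluated individual.
# Field names and values are illustrative only.
logbook_records = [
    {"gen": 0, "score": 0.30, "fit_time": 0.004},
    {"gen": 0, "score": 0.10, "fit_time": 0.003},
    {"gen": 1, "score": 0.25, "fit_time": 0.005},
    {"gen": 1, "score": 0.29, "fit_time": 0.004},
]


def summarize_by_generation(records: list[dict], field: str) -> dict[int, dict[str, float]]:
    """Collapse per-candidate values of one field into per-generation mean and max."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["gen"]].append(record[field])
    return {
        gen: {"mean": mean(values), "max": max(values)}
        for gen, values in sorted(grouped.items())
    }


print(summarize_by_generation(logbook_records, "score"))
```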
[5]:
plot_history(
search,
fields=['fitness_best', 'fitness', 'unique_individual_ratio', 'genotype_diversity'],
kind='line',
subplots=True,
title='Optimizer history overview',
)
plt.show()
[6]:
plot_history(
search,
fields=['score', 'fit_time', 'score_time'],
source='logbook',
kind='area',
title='Logbook fields from candidate evaluations',
)
plt.show()
Search-space plots
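plot_search_space supports two modes. The pair plot (kind='pair') shows pairwise relationships between the sampled parameters and can color points by a categorical parameter through hue, while the heatmap (kind='heatmap') gives a quick view of their correlations.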
[7]:
plot_search_space(
search,
features=['ccp_alpha', 'max_depth', 'min_samples_split', 'criterion'],
hue='criterion',
kind='pair',
)
plt.show()
[8]:
plot_search_space(
search,
features=['ccp_alpha', 'max_depth', 'min_samples_split'],
kind='heatmap',
)
plt.show()
Takeaways
Use plot_fitness_evolution when you want a compact fitness trend.
Use plot_history when you want to inspect telemetry signals such as diversity, stagnation, and control events.
Use plot_search_space when you want to understand how the sampled parameter values relate to each other or to the final score.