Release Notes
Some notes on new features in various releases
What’s new in 0.13.0
Breaking Changes:
The default crossover_probability has changed from 0.2 to 0.8 and the default mutation_probability has changed from 0.8 to 0.1 for both GASearchCV and GAFeatureSelectionCV.
diversity_control now defaults to True and diversity_threshold now defaults to 0.25 for both GASearchCV and GAFeatureSelectionCV. Previously diversity_control defaulted to False and diversity_threshold defaulted to 0.1. Code that does not set these explicitly will now run with diversity monitoring enabled; set diversity_control=False to restore the previous behavior.
The fitness function for GASearchCV is now single-objective (CV score only). Previously, a novelty_score based on population Hamming distance was included as a second fitness objective with equal weight. This caused Pareto-dominance comparisons during tournament selection to favor diverse-but-lower-scoring candidates over better candidates, systematically reducing search quality. Fitness sharing (fitness_sharing=True) and diversity control (diversity_control=True) already provide population diversity maintenance without corrupting the primary fitness signal. GAFeatureSelectionCV retains its two-objective fitness (CV score + feature count) unchanged.
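Code that relied on the pre-0.13.0 defaults can pin them explicitly. The following is a minimal sketch that uses only the parameter names and previous values listed above. The dictionary name legacy_defaults is illustrative, and the commented-out constructor call assumes you supply your own estimator and param_grid. It does not bring back the removed novelty objective.

```python
# Previous (pre-0.13.0) defaults, as listed in the breaking changes above.
legacy_defaults = {
    "crossover_probability": 0.2,
    "mutation_probability": 0.8,
    "diversity_control": False,
    "diversity_threshold": 0.1,
}

# Hypothetical usage; estimator and param_grid are placeholders you supply:
# search = GASearchCV(estimator=estimator, param_grid=param_grid, **legacy_defaults)
```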
Features:
Improved the performance of GASearchCV and GAFeatureSelectionCV during fit. Candidate evaluations are now de-duplicated within each generation, and unique candidates can be evaluated in parallel through n_jobs. When generation-level parallelism is active, each candidate runs cross-validation sequentially to avoid nested parallelism.
Added parallel_backend to GASearchCV and GAFeatureSelectionCV to compare 'auto', 'population', and 'cv' parallel strategies.
Added fit_stats_ to GASearchCV and GAFeatureSelectionCV with counters for evaluated candidates, unique candidates, cross-validation calls, cache hits, duplicate candidates, skipped invalid candidates, and population-level parallel batches.
Added optimizer telemetry to history and the generation logbook for GASearchCV and GAFeatureSelectionCV. New fields track population diversity, unique individual ratios, current-generation fitness, cumulative best-so-far fitness through fitness_best, best-solution improvement, the first generation where the current best solution appeared, and stagnation length.
Expanded plots with clearer and more flexible plotting helpers. plot_fitness_evolution now supports multiple metrics and smoothing, plot_history can visualize arbitrary telemetry fields from history or logbook, and plot_search_space now includes a pair-plot mode and a correlation heatmap for sampled parameters.
Improved verbose fit output so real-time progress is shown as a compact generation summary with fitness, diversity, uniqueness, stagnation, mutation probability, and optimizer-control events. Full telemetry remains available through history and the generation logbook.
Added population_initializer to GASearchCV and GAFeatureSelectionCV. The default 'smart' strategy improves the initial population with valid warm starts, estimator defaults, Latin hypercube sampling for numeric hyperparameters, stratified categorical values, and duplicate-aware feature masks. Set population_initializer='random' to use the previous random initialization behavior.
Added optional local refinement and diversity-control mechanisms to GASearchCV and GAFeatureSelectionCV. local_search=True runs a short neighborhood search around hall-of-fame candidates after the genetic search. diversity_control=True monitors diversity and stagnation to boost mutation, replace duplicate candidates, and inject random immigrants when the population collapses too early.
Added optional fitness sharing with fitness_sharing=True. During selection, candidates in crowded niches receive a temporary shared fitness penalty based on normalized candidate distance, helping multiple promising regions survive longer without changing raw cross-validation scores.
Added optional adaptive tournament selection and diversity-aware offspring generation to GASearchCV and GAFeatureSelectionCV. adaptive_selection=True reduces selection pressure when diversity is low or the search stagnates and can increase pressure when the population is improving. New offspring_diversity_retries retry logic helps replace duplicate or parent-matching offspring with novel candidates.
Added grouped configuration objects: EvolutionConfig, PopulationConfig, RuntimeConfig, and OptimizationConfig. These objects provide the preferred API for advanced optimizer settings while the previous flat keyword parameters remain supported for backward compatibility.
Added optional robust final selection to GASearchCV. With final_selection=True, the search re-evaluates the top final_selection_top_k candidates after the GA and selects the final best_params_ from those scores before refitting. Results are stored in final_selection_results_.
Added benchmarks/benchmark_fit.py to measure fit-time mechanics, compare baseline JSON results against current runs, compare parallel strategies, and track population initializers, optimizer telemetry, and holdout model metrics across classification and regression scenarios.
Added benchmarks/benchmark_search_methods.py to compare GASearchCV against scikit-learn hyperparameter search methods, including GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV, and HalvingRandomSearchCV. The benchmark reports solution time, evaluated candidates, estimated cross-validation effort, best CV score, holdout metrics, and best parameters. The default --n-iter for random and halving searches is now automatically computed from the GA’s total candidate generation budget (population_size + generations * 2 * population_size) so that all methods are compared with equivalent evaluation slots. Pass --n-iter N to override.
GASearchCV now uses uniform crossover (cxUniform with indpb=0.5) instead of two-point crossover (cxTwoPoint) for hyperparameter search spaces with two or more parameters. Uniform crossover independently swaps each parameter between parents with 50% probability, which is more effective than two-point crossover for the short, mixed-type parameter lists typical in hyperparameter tuning.
GAFeatureSelectionCV now repairs feature masks created by initialization, crossover, mutation, duplicate replacement, random immigrants, and direct evaluation so masks keep at least one selected feature and respect max_features.
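To make the uniform crossover item above concrete, here is a small standalone sketch of the operator as described: each position is swapped between the two parents independently with probability indpb. It is an illustration in plain Python, not the package's or DEAP's code; the function name uniform_crossover and the sample parameter lists are made up for the example.

```python
import random


def uniform_crossover(parent_a, parent_b, indpb=0.5, rng=random):
    """Swap each position between two parents with probability indpb."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(min(len(child_a), len(child_b))):
        if rng.random() < indpb:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


# Two illustrative candidates with mixed-type hyperparameters.
rng = random.Random(0)
a = [0.01, 100, "gini", True]
b = [0.30, 500, "entropy", False]
child_a, child_b = uniform_crossover(a, b, rng=rng)
```

Every position in the children still holds the two parental values, only possibly exchanged, so no new parameter values are created by crossover alone.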
Docs:
Refreshed the Jupyter notebook tutorials with richer end-to-end workflows, top-level index menus, holdout metric reporting, optimizer telemetry, and examples using the current optimizer controls.
Added a checkpointing and persistence notebook covering ModelCheckpoint and fitted estimator save/load workflows.
Added a plotting gallery notebook demonstrating the full plotting surface for fitness evolution, optimizer telemetry, logbook data, and search-space inspection.
Simplified the project README and documentation index to emphasize core usage, smart initialization, optimizer controls, persistence, benchmarks, and notebook examples.
Documented conda installation through the conda-forge channel and added a conda.recipe with a noarch: python recipe. The recipe declares only the core runtime dependencies, fixing the unsolvable environments caused by pinning the optional seaborn, mlflow, tensorflow, and tensorboard extras as hard requirements.
Added a “When to Use sklearn-genetic-opt” tutorial with a decision table comparing Grid, Randomized, and GA search, practical signs that GA helps or does not help, a seven-parameter HistGradientBoostingClassifier worked example that illustrates the learning_rate × max_iter interaction, and a minimum recommended configuration snippet.
Added a “Pipeline Tuning with GASearchCV” tutorial covering the step__param double-underscore naming convention, a full six-parameter GradientBoostingRegressor pipeline example, warm-start seed configuration, baseline-versus-tuned comparison, and common pitfalls including wrong step names, negative scorers, and nested parallelism.
Added a “Troubleshooting” page with Q&A entries covering parameter name errors, all-same-score causes including error_score masking, slow search and nested parallelism, premature convergence and diversity diagnosis, fit_stats_ field-by-field explanations, reproducibility, warm-start validation, multi-metric confusion, and plot dependency issues.
Added a “Multi-Metric Hyperparameter Search” tutorial explaining multi-scorer scoring dicts, the refit metric, the cv_results_ column layout for multi-metric results, per-metric best-config queries, and how to build alternative estimators from cv_results_ without rerunning the search.
Expanded “How to Use sklearn-genetic-opt” with descriptions of the compact verbose log columns added in 0.13.0: div (genotype diversity), unique (unique individual ratio), stag (stagnation generations), and events (optimizer intervention summary). Also added a history DataFrame access snippet and field-by-field fit_stats_ commentary.
Reordered the documentation index toctree to start with “When to Use sklearn-genetic-opt” before “How to Use sklearn-genetic-opt”, and replaced the short Iris quick-start example on the homepage with a six-parameter RandomForestClassifier example on the breast cancer dataset using roc_auc and a holdout evaluation. Added a “Recommended Next Steps” section linking to the new tutorials.
Bug Fixes:
Fixed fitted estimator persistence for GASearchCV and GAFeatureSelectionCV by excluding volatile DEAP runtime objects from the saved state.
Fixed type preservation for GASearchCV hyperparameter candidates. Integer, continuous, and categorical dimensions are repaired against their declared search-space types after initialization, warm starts, crossover, mutation, random immigrants, duplicate replacement, local search, and before evaluation.
Fixed smart feature-selection initialization so fallback masks used to fill duplicate populations respect max_features and always select at least one feature.
Fixed convergence telemetry so local refinement updates the final generation history row and the default fitness plot shows fitness_best rather than a noisy population average.
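The two feature-mask invariants mentioned in this release (at least one selected feature, and no more than max_features) can be illustrated with a short repair routine. This is a hedged sketch of one possible repair strategy, not the package's implementation: the function name repair_mask is hypothetical, features are dropped at random, and it assumes a non-empty mask and max_features >= 1.

```python
import random


def repair_mask(mask, max_features=None, rng=random):
    """Return a copy of mask with at least one and at most max_features selected.

    Assumes a non-empty mask and, when given, max_features >= 1.
    """
    repaired = [bool(bit) for bit in mask]
    selected = [i for i, bit in enumerate(repaired) if bit]
    if not selected:
        # Empty mask: switch on one randomly chosen feature.
        repaired[rng.randrange(len(repaired))] = True
        return repaired
    if max_features is not None and len(selected) > max_features:
        # Too many features: keep a random subset of size max_features.
        keep = set(rng.sample(selected, max_features))
        repaired = [i in keep for i in range(len(repaired))]
    return repaired


empty_fixed = repair_mask([0, 0, 0, 0], rng=random.Random(1))
trimmed = repair_mask([1, 1, 1, 1, 1, 1], max_features=2, rng=random.Random(1))
untouched = repair_mask([1, 0, 1, 0], max_features=3)
```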
What’s new in 0.12.0
Features:
Added compatibility for outlier detection algorithms
What’s new in 0.11.1
Bug Fixes:
Fixed a bug that would generate AttributeError: ‘GASearchCV’ object has no attribute ‘creator’
What’s new in 0.11.0
Features:
Added a parameter use_cache, which defaults to True. When enabled, the algorithm will skip re-evaluating solutions that have already been evaluated, retrieving the performance metrics from the cache instead. If use_cache is set to False, the algorithm will always re-evaluate solutions, even if they have been seen before, to obtain fresh performance metrics.
Added a parameter in GAFeatureSelectionCV named warm_start_configs, which defaults to None. It is a list of predefined hyperparameter configurations to seed the initial population. Each element in the list is a dictionary where the keys are the names of the hyperparameters, and the values are the corresponding hyperparameter values to be used for the individual.
Example:
    warm_start_configs = [
        {"min_weight_fraction_leaf": 0.02, "bootstrap": True, "max_depth": None, "n_estimators": 100},
        {"min_weight_fraction_leaf": 0.4, "bootstrap": True, "max_depth": 5, "n_estimators": 200},
    ]
The genetic algorithm will initialize part of the population with these configurations to warm-start the optimization process. The remaining individuals in the population will be initialized randomly according to the defined hyperparameter space.
This parameter is useful when prior knowledge of good hyperparameter configurations exists, allowing the algorithm to focus on refining known good solutions while still exploring new areas of the hyperparameter space. If set to None, the entire population will be initialized randomly.
Introduced a novelty search strategy to the GASearchCV class. This strategy rewards solutions that are more distinct from others in the population by incorporating a novelty score into the fitness evaluation. The novelty score encourages exploration and promotes diversity, reducing the risk of premature convergence to local optima.
Novelty Score: Calculated based on the distance between an individual and its nearest neighbors in the population. Individuals with higher novelty scores are more distinct from the rest of the population.
Fitness Evaluation: The overall fitness is now a combination of the traditional performance score and the novelty score, allowing the algorithm to balance between exploiting known good solutions and exploring new, diverse ones.
Improved Exploration: This strategy helps explore new areas of the hyperparameter space, increasing the likelihood of discovering better solutions and avoiding local optima.
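As a rough illustration of the novelty score idea described above, the sketch below scores each individual by its mean distance to its k nearest neighbors in the population. The distance measure (a Hamming count of differing positions), the choice of k, and the function names are assumptions made for the example; the package's actual computation may differ.

```python
def hamming(a, b):
    """Number of positions at which two equal-length candidates differ."""
    return sum(x != y for x, y in zip(a, b))


def novelty_scores(population, k=2):
    """Mean Hamming distance from each individual to its k nearest neighbors."""
    scores = []
    for i, individual in enumerate(population):
        distances = sorted(
            hamming(individual, other)
            for j, other in enumerate(population)
            if j != i
        )
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


# Illustrative population: two duplicates, one near-duplicate, one outlier.
population = [
    [100, "gini", 5],
    [100, "gini", 5],
    [100, "gini", 8],
    [300, "entropy", 12],
]
scores = novelty_scores(population, k=2)
```

In this toy population the duplicated candidates receive the lowest score and the outlier the highest, which is the behavior the strategy relies on to reward distinct solutions.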
API Changes:
Dropped support for python 3.8
What’s new in 0.10.1
Features:
Installs tensorflow when using pip install sklearn-genetic-opt[all]
Bug Fixes:
Fixed a bug that prevented the GA classes from being cloned when used inside a pipeline
What’s new in 0.10.0
API Changes:
GAFeatureSelectionCV now mimics the scikit-learn FeatureSelection algorithms API instead of Grid Search. This makes it easier to use as a selection method that is closer to the scikit-learn API
Improved GAFeatureSelectionCV candidate generation when max_features is set; it also ensures there is at least one feature selected
crossover_probability and mutation_probability are now correctly passed to the mate and mutation functions inside GAFeatureSelectionCV
Dropped support for python 3.7 and added support for python 3.10+
Updated the most important packages from dev-requirements.txt to more recent versions
Updated deprecated functions in tests
Bug Fixes:
Fixed the API docs of GAFeatureSelectionCV, which were pointing to the wrong class
What’s new in 0.9.0
Features:
Introducing Adaptive Schedulers to enable adaptive mutation and crossover probabilities; currently, supported schedulers are:
Added the random_state parameter (default=None) to the Continuous, Categorical and Integer classes to fix the random seed during hyperparameter sampling. Note that this only makes the space components reproducible, not the whole package. This is due to the DEAP dependency, which doesn’t seem to have a native way to set the random seed.
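The scope of random_state described above (reproducible sampling of a single space component, independent of everything else) can be sketched with a private seeded generator. The class below is a hypothetical stand-in written for this example, not the package's Integer class.

```python
import random


class SeededInteger:
    """Hypothetical stand-in for an integer dimension with its own seeded RNG."""

    def __init__(self, lower, upper, random_state=None):
        self.lower = lower
        self.upper = upper
        # A private generator keeps sampling independent of the global seed.
        self._rng = random.Random(random_state)

    def sample(self):
        # randint includes both endpoints.
        return self._rng.randint(self.lower, self.upper)


first = SeededInteger(2, 20, random_state=42)
second = SeededInteger(2, 20, random_state=42)
draws_first = [first.sample() for _ in range(5)]
draws_second = [second.sample() for _ in range(5)]
```

Two instances created with the same seed produce the same sequence of draws, while any randomness outside the component is unaffected.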
API Changes:
Changed the default values of mutation_probability and crossover_probability to 0.8 and 0.2, respectively.
The weighted_choice function used in GAFeatureSelectionCV was re-written to give more probability to a number of features closer to the max_features parameter
Removed the unused and incorrect function plot_parallel_coordinates()
Bug Fixes:
When using the plot_search_space() function, all the parameters are now cast to np.float64 to avoid errors in the seaborn package when plotting bool values.
What’s new in 0.8.1
Features:
If the max_features parameter of GAFeatureSelectionCV is set, the initial population is now sampled giving more probability to solutions with fewer than max_features features.
What’s new in 0.8.0
Features:
GAFeatureSelectionCV now has a parameter called max_features, int, default=None. If it’s not None, it will penalize individuals with more features than max_features, putting a “soft” upper bound on the number of features to be selected.
Classes GASearchCV and GAFeatureSelectionCV now support multi-metric evaluation the same way scikit-learn does. This is reflected in the logbook and cv_results_ objects, which now report results for each metric. As in scikit-learn, if multi-metric is used, the refit parameter must be a str specifying the metric used to evaluate the cv-scores. See more in the GASearchCV and GAFeatureSelectionCV API documentation.
Training gracefully stops if interrupted by one of these exceptions: KeyboardInterrupt, SystemExit, StopIteration. When one of these exceptions is raised, the model finishes the current generation and saves the current best model. It only works if at least one generation has been completed.
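The graceful-stop behavior can be pictured with a simplified loop that catches the listed exceptions and falls back to the best result found so far. This is a sketch under stated assumptions, not the package's code: run_generations and fake_evaluation are hypothetical names, and for simplicity the sketch discards the interrupted generation rather than finishing it.

```python
def run_generations(evaluate_generation, generations):
    """Run up to `generations` steps, keeping the best score seen so far."""
    best = None
    completed = 0
    try:
        for gen in range(generations):
            score = evaluate_generation(gen)
            best = score if best is None else max(best, score)
            completed += 1
    except (KeyboardInterrupt, SystemExit, StopIteration):
        if completed == 0:
            # Nothing to fall back on before the first generation finishes.
            raise
        print(f"Stopped after {completed} generations; keeping best={best}")
    return best, completed


def fake_evaluation(gen):
    # Simulates a user pressing Ctrl+C during the fourth generation.
    if gen == 3:
        raise KeyboardInterrupt
    return [0.70, 0.82, 0.79][gen]


best, completed = run_generations(fake_evaluation, generations=10)
```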
API Changes:
The following parameters changed their default values to create more extensive and different models with better results:
population_size from 10 to 50
generations from 40 to 80
mutation_probability from 0.1 to 0.2
Docs:
A new notebook called Iris_multimetric was added to showcase the new multi-metric capabilities.
What’s new in 0.7.0
Features:
GAFeatureSelectionCV for feature selection along with any scikit-learn classifier or regressor. It optimizes the cv-score while minimizing the number of features to select. This class is compatible with the mlflow and tensorboard integration, the Callbacks and the plot_fitness_evolution function.
API Changes:
The module mlflow was renamed to mlflow_log to avoid unexpected errors on name resolutions
What’s new in 0.6.1
Features:
Added the parameter generations to the DeltaThreshold callback. It now compares the maximum and minimum values of a metric from the last generations, instead of just the current and previous ones. The default value is 2, so the behavior remains the same as in previous versions.
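The windowed comparison described above can be sketched in a few lines: take the last `generations` values of the metric and stop when their spread is within the threshold. The function name should_stop and the use of "less than or equal" are assumptions for illustration; they are not taken from the package's source.

```python
def should_stop(metric_history, threshold, generations=2):
    """Stop when max - min over the last `generations` values is <= threshold."""
    if len(metric_history) < generations:
        # Not enough generations recorded yet to fill the window.
        return False
    window = metric_history[-generations:]
    return max(window) - min(window) <= threshold


# Illustrative fitness values per generation.
history = [0.80, 0.86, 0.90, 0.905, 0.906]
stop_two = should_stop(history, threshold=0.01, generations=2)
stop_four = should_stop(history, threshold=0.01, generations=4)
```

With generations=2 the window holds only the current and previous values, so the spread equals their absolute difference, which matches the earlier behavior.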
Bug Fixes:
When a param_grid of length 1 is provided, a user warning is raised instead of an error. Internally it will swap the crossover operation to use DEAP’s cxSimulatedBinaryBounded().
When using the Continuous class with boundaries lower and upper, a uniform distribution with limits [lower, lower + upper] was sampled; now it is properly sampled using the [lower, upper] limits.
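The difference between the two sampling ranges in the Continuous fix is easy to see numerically. The sketch below reproduces both behaviors with the standard library only; the function names are hypothetical and the code is not taken from the package.

```python
import random


def sample_buggy(lower, upper, rng):
    # Pre-fix behavior described above: `upper` acts as a width, not a bound,
    # so values land in [lower, lower + upper].
    return lower + rng.random() * upper


def sample_fixed(lower, upper, rng):
    # Corrected behavior: values fall in [lower, upper].
    return lower + rng.random() * (upper - lower)


rng = random.Random(0)
lower, upper = 10.0, 12.0
buggy = [sample_buggy(lower, upper, rng) for _ in range(1000)]
fixed = [sample_fixed(lower, upper, rng) for _ in range(1000)]
```

With lower=10 and upper=12, the pre-fix sampler can return values up to 22, well outside the intended interval.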
What’s new in 0.6.0
Features:
Added the ProgressBar callback, which uses a tqdm progress bar to show how many generations are left in the training progress.
Added the TensorBoard callback to log the generation metrics, watch them in real time while the models are trained and compare different runs in your TensorBoard instance.
Added the TimerStopping callback to stop the iterations after a total (threshold) fitting time has elapsed.
Added a new parallel coordinates plot in plot_parallel_coordinates().
Now if one or more callbacks decide to stop the algorithm, their class names are printed to show which callbacks were responsible for the stopping.
Added support for extra methods coming from scikit-learn’s BaseSearchCV, like cv_results_, best_index_ and refit_time_ among others.
Added methods on_start and on_end to BaseCallback. Now the algorithms check for the callbacks like this:
on_start: When the evolutionary algorithm is called from the GASearchCV.fit method.
on_step: When the evolutionary algorithm finishes a generation (no change here).
on_end: At the end of the last generation.
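The order in which the three hooks fire can be sketched with a small recording callback and a stand-in driver loop. Everything here is illustrative: RecordingCallback and run are hypothetical, the hook signatures are assumptions rather than the documented BaseCallback interface, and the class does not inherit from the package's base class so that the example stays self-contained.

```python
class RecordingCallback:
    """Hypothetical callback that records which hooks were called."""

    def __init__(self):
        self.events = []

    def on_start(self, estimator=None):
        self.events.append("on_start")

    def on_step(self, record=None, logbook=None, estimator=None):
        self.events.append(f"on_step:{record['gen']}")
        return False  # returning True would request an early stop

    def on_end(self, logbook=None, estimator=None):
        self.events.append("on_end")


def run(callback, generations):
    """Stand-in for the fit loop: start, one step per generation, then end."""
    callback.on_start()
    for gen in range(generations):
        if callback.on_step(record={"gen": gen}):
            break
    callback.on_end()


callback = RecordingCallback()
run(callback, generations=3)
```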
Bug Fixes:
A missing statement caused the callbacks to be evaluated starting from generation 1, ignoring generation 0. Now this is properly handled and callbacks work from generation 0.
API Changes:
The modules plots and MLflowConfig now require an explicit installation of seaborn and mlflow; these are now optionally installed using pip install sklearn-genetic-opt[all].
The GASearchCV.logbook property now has extra information that comes from the scikit-learn cross_validate function.
An optional extra parameter was added to GASearchCV, named return_train_score: bool, default=False. As in scikit-learn, it controls if the cv_results_ should have the training scores.
Docs:
Edited all demos to be in the jupyter notebook format.
Added embedded jupyter notebooks examples.
The modules of the package now have a summary of their classes/functions in the docs.
Updated the callbacks and custom callbacks tutorials to add new TensorBoard callback and the new methods on the base callback.
Internal:
Now the hof uses the self.best_params_ for the position 0, to be consistent with the scikit-learn API and parameters like self.best_index_
What’s new in 0.5.0
Features:
Built-in integration with MLflow using the MLflowConfig and the new parameter log_config from GASearchCV
Implemented the callback LogbookSaver which saves the estimator.logbook object with all the fitted hyperparameters and their cross-validation score
Added the parameter estimator to all the functions on the module callbacks
Docs:
Added user guide “Integrating with MLflow”
Updated the tutorial “Custom Callbacks” for new API inheritance behavior
Internal:
Added a base class BaseCallback from which all Callbacks must inherit
Now the coverage report doesn’t take into account the lines with # pragma: no cover and # noqa
What’s new in 0.4.1
Docs:
Added user guide on “Understanding the evaluation process”
Several guides on contributing, code of conduct
Added important links
Docs requirements are now independent of package requirements
Internal:
Changed test CI from Travis to GitHub Actions
What’s new in 0.4
Features:
Implemented callbacks module to stop the optimization process based on the current iteration metrics, currently implemented: ThresholdStopping, ConsecutiveStopping and DeltaThreshold.
The algorithms ‘eaSimple’, ‘eaMuPlusLambda’, ‘eaMuCommaLambda’ are now implemented in the module algorithms for more control over their options, rather than taking the deap.algorithms module
Implemented the plots module and added the function plot_search_space(); this function plots a mixed counter, scatter and histogram plots over all the fitted hyperparameters and their cross-validation score
Documentation based on rst with Sphinx to host in Read the Docs. It includes public classes and functions documentation as well as several tutorials on how to use the package
Added best_params_ and best_estimator_ properties after fitting GASearchCV
Added optional parameters refit, pre_dispatch and error_score
API Changes:
Removed support for python 3.6, changed the libraries supported versions to be the same as scikit-learn current version
Several internal changes on the documentation and variables naming style to be compatible with Sphinx
Removed the parameters continuous_parameters, categorical_parameters and integer_parameters replacing them with param_grid
What’s new in 0.3
Features:
Added the space module to control better the data types and ranges of each hyperparameter, their distribution to sample random values from, and merge all data types in one Space class that can work with the new param_grid parameter
Replaced continuous_parameters, categorical_parameters and integer_parameters with param_grid; the former still work but will be removed in a future version
Added the option to use the eaMuCommaLambda algorithm from deap
The mu and lambda_ parameters of the internal eaMuPlusLambda and eaMuCommaLambda now are in terms of the initial population size and not the number of generations
What’s new in 0.2
Features:
Enabled deap’s eaMuPlusLambda algorithm for the optimization process; it is now the default routine
Added logbook and history properties to the fitted GASearchCV for post-fit analysis
Elitism=False now implements a roulette selection instead of ignoring the parameter
Added the parameter keep_top_k to control the number of solutions in the hall of fame (hof)
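For readers unfamiliar with roulette selection, the idea is that each individual is drawn with probability proportional to its fitness. The following is a generic sketch using the standard library, not the package's or DEAP's implementation; roulette_select is a hypothetical name and the sketch assumes non-negative fitness values.

```python
import random


def roulette_select(population, fitnesses, k, rng=random):
    """Draw k individuals with probability proportional to fitness."""
    if min(fitnesses) < 0:
        raise ValueError("roulette selection needs non-negative fitness values")
    return rng.choices(population, weights=fitnesses, k=k)


rng = random.Random(0)
population = ["a", "b", "c"]
fitnesses = [0.0, 1.0, 3.0]
picked = roulette_select(population, fitnesses, k=1000, rng=rng)
```

An individual with zero fitness is never drawn, and over many draws "c" is picked roughly three times as often as "b".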
API Changes:
Refactored the optimization algorithm to use the DEAP package instead of a custom implementation; this causes the removal of several methods, properties and variables inside the GASearchCV class
The parameter encoding_length has been removed; it’s no longer required by the GASearchCV class
Renamed the property of the fitted estimator from best_params_ to best_params
The verbosity now prints the deap log of the fitness function, its standard deviation, max and min values from each generation
The variable GASearchCV._best_solutions was removed and it’s meant to be replaced with GASearchCV.logbook and GASearchCV.history
Changed default parameters crossover_probability from 1 to 0.8 and generations from 50 to 40
What’s new in 0.1
Features:
GASearchCV for hyperparameter tuning using a custom genetic algorithm for scikit-learn classification and regression models
plot_fitness_evolution() function to see the average fitness values over generations