Release Notes

Some notes on new features in various releases

What’s new in 0.13.0

Breaking Changes:

  • The default crossover_probability has changed from 0.2 to 0.8 and the default mutation_probability has changed from 0.8 to 0.1 for both GASearchCV and GAFeatureSelectionCV.

  • diversity_control now defaults to True and diversity_threshold now defaults to 0.25 for both GASearchCV and GAFeatureSelectionCV. Previously diversity_control defaulted to False and diversity_threshold defaulted to 0.1. Code that does not set these explicitly will now run with diversity monitoring enabled; set diversity_control=False to restore the previous behavior.

  • The fitness function for GASearchCV is now single-objective (CV score only). Previously, a novelty_score based on population Hamming distance was included as a second fitness objective with equal weight. This caused Pareto-dominance comparisons during tournament selection to favor diverse-but-lower-scoring candidates over better candidates, systematically reducing search quality. Fitness sharing (fitness_sharing=True) and diversity control (diversity_control=True) already provide population diversity maintenance without corrupting the primary fitness signal. GAFeatureSelectionCV retains its two-objective fitness (CV score + feature count) unchanged.
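The sketch below shows how a Pareto comparison on (CV score, novelty) can stop preferring the better model. It is a minimal, hypothetical illustration. The dominates helper and the score values are invented for this note and are not the library's selection code. A candidate with a higher CV score but lower novelty does not dominate a lower-scoring but more novel one, so the comparison no longer favors the CV score.

```python
def dominates(a, b):
    """Return True if fitness tuple `a` Pareto-dominates `b` (all objectives maximized).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


# Hypothetical (cv_score, novelty_score) pairs for two candidates.
strong = (0.92, 0.10)   # better CV score, close to the rest of the population
diverse = (0.80, 0.90)  # worse CV score, far from the rest of the population

# Neither candidate dominates the other, so the CV score alone cannot decide.
print(dominates(strong, diverse), dominates(diverse, strong))  # False False

# With a single objective (CV score only), the comparison is unambiguous.
print(strong[0] > diverse[0])  # True
```

With a single objective, the same comparison reduces to comparing CV scores directly.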

Features:

  • Improved the performance of GASearchCV and GAFeatureSelectionCV during fit. Candidate evaluations are now de-duplicated within each generation, and unique candidates can be evaluated in parallel through n_jobs. When generation-level parallelism is active, each candidate runs cross-validation sequentially to avoid nested parallelism.

  • Added parallel_backend to GASearchCV and GAFeatureSelectionCV to choose among the 'auto', 'population', and 'cv' parallel strategies.

  • Added fit_stats_ to GASearchCV and GAFeatureSelectionCV with counters for evaluated candidates, unique candidates, cross-validation calls, cache hits, duplicate candidates, skipped invalid candidates, and population-level parallel batches.

  • Added optimizer telemetry to history and the generation logbook for GASearchCV and GAFeatureSelectionCV. New fields track population diversity, unique individual ratios, current-generation fitness, cumulative best-so-far fitness through fitness_best, best-solution improvement, the first generation where the current best solution appeared, and stagnation length.

  • Expanded plots with clearer and more flexible plotting helpers. plot_fitness_evolution now supports multiple metrics and smoothing, plot_history can visualize arbitrary telemetry fields from history or logbook, and plot_search_space now includes a pair-plot mode and a correlation heatmap for sampled parameters.

  • Improved verbose fit output so real-time progress is shown as a compact generation summary with fitness, diversity, uniqueness, stagnation, mutation probability, and optimizer-control events. Full telemetry remains available through history and the generation logbook.

  • Added population_initializer to GASearchCV and GAFeatureSelectionCV. The default 'smart' strategy improves the initial population with valid warm starts, estimator defaults, Latin hypercube sampling for numeric hyperparameters, stratified categorical values, and duplicate-aware feature masks. Set population_initializer='random' to use the previous random initialization behavior.

  • Added optional local refinement and diversity-control mechanisms to GASearchCV and GAFeatureSelectionCV. local_search=True runs a short neighborhood search around hall-of-fame candidates after the genetic search. diversity_control=True monitors diversity and stagnation to boost mutation, replace duplicate candidates, and inject random immigrants when the population collapses too early.

  • Added optional fitness sharing with fitness_sharing=True. During selection, candidates in crowded niches receive a temporary shared fitness penalty based on normalized candidate distance, helping multiple promising regions survive longer without changing raw cross-validation scores.

  • Added optional adaptive tournament selection and diversity-aware offspring generation to GASearchCV and GAFeatureSelectionCV. adaptive_selection=True reduces selection pressure when diversity is low or the search stagnates, and can increase pressure when the population is improving. The new offspring_diversity_retries setting controls retry logic that helps replace duplicate or parent-matching offspring with novel candidates.

  • Added grouped configuration objects: EvolutionConfig, PopulationConfig, RuntimeConfig, and OptimizationConfig. These objects provide the preferred API for advanced optimizer settings while the previous flat keyword parameters remain supported for backward compatibility.

  • Added optional robust final selection to GASearchCV. With final_selection=True, the search re-evaluates the top final_selection_top_k candidates after the GA and selects the final best_params_ from those scores before refitting. Results are stored in final_selection_results_.

  • Added benchmarks/benchmark_fit.py to measure fit-time mechanics, compare baseline JSON results against current runs, compare parallel strategies, and track population initializers, optimizer telemetry, and holdout model metrics across classification and regression scenarios.

  • Added benchmarks/benchmark_search_methods.py to compare GASearchCV against scikit-learn hyperparameter search methods, including GridSearchCV, RandomizedSearchCV, HalvingGridSearchCV, and HalvingRandomSearchCV. The benchmark reports solution time, evaluated candidates, estimated cross-validation effort, best CV score, holdout metrics, and best parameters. The default --n-iter for random and halving searches is now automatically computed from the GA’s total candidate generation budget (population_size + generations * 2 * population_size) so that all methods are compared with equivalent evaluation slots. Pass --n-iter N to override.

  • GASearchCV now uses uniform crossover (cxUniform with indpb=0.5) instead of two-point crossover (cxTwoPoint) for hyperparameter search spaces with two or more parameters. Uniform crossover independently swaps each parameter between parents with 50% probability, which is more effective than two-point crossover for the short, mixed-type parameter lists typical in hyperparameter tuning.

  • GAFeatureSelectionCV now repairs feature masks created by initialization, crossover, mutation, duplicate replacement, random immigrants, and direct evaluation so masks keep at least one selected feature and respect max_features.
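The per-generation de-duplication and caching described in the fit-performance item above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation. evaluate_generation is a hypothetical helper, the evaluate callable stands in for cross-validation, and the counter names only loosely mirror fit_stats_. Unique candidates are evaluated sequentially here. A parallel version would map evaluate over the unique candidates that are not yet cached.

```python
def evaluate_generation(population, evaluate, cache):
    """Score a generation, evaluating each distinct candidate at most once.

    population: list of hashable candidates (e.g. tuples of hyperparameter values)
    evaluate:   callable returning a score for one candidate (stands in for CV)
    cache:      dict kept across generations, mapping candidate -> score
    Returns the per-individual scores and a dict of simple counters.
    """
    stats = {"evaluated": len(population), "unique": 0,
             "cv_calls": 0, "cache_hits": 0, "duplicates": 0}
    seen = set()
    for candidate in population:
        if candidate in seen:
            stats["duplicates"] += 1  # repeated within this generation
            continue
        seen.add(candidate)
        stats["unique"] += 1
        if candidate in cache:
            stats["cache_hits"] += 1  # already scored in an earlier generation
        else:
            cache[candidate] = evaluate(candidate)
            stats["cv_calls"] += 1
    # Map scores back onto every individual, duplicates included.
    return [cache[c] for c in population], stats


cache = {}
generation = [(1, "gini"), (3, "gini"), (1, "gini")]
scores, stats = evaluate_generation(generation, lambda c: c[0] / 10, cache)
print(scores)  # [0.1, 0.3, 0.1]
print(stats)
```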
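The diversity, uniqueness, best-so-far and stagnation fields listed in the telemetry item above can be approximated with a few helper functions. These are hypothetical sketches, and the library's exact metrics and field definitions may differ. Here diversity is assumed to be the mean pairwise Hamming distance normalized by genome length. Stagnation is assumed to count the generations since the best-so-far value last improved.

```python
from itertools import combinations


def unique_ratio(population):
    """Fraction of distinct individuals (1.0 means no duplicates)."""
    return len(set(map(tuple, population))) / len(population)


def hamming_diversity(population):
    """Mean pairwise Hamming distance, normalized by genome length (0 to 1)."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    length = len(population[0])
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / (len(pairs) * length)


def best_so_far_and_stagnation(generation_best):
    """Cumulative best fitness and generations since the last improvement."""
    best, stagnation = [], []
    current, stalled = float("-inf"), 0
    for value in generation_best:
        if value > current:
            current, stalled = value, 0
        else:
            stalled += 1
        best.append(current)
        stagnation.append(stalled)
    return best, stagnation


population = [(1, 0, 1), (1, 0, 1), (0, 0, 1)]
print(unique_ratio(population))       # 2 of 3 individuals are distinct
print(hamming_diversity(population))  # 2 differing genes over 3 pairs x 3 genes
print(best_so_far_and_stagnation([0.80, 0.85, 0.84, 0.85, 0.90]))
# ([0.8, 0.85, 0.85, 0.85, 0.9], [0, 0, 1, 2, 0])
```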
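The Latin hypercube sampling mentioned under population_initializer can be sketched with the standard library for continuous bounds. This sketch assumes that each numeric hyperparameter is given as a (lower, upper) pair. Integer hyperparameters would also need rounding, and the library's initializer may stratify differently. latin_hypercube is a hypothetical helper name.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so every dimension is split into n_samples
    equal-width strata, each of which receives exactly one sample.

    bounds: list of (lower, upper) pairs, one per continuous hyperparameter.
    """
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / n_samples
        # One uniform draw inside each stratum...
        column = [lower + (i + rng.random()) * width for i in range(n_samples)]
        # ...then shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose columns into one tuple per sampled point.
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


for point in latin_hypercube(5, [(0.01, 0.3), (0.0, 1.0)], seed=0):
    print(point)
```

Compared with independent uniform draws, this guarantees that every fifth of each range is covered even with only five initial individuals.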
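The fitness-sharing item above does not give a formula. The sketch below uses the classic sharing function from the genetic-algorithm literature as an assumed stand-in: sh(d) = 1 - (d / sigma)^alpha for d < sigma, and 0 otherwise. Each candidate's raw fitness is divided by its niche count. The function returns a new list, so the raw scores stay unchanged. The sigma and alpha values and the normalized Hamming distance are illustrative choices, not library defaults.

```python
def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length candidates differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def shared_fitness(fitnesses, population, distance=normalized_hamming,
                   sigma=0.3, alpha=1.0):
    """Classic fitness sharing: divide each raw fitness by its niche count.

    Neighbors closer than sigma add 1 - (d / sigma) ** alpha to the niche
    count; the candidate itself (d = 0) always adds 1. Returns a new list,
    so the raw scores are not modified. Assumes non-negative fitness.
    """
    shared = []
    for fitness, a in zip(fitnesses, population):
        niche_count = 0.0
        for b in population:
            d = distance(a, b)
            if d < sigma:
                niche_count += 1.0 - (d / sigma) ** alpha
        shared.append(fitness / niche_count)
    return shared


population = [(0, 0, 0, 0), (0, 0, 0, 0), (1, 1, 1, 1)]
raw_scores = [0.90, 0.90, 0.85]
print(shared_fitness(raw_scores, population))  # [0.45, 0.45, 0.85]
```

In the example, the two identical candidates split their fitness. As a result, the isolated candidate ranks first during selection even though its raw score is lower.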
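The default --n-iter for the search-method benchmark follows directly from the budget formula above. Take population_size=50 and generations=80 as a worked example: 50 + 80 * 2 * 50 = 8050 candidate slots. The helper name below is hypothetical. Only the formula comes from the note.

```python
def ga_candidate_budget(population_size, generations):
    """Candidate slots: initial population plus 2 * population_size per generation."""
    return population_size + generations * 2 * population_size


print(ga_candidate_budget(population_size=50, generations=80))  # 8050
```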
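Below is a plain-Python sketch of uniform crossover on a mixed-type hyperparameter list. It is modeled on DEAP's cxUniform, but differs in one respect. cxUniform modifies its parents in place, whereas this hypothetical uniform_crossover helper returns new lists. Each position is swapped independently with probability indpb, so every child gene still comes from one of the two parents.

```python
import random


def uniform_crossover(parent1, parent2, indpb=0.5, rng=random):
    """Swap each gene between two parents independently with probability indpb.

    Returns two new child lists; the parents are not modified.
    """
    child1, child2 = list(parent1), list(parent2)
    for i in range(min(len(child1), len(child2))):
        if rng.random() < indpb:
            child1[i], child2[i] = child2[i], child1[i]
    return child1, child2


# Mixed-type genomes: [float, int, categorical, int] hyperparameter values.
parent1 = [0.1, 200, "gini", 5]
parent2 = [0.3, 50, "entropy", 12]
child1, child2 = uniform_crossover(parent1, parent2, rng=random.Random(42))
print(child1, child2)
```

Unlike two-point crossover, no gene's chance of being swapped depends on its position in the list, which suits short parameter lists whose ordering carries no meaning.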
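The mask repair described above can be sketched as follows, assuming masks are 0/1 lists and max_features is at least 1. The note does not say which features are dropped or added. This hypothetical repair_mask helper switches off surplus features at random and, for an empty mask, switches on one random feature.

```python
import random


def repair_mask(mask, max_features=None, rng=random):
    """Return a copy of a 0/1 feature mask with 1 <= selected <= max_features.

    Surplus selected features are switched off at random; an empty mask
    gets one randomly chosen feature switched on.
    """
    mask = list(mask)
    selected = [i for i, bit in enumerate(mask) if bit]
    if max_features is not None and len(selected) > max_features:
        for i in rng.sample(selected, len(selected) - max_features):
            mask[i] = 0
    if not any(mask):
        mask[rng.randrange(len(mask))] = 1
    return mask


rng = random.Random(0)
print(repair_mask([1, 1, 1, 1, 1, 0], max_features=2, rng=rng))  # two 1s remain
print(repair_mask([0, 0, 0], rng=rng))                            # one 1 added
```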

Docs:

  • Refreshed the Jupyter notebook tutorials with richer end-to-end workflows, top-level index menus, holdout metric reporting, optimizer telemetry, and examples using the current optimizer controls.

  • Added a checkpointing and persistence notebook covering ModelCheckpoint and fitted estimator save/load workflows.

  • Added a plotting gallery notebook demonstrating the full plotting surface for fitness evolution, optimizer telemetry, logbook data, and search-space inspection.

  • Simplified the project README and documentation index to emphasize core usage, smart initialization, optimizer controls, persistence, benchmarks, and notebook examples.

  • Documented conda installation through the conda-forge channel and added a conda.recipe with a noarch: python recipe. The recipe declares only the core runtime dependencies, fixing the unsolvable environments caused by pinning the optional seaborn, mlflow, tensorflow, and tensorboard extras as hard requirements.

  • Added the When to Use sklearn-genetic-opt tutorial with a decision table comparing Grid, Randomized, and GA search, practical signs that GA helps or does not help, a seven-parameter HistGradientBoostingClassifier worked example that illustrates the learning_rate × max_iter interaction, and a minimum recommended configuration snippet.

  • Added the Pipeline Tuning with GASearchCV tutorial covering the step__param double-underscore naming convention, a full six-parameter GradientBoostingRegressor pipeline example, warm-start seed configuration, baseline-versus-tuned comparison, and common pitfalls including wrong step names, negative scorers, and nested parallelism.

  • Added a Troubleshooting page with Q&A entries covering parameter name errors, all-same-score causes including error_score masking, slow search and nested parallelism, premature convergence and diversity diagnosis, fit_stats_ field-by-field explanations, reproducibility, warm-start validation, multi-metric confusion, and plot dependency issues.

  • Added the Multi-Metric Hyperparameter Search tutorial explaining multi-scorer scoring dicts, the refit metric, cv_results_ column layout for multi-metric results, per-metric best-config queries, and how to build alternative estimators from cv_results_ without rerunning the search.

  • Expanded How to Use sklearn-genetic-opt with descriptions of the compact verbose log columns added in 0.13.0: div (genotype diversity), unique (unique individual ratio), stag (stagnation generations), and events (optimizer intervention summary). Also added a history DataFrame access snippet and field-by-field fit_stats_ commentary.

  • Reordered the documentation index toctree to start with When to Use sklearn-genetic-opt before How to Use sklearn-genetic-opt, and replaced the short Iris quick-start example on the homepage with a six-parameter RandomForestClassifier example on the breast cancer dataset using roc_auc and a holdout evaluation. Added a Recommended Next Steps section linking to the new tutorials.

Bug Fixes:

  • Fixed fitted estimator persistence for GASearchCV and GAFeatureSelectionCV by excluding volatile DEAP runtime objects from the saved state.

  • Fixed type preservation for GASearchCV hyperparameter candidates. Integer, continuous, and categorical dimensions are repaired against their declared search-space types after initialization, warm starts, crossover, mutation, random immigrants, duplicate replacement, local search, and before evaluation.

  • Fixed smart feature-selection initialization so fallback masks used to fill duplicate populations respect max_features and always select at least one feature.

  • Fixed convergence telemetry so local refinement updates the final generation history row and the default fitness plot shows fitness_best rather than a noisy population average.

What’s new in 0.12.0

Features:

  • Added compatibility for outlier detection algorithms

What’s new in 0.11.1

Bug Fixes:

  • Fixed a bug that would generate AttributeError: ‘GASearchCV’ object has no attribute ‘creator’

What’s new in 0.11.0

Features:

  • Added a parameter use_cache, which defaults to True. When enabled, the algorithm will skip re-evaluating solutions that have already been evaluated, retrieving the performance metrics from the cache instead. If use_cache is set to False, the algorithm will always re-evaluate solutions, even if they have been seen before, to obtain fresh performance metrics.

  • Added a parameter to GASearchCV named warm_start_configs (default None): a list of predefined hyperparameter configurations used to seed the initial population. Each element in the list is a dictionary whose keys are hyperparameter names and whose values are the hyperparameter values to use for that individual.

    Example:

    warm_start_configs = [
        {"min_weight_fraction_leaf": 0.02, "bootstrap": True, "max_depth": None, "n_estimators": 100},
        {"min_weight_fraction_leaf": 0.4, "bootstrap": True, "max_depth": 5, "n_estimators": 200},
    ]

    The genetic algorithm will initialize part of the population with these configurations to warm-start the optimization process. The remaining individuals in the population will be initialized randomly according to the defined hyperparameter space.

    This parameter is useful when prior knowledge of good hyperparameter configurations exists, allowing the algorithm to focus on refining known good solutions while still exploring new areas of the hyperparameter space. If set to None, the entire population will be initialized randomly.

  • Introduced a novelty search strategy to the GASearchCV class. This strategy rewards solutions that are more distinct from others in the population by incorporating a novelty score into the fitness evaluation. The novelty score encourages exploration and promotes diversity, reducing the risk of premature convergence to local optima.

    • Novelty Score: Calculated based on the distance between an individual and its nearest neighbors in the population. Individuals with higher novelty scores are more distinct from the rest of the population.

    • Fitness Evaluation: The overall fitness is now a combination of the traditional performance score and the novelty score, allowing the algorithm to balance between exploiting known good solutions and exploring new, diverse ones.

    • Improved Exploration: This strategy helps explore new areas of the hyperparameter space, increasing the likelihood of discovering better solutions and avoiding local optima.
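Below is a minimal sketch of the nearest-neighbor novelty score described above. It rests on two assumptions: Hamming distance between encoded individuals, and a mean over the k nearest neighbors. The helper name novelty_scores is also hypothetical. Per the 0.13.0 notes, GASearchCV no longer uses this score as a fitness objective.

```python
def hamming(a, b):
    """Number of positions at which two equal-length individuals differ."""
    return sum(x != y for x, y in zip(a, b))


def novelty_scores(population, k=3):
    """Mean Hamming distance from each individual to its k nearest neighbors."""
    scores = []
    for i, individual in enumerate(population):
        distances = sorted(hamming(individual, other)
                           for j, other in enumerate(population) if j != i)
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


population = [(0, 0, 0), (0, 0, 1), (1, 1, 1)]
print(novelty_scores(population, k=1))  # [1.0, 1.0, 2.0]
```

The outlying individual (1, 1, 1) receives the highest score, which is the behavior the novelty term was meant to reward.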

API Changes:

  • Dropped support for Python 3.8

What’s new in 0.10.1

Features:

  • tensorflow is now installed when using pip install sklearn-genetic-opt[all]

Bug Fixes:

  • Fixed a bug that prevented the GA classes from being cloned when used inside a pipeline

What’s new in 0.10.0

API Changes:

  • GAFeatureSelectionCV now mimics the API of scikit-learn’s feature selection algorithms instead of the grid search API, which makes it easier to use as a selection method that follows the scikit-learn API

  • Improved GAFeatureSelectionCV candidate generation when max_features is set; it also ensures that at least one feature is selected

  • crossover_probability and mutation_probability are now correctly passed to the mate and mutation functions inside GAFeatureSelectionCV

  • Dropped support for Python 3.7 and added support for Python 3.10+

  • Updated the most important packages in dev-requirements.txt to more recent versions

  • Updated deprecated functions in tests

Bug Fixes:

What’s new in 0.9.0

Features:

  • Introducing Adaptive Schedulers to enable adaptive mutation and crossover probabilities; currently, supported schedulers are:

  • Added a random_state parameter (default=None) to the Continuous, Categorical and Integer classes to fix the random seed during hyperparameter sampling. Note that this only makes the space components reproducible, not the whole package, because the DEAP dependency doesn’t seem to have a native way to set the random seed.

API Changes:

  • Changed the default values of mutation_probability and crossover_probability to 0.8 and 0.2, respectively.

  • The weighted_choice function used in GAFeatureSelectionCV was rewritten to give higher probability to feature counts closer to the max_features parameter

  • Removed unused and wrong function plot_parallel_coordinates()

Bug Fixes:

  • When using the plot_search_space() function, all parameters are now cast to np.float64 to avoid errors in the seaborn package when plotting bool values.

What’s new in 0.8.1

Features:

  • If the max_features parameter of GAFeatureSelectionCV is set, the initial population is now sampled giving higher probability to solutions with fewer than max_features features.

What’s new in 0.8.0

Features:

  • GAFeatureSelectionCV now has a parameter called max_features, int, default=None. If it’s not None, it will penalize individuals with more features than max_features, putting a “soft” upper bound to the number of features to be selected.

  • Classes GASearchCV and GAFeatureSelectionCV now support multi-metric evaluation the same way scikit-learn does. This is reflected in the logbook and cv_results_ objects, which now contain results for each metric. As in scikit-learn, if multi-metric evaluation is used, the refit parameter must be a str specifying the metric used to select the best candidate and refit the estimator. See more in the GASearchCV and GAFeatureSelectionCV API documentation.

  • Training gracefully stops if interrupted by any of these exceptions: KeyboardInterrupt, SystemExit, StopIteration. When one of these exceptions is raised, the model finishes the current generation and saves the current best model. This only works if at least one generation has been completed.

API Changes:

  • The following parameters changed their default values to create more extensive and different models with better results:

    • population_size from 10 to 50

    • generations from 40 to 80

    • mutation_probability from 0.1 to 0.2

Docs:

  • A new notebook called Iris_multimetric was added to showcase the new multi-metric capabilities.

What’s new in 0.7.0

Features:

  • GAFeatureSelectionCV for feature selection along with any scikit-learn classifier or regressor. It optimizes the cv-score while minimizing the number of features to select. This class is compatible with the mlflow and tensorboard integration, the Callbacks and the plot_fitness_evolution function.

API Changes:

  • The module mlflow was renamed to mlflow_log to avoid unexpected errors in name resolution

What’s new in 0.6.1

Features:

  • Added the parameter generations to the DeltaThreshold callback. It now compares the maximum and minimum values of a metric over the last N generations, where N is the value of generations, instead of just the current and previous ones. The default value is 2, so the behavior remains the same as in previous versions.
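The windowed comparison can be sketched as a plain function. This is a hypothetical helper rather than the DeltaThreshold class itself. It assumes that stopping happens when the max-minus-min spread over the window is at or below the threshold.

```python
def delta_threshold_should_stop(metric_history, threshold, generations=2):
    """Return True when the metric barely moved over the last `generations` values.

    Compares max and min over the window; with generations=2 this reduces
    to comparing the current and previous generation.
    """
    if len(metric_history) < generations:
        return False  # not enough generations recorded yet
    window = metric_history[-generations:]
    return max(window) - min(window) <= threshold


history = [0.80, 0.85, 0.851, 0.852]
print(delta_threshold_should_stop(history, threshold=0.01, generations=3))  # True
print(delta_threshold_should_stop(history, threshold=0.01, generations=4))  # False
```

A longer window makes the stop condition stricter, because a single noisy generation can no longer trigger it.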

Bug Fixes:

  • When a param_grid of length 1 is provided, a user warning is raised instead of an error. Internally, the crossover operation is swapped to DEAP’s cxSimulatedBinaryBounded().

  • When using the Continuous class with boundaries lower and upper, values were sampled from a uniform distribution with limits [lower, lower + upper]; they are now properly sampled within [lower, upper].

What’s new in 0.6.0

Features:

  • Added the ProgressBar callback, which uses a tqdm progress bar to show how many generations are left in the training process.

  • Added the TensorBoard callback to log the generation metrics, watch in real time while the models are trained and compare different runs in your TensorBoard instance.

  • Added the TimerStopping callback to stop the iterations after a total (threshold) fitting time has elapsed.

  • Added new parallel coordinates plot in plot_parallel_coordinates().

  • If one or more callbacks decide to stop the algorithm, their class names are now printed so you know which callbacks were responsible for stopping.

  • Added support for extra attributes coming from scikit-learn’s BaseSearchCV, such as cv_results_, best_index_ and refit_time_, among others.

  • Added methods on_start and on_end to BaseCallback. The algorithms now call the callbacks at these points:

    • on_start: When the evolutionary algorithm is called from the GASearchCV.fit method.

    • on_step: When the evolutionary algorithm finishes a generation (no change here).

    • on_end: At the end of the last generation.
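The toy loop below shows when each hook fires. RecordingCallback and run_generations are hypothetical stand-ins, not the package’s BaseCallback or evolutionary algorithm. The hook arguments are illustrative. A truthy return from on_step is assumed to request an early stop.

```python
class RecordingCallback:
    """Hypothetical callback that records the hooks described above."""

    def __init__(self, stop_after=None):
        self.events = []
        self.stop_after = stop_after

    def on_start(self, estimator=None):
        self.events.append("start")

    def on_step(self, record=None, logbook=None, estimator=None):
        self.events.append(f"step {record['gen']}")
        # Returning True asks the loop to stop after this generation.
        return self.stop_after is not None and record["gen"] >= self.stop_after

    def on_end(self, logbook=None, estimator=None):
        self.events.append("end")


def run_generations(n_generations, callbacks):
    """Toy evolutionary loop showing when each hook fires (not the library's loop)."""
    for cb in callbacks:
        cb.on_start()
    for gen in range(n_generations):  # generation 0 is included
        record = {"gen": gen}
        stops = [cb.on_step(record=record) for cb in callbacks]
        if any(stops):
            break
    for cb in callbacks:
        cb.on_end()


callback = RecordingCallback(stop_after=1)
run_generations(5, [callback])
print(callback.events)  # ['start', 'step 0', 'step 1', 'end']
```

Every callback’s on_step runs before the stop decision, so all callbacks that requested a stop can be identified.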

Bug Fixes:

  • A missing statement caused callbacks to start being evaluated from generation 1, ignoring generation 0. Now this is properly handled and callbacks work from generation 0.

API Changes:

  • The plots module and MLflowConfig now require an explicit installation of seaborn and mlflow; these are optionally installed using pip install sklearn-genetic-opt[all].

  • The GASearchCV.logbook property now has extra information that comes from the scikit-learn cross_validate function.

  • An optional extra parameter was added to GASearchCV, named return_train_score: bool, default=False. As in scikit-learn, it controls whether cv_results_ includes the training scores.

Docs:

  • Converted all demos to the Jupyter notebook format.

  • Added embedded Jupyter notebook examples.

  • The modules of the package now have a summary of their classes/functions in the docs.

  • Updated the callbacks and custom callbacks tutorials to add new TensorBoard callback and the new methods on the base callback.

Internal:

  • The hof now uses self.best_params_ for position 0, to be consistent with the scikit-learn API and attributes like self.best_index_

What’s new in 0.5.0

Features:

  • Built-in integration with MLflow using the MLflowConfig and the new parameter log_config from GASearchCV

  • Implemented the callback LogbookSaver which saves the estimator.logbook object with all the fitted hyperparameters and their cross-validation score

  • Added the parameter estimator to all the functions on the module callbacks

Docs:

  • Added user guide “Integrating with MLflow”

  • Updated the tutorial “Custom Callbacks” for the new API inheritance behavior

Internal:

  • Added a base class BaseCallback from which all Callbacks must inherit

  • The coverage report now ignores lines marked with # pragma: no cover and # noqa

What’s new in 0.4.1

Docs:

  • Added user guide on “Understanding the evaluation process”

  • Added several guides on contributing and a code of conduct

  • Added important links

  • Docs requirements are now independent of package requirements

Internal:

  • Changed test CI from Travis to GitHub Actions

What’s new in 0.4

Features:

  • Implemented the callbacks module to stop the optimization process based on the current iteration metrics; currently implemented: ThresholdStopping, ConsecutiveStopping and DeltaThreshold.

  • The algorithms ‘eaSimple’, ‘eaMuPlusLambda’, ‘eaMuCommaLambda’ are now implemented in the module algorithms for more control over their options, rather than using the deap.algorithms module

  • Implemented the plots module and added the function plot_search_space(), which draws a mix of counter, scatter and histogram plots over all the fitted hyperparameters and their cross-validation score

  • Documentation written in reStructuredText with Sphinx and hosted on Read the Docs. It includes documentation of public classes and functions as well as several tutorials on how to use the package

  • Added best_params_ and best_estimator_ properties after fitting GASearchCV

  • Added optional parameters refit, pre_dispatch and error_score

API Changes:

  • Removed support for Python 3.6 and aligned the supported library versions with the current scikit-learn version

  • Several internal changes on the documentation and variables naming style to be compatible with Sphinx

  • Removed the parameters continuous_parameters, categorical_parameters and integer_parameters replacing them with param_grid

What’s new in 0.3

Features:

  • Added the space module to better control the data types and ranges of each hyperparameter and the distribution used to sample random values, and to merge all data types into one Space class that works with the new param_grid parameter

  • Replaced continuous_parameters, categorical_parameters and integer_parameters with param_grid; the former still work but will be removed in a future version

  • Added the option to use the eaMuCommaLambda algorithm from deap

  • The mu and lambda_ parameters of the internal eaMuPlusLambda and eaMuCommaLambda are now expressed in terms of the initial population size rather than the number of generations

What’s new in 0.2

Features:

  • Enabled DEAP’s eaMuPlusLambda algorithm for the optimization process; it is now the default routine

  • Added logbook and history properties to the fitted GASearchCV for post-fit analysis

  • Elitism=False now implements a roulette selection instead of ignoring the parameter

  • Added the parameter keep_top_k to control the number of solutions in the hall of fame (hof)
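Roulette (fitness-proportionate) selection, mentioned in the Elitism=False item above, picks each individual with probability proportional to its fitness. Python’s random.choices with weights already implements this sampling with replacement, so a minimal sketch is short. roulette_select is a hypothetical helper and is not necessarily how the package selects individuals. For example, DEAP’s own selRoulette differs in details such as sorting individuals by fitness first.

```python
import random


def roulette_select(population, fitnesses, k, rng=random):
    """Fitness-proportionate (roulette-wheel) selection with replacement.

    Each of the k draws picks individual i with probability
    fitnesses[i] / sum(fitnesses); fitnesses must be non-negative
    and not all zero.
    """
    if any(f < 0 for f in fitnesses):
        raise ValueError("roulette selection needs non-negative fitness values")
    return rng.choices(population, weights=fitnesses, k=k)


rng = random.Random(0)
picks = roulette_select(["a", "b", "c"], [0.0, 1.0, 3.0], k=1000, rng=rng)
print(picks.count("a"), picks.count("b"), picks.count("c"))  # "a" is never picked
```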

API Changes:

  • Refactored the optimization algorithm to use the DEAP package instead of a custom implementation; as a result, several methods, properties and variables inside the GASearchCV class were removed

  • The parameter encoding_length has been removed; it’s no longer required by the GASearchCV class

  • Renamed the property of the fitted estimator from best_params_ to best_params

  • The verbose output now prints DEAP’s log of the fitness function, with its standard deviation, max and min values for each generation

  • The variable GASearchCV._best_solutions was removed and is replaced by GASearchCV.logbook and GASearchCV.history

  • Changed default parameters crossover_probability from 1 to 0.8 and generations from 50 to 40

What’s new in 0.1

Features:

  • GASearchCV for hyperparameter tuning of scikit-learn classification and regression models using a custom genetic algorithm

  • plot_fitness_evolution() function to see the average fitness values over generations