Release Notes

Notes on the new features and changes introduced in each release.

What’s new in 0.10.1

Features:

  • TensorFlow is now installed when using pip install sklearn-genetic-opt[all]

Bug Fixes:

  • Fixed a bug that prevented cloning the GA classes when they were used inside a pipeline

What’s new in 0.10.0

API Changes:

  • GAFeatureSelectionCV now mimics the API of the scikit-learn feature-selection algorithms instead of the grid-search API; this enables easier use as a selection method that is closer to the scikit-learn API (see the sketch after this list)

  • Improved GAFeatureSelectionCV candidate generation when max_features is set; it also ensures that at least one feature is selected

  • crossover_probability and mutation_probability are now correctly passed to the mate and mutation functions inside GAFeatureSelectionCV

  • Dropped support for Python 3.7 and added support for Python 3.10+

  • Updated the most important packages in dev-requirements.txt to more recent versions

  • Updated deprecated functions in the tests
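
A minimal sketch of the new selector-style usage, assuming the fitted GAFeatureSelectionCV exposes the standard scikit-learn selector methods get_support() and transform(); the estimator and settings are placeholders:

    # Hedged sketch: GAFeatureSelectionCV used like a scikit-learn feature selector.
    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier
    from sklearn_genetic import GAFeatureSelectionCV

    X, y = load_breast_cancer(return_X_y=True)

    selector = GAFeatureSelectionCV(
        estimator=DecisionTreeClassifier(),
        scoring="accuracy",
        cv=3,
        generations=20,
        max_features=10,  # soft upper bound; at least one feature is always selected
    )
    selector.fit(X, y)

    support = selector.get_support()    # boolean mask of the selected features
    X_selected = selector.transform(X)  # keep only the selected columns
    print(support.sum(), X_selected.shape)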

What’s new in 0.9.0

Features:

  • Introduced Adaptive Schedulers to enable adaptive mutation and crossover probabilities (see the sketch after this list)

  • Added a random_state parameter (default=None) to the Continuous, Categorical and Integer classes to fix the random seed during hyperparameter sampling. Note that this only makes the space components reproducible, not the whole package; this is due to the DEAP dependency, which doesn’t seem to have a native way to set the random seed.
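
A minimal sketch of both features above. The random_state parameter is the one documented here; the ExponentialAdapter class and the sklearn_genetic.schedules import path are assumptions, since this note does not list the supported schedulers:

    # Hedged sketch of the 0.9.0 features; the scheduler class name is an assumption.
    from sklearn_genetic.space import Continuous, Categorical, Integer
    from sklearn_genetic.schedules import ExponentialAdapter  # assumed import path

    # Reproducible sampling of the space components only (not the whole package):
    param_grid = {
        "min_weight_fraction_leaf": Continuous(lower=0.01, upper=0.5, random_state=42),
        "criterion": Categorical(choices=["gini", "entropy"], random_state=42),
        "max_depth": Integer(lower=2, upper=20, random_state=42),
    }

    # An adaptive scheduler can be passed where a fixed probability was used,
    # here decaying the mutation probability from 0.8 towards 0.2:
    mutation_adapter = ExponentialAdapter(initial_value=0.8, end_value=0.2, adaptive_rate=0.01)
    # evolved = GASearchCV(..., param_grid=param_grid, mutation_probability=mutation_adapter)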

API Changes:

  • Changed the default values of mutation_probability and crossover_probability to 0.8 and 0.2, respectively.

  • The weighted_choice function used in GAFeatureSelectionCV was rewritten to give more probability to numbers of features closer to the max_features parameter

  • Removed the unused and faulty plot_parallel_coordinates() function

Bug Fixes:

  • When using the plot_search_space() function, all parameters are now cast to np.float64 to avoid errors in the seaborn package when plotting bool values.

What’s new in 0.8.1

Features:

  • If the max_features parameter of GAFeatureSelectionCV is set, the initial population is now sampled giving more probability to solutions with fewer than max_features features.

What’s new in 0.8.0

Features:

  • GAFeatureSelectionCV now has a parameter called max_features (int, default=None). If it is not None, individuals with more than max_features features are penalized, putting a “soft” upper bound on the number of features to be selected.

  • The GASearchCV and GAFeatureSelectionCV classes now support multi-metric evaluation the same way scikit-learn does; this is reflected in the logbook and cv_results_ objects, which now contain results for each metric. As in scikit-learn, when multiple metrics are used, the refit parameter must be a str specifying the metric to evaluate the cv-scores (see the sketch after this list). See the GASearchCV and GAFeatureSelectionCV API documentation for more details.

  • Training now stops gracefully if interrupted by one of these exceptions: KeyboardInterrupt, SystemExit, StopIteration. When one of these exceptions is raised, the model finishes the current generation and keeps the best model found so far. This only works if at least one generation has been completed.
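
A minimal sketch of multi-metric evaluation, assuming the scoring parameter accepts a list of metric names exactly as in scikit-learn; the estimator, data and search settings are placeholders:

    # Hedged sketch: multi-metric evaluation with GASearchCV.
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import SGDClassifier
    from sklearn_genetic import GASearchCV
    from sklearn_genetic.space import Continuous

    X, y = load_breast_cancer(return_X_y=True)

    evolved = GASearchCV(
        estimator=SGDClassifier(),
        param_grid={"alpha": Continuous(lower=1e-4, upper=1e-1)},
        scoring=["accuracy", "balanced_accuracy"],  # multi-metric, scikit-learn style
        refit="accuracy",  # must be a str naming one metric when multi-metric is used
        generations=10,
        cv=3,
    )
    evolved.fit(X, y)
    print(evolved.cv_results_.keys())  # per-metric entries, e.g. mean_test_accuracy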

API Changes:

  • The following parameters changed their default values to produce larger and more diverse populations with better results:

    • population_size from 10 to 50

    • generations from 40 to 80

    • mutation_probability from 0.1 to 0.2

Docs:

  • A new notebook called Iris_multimetric was added to showcase the new multi-metric capabilities.

What’s new in 0.7.0

Features:

  • Added GAFeatureSelectionCV for feature selection along with any scikit-learn classifier or regressor. It optimizes the cv-score while minimizing the number of selected features. This class is compatible with the mlflow and tensorboard integrations, the Callbacks, and the plot_fitness_evolution function.

API Changes:

  • The mlflow module was renamed to mlflow_log to avoid unexpected errors in name resolution

What’s new in 0.6.1

Features:

  • Added the generations parameter to DeltaThreshold. It now compares the maximum and minimum values of a metric over the last generations, instead of just the current and previous ones. The default value is 2, so the behavior remains the same as in previous versions (see the sketch below).
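
A short sketch of the extended callback; the threshold and metric argument names are assumptions consistent with the documented callback style:

    # Hedged sketch: DeltaThreshold looking at the last 5 generations.
    from sklearn_genetic.callbacks import DeltaThreshold

    # Stop when the max-min spread of "fitness" over the last 5 generations
    # falls below 0.001 (generations defaults to 2, keeping the old behavior).
    callback = DeltaThreshold(threshold=0.001, generations=5, metric="fitness")
    # evolved.fit(X, y, callbacks=callback)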

Bug Fixes:

  • When a param_grid of length 1 is provided, a user warning is raised instead of an error. Internally, the crossover operation is swapped to use DEAP’s cxSimulatedBinaryBounded().

  • When using the Continuous class with boundaries lower and upper, a uniform distribution with limits [lower, lower + upper] was sampled; now it is properly sampled using the [lower, upper] limits.

What’s new in 0.6.0

Features:

  • Added the ProgressBar callback; it uses a tqdm progress bar to show how many generations are left in the training process.

  • Added the TensorBoard callback to log the generation metrics, watch them in real time while the models train, and compare different runs in your TensorBoard instance.

  • Added the TimerStopping callback to stop the iterations after a total (threshold) fitting time has elapsed.

  • Added a new parallel coordinates plot via plot_parallel_coordinates().

  • Now, if one or more callbacks decide to stop the algorithm, their class names are printed so you know which callbacks were responsible for the stopping.

  • Added support for extra methods coming from scikit-learn’s BaseSearchCV, such as cv_results_, best_index_ and refit_time_, among others.

  • Added the methods on_start and on_end to BaseCallback. The algorithms now check the callbacks like this (see the sketch after this list):

    • on_start: When the evolutionary algorithm is called from the GASearchCV.fit method.

    • on_step: When the evolutionary algorithm finishes a generation (no change here).

    • on_end: At the end of the last generation.
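
A minimal custom-callback sketch implementing the three hooks above; the import path and the hook argument names are assumptions based on the custom callbacks tutorial:

    # Hedged sketch of a callback covering on_start, on_step and on_end.
    from sklearn_genetic.callbacks.base import BaseCallback  # assumed import path

    class GenerationLogger(BaseCallback):
        def on_start(self, estimator=None):
            print("fit started")  # runs when the evolutionary algorithm is called

        def on_step(self, record=None, logbook=None, estimator=None):
            print(f"generation finished, fitness={record['fitness']}")
            return False  # returning True would stop the optimization

        def on_end(self, logbook=None, estimator=None):
            print("last generation finished")

    # evolved.fit(X, y, callbacks=GenerationLogger())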

Bug Fixes:

  • A missing statement made the callbacks start being evaluated from generation 1, ignoring generation 0. This is now properly handled and callbacks work from generation 0.

API Changes:

  • The plots and MLflowConfig modules now require an explicit installation of seaborn and mlflow; these are now optionally installed using pip install sklearn-genetic-opt[all].

  • The GASearchCV.logbook property now has extra information that comes from the scikit-learn cross_validate function.

  • An optional extra parameter was added to GASearchCV, named return_train_score (bool, default=False). As in scikit-learn, it controls whether cv_results_ should include the training scores.

Docs:

  • Edited all demos to be in Jupyter notebook format.

  • Added embedded Jupyter notebook examples.

  • The modules of the package now have a summary of their classes/functions in the docs.

  • Updated the callbacks and custom callbacks tutorials to cover the new TensorBoard callback and the new methods on the base callback.

Internal:

  • The hall of fame (hof) now uses self.best_params_ for position 0, to be consistent with the scikit-learn API and parameters like self.best_index_

What’s new in 0.5.0

Features:

  • Built-in integration with MLflow using MLflowConfig and the new log_config parameter of GASearchCV (see the sketch after this list)

  • Implemented the LogbookSaver callback, which saves the estimator.logbook object with all the fitted hyperparameters and their cross-validation scores

  • Added the estimator parameter to all the functions in the callbacks module
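
A minimal sketch of the MLflow integration; the MLflowConfig argument names and the LogbookSaver checkpoint_path parameter are assumptions based on the user guide (the import uses the mlflow_log name the module received in 0.7.0):

    # Hedged sketch: MLflow logging plus the LogbookSaver callback.
    from sklearn_genetic.callbacks import LogbookSaver
    from sklearn_genetic.mlflow_log import MLflowConfig  # assumed argument names below

    mlflow_config = MLflowConfig(
        tracking_uri="http://localhost:5000",
        experiment="my-experiment",
        run_name="ga-search",
        save_models=True,
    )

    # evolved = GASearchCV(estimator=..., param_grid=..., log_config=mlflow_config)
    # evolved.fit(X, y, callbacks=LogbookSaver(checkpoint_path="./logbook.pkl"))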

Docs:

  • Added user guide “Integrating with MLflow”

  • Updated the tutorial “Custom Callbacks” for the new API inheritance behavior

Internal:

  • Added a base class BaseCallback from which all callbacks must inherit

  • The coverage report no longer takes into account lines marked with # pragma: no cover and # noqa

What’s new in 0.4.1

Docs:

  • Added user guide on “Understanding the evaluation process”

  • Added several guides on contributing and a code of conduct

  • Added important links

  • Docs requirements are now independent of package requirements

Internal:

  • Changed the test CI from Travis to GitHub Actions

What’s new in 0.4

Features:

  • Implemented the callbacks module to stop the optimization process based on the current iteration’s metrics; currently implemented: ThresholdStopping, ConsecutiveStopping and DeltaThreshold (see the sketch after this list).

  • The eaSimple, eaMuPlusLambda and eaMuCommaLambda algorithms are now implemented in the algorithms module for more control over their options, rather than being taken from the deap.algorithms module

  • Implemented the plots module and added the plot_search_space() function; it plots mixed contour, scatter and histogram plots over all the fitted hyperparameters and their cross-validation scores

  • Documentation based on rst with Sphinx, hosted on Read the Docs. It includes documentation of the public classes and functions, as well as several tutorials on how to use the package

  • Added best_params_ and best_estimator_ properties after fitting GASearchCV

  • Added optional parameters refit, pre_dispatch and error_score
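
A short sketch of two of the stopping callbacks named above, passed to the fit call; the argument names are assumptions:

    # Hedged sketch of the 0.4 stopping callbacks; argument names are assumptions.
    from sklearn_genetic.callbacks import ThresholdStopping, ConsecutiveStopping

    # Stop if fitness reaches 0.98, or if it has not improved for 5 generations:
    callbacks = [
        ThresholdStopping(threshold=0.98, metric="fitness"),
        ConsecutiveStopping(generations=5, metric="fitness"),
    ]
    # evolved.fit(X, y, callbacks=callbacks)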

API Changes:

  • Removed support for Python 3.6; the supported library versions were changed to match those of the current scikit-learn version

  • Several internal changes to the documentation and variable naming style to be compatible with Sphinx

  • Removed the parameters continuous_parameters, categorical_parameters and integer_parameters, replacing them with param_grid

What’s new in 0.3

Features:

  • Added the space module to better control the data types and ranges of each hyperparameter and the distribution to sample random values from, merging all data types in one Space class that can work with the new param_grid parameter (see the sketch after this list)

  • Changed continuous_parameters, categorical_parameters and integer_parameters to param_grid; the former still work but will be removed in a future version

  • Added the option to use the eaMuCommaLambda algorithm from deap

  • The mu and lambda_ parameters of the internal eaMuPlusLambda and eaMuCommaLambda algorithms are now expressed in terms of the initial population size rather than the number of generations
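
A minimal sketch of a param_grid built from the space module, also selecting the eaMuCommaLambda routine; the algorithm parameter name is an assumption:

    # Hedged sketch: space-based param_grid and the eaMuCommaLambda option.
    from sklearn_genetic import GASearchCV
    from sklearn_genetic.space import Continuous, Integer

    param_grid = {
        "learning_rate": Continuous(lower=0.01, upper=0.5),
        "max_depth": Integer(lower=2, upper=10),
    }
    # evolved = GASearchCV(estimator=..., param_grid=param_grid, algorithm="eaMuCommaLambda")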

What’s new in 0.2

Features:

  • Enabled DEAP’s eaMuPlusLambda algorithm for the optimization process; it is now the default routine

  • Added logbook and history properties to the fitted GASearchCV to enable post-fit analysis (see the sketch after this list)

  • Elitism=False now implements roulette selection instead of ignoring the parameter

  • Added the keep_top_k parameter to control the number of solutions in the hall of fame (hof)
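
A short sketch of post-fit analysis with the new properties; the history keys and the hof attribute name are assumptions:

    # Hedged sketch: inspecting a fitted GASearchCV (here called `evolved`).
    def summarize(evolved):
        print(evolved.logbook)                 # DEAP-style per-generation statistics
        print(evolved.history["fitness"][-1])  # fitness of the last generation (assumed key)
        print(evolved.hof)                     # top keep_top_k solutions (assumed attribute)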

API Changes:

  • Refactored the optimization algorithm to use the DEAP package instead of a custom implementation; this caused the removal of several methods, properties and variables inside the GASearchCV class

  • The encoding_length parameter has been removed; it is no longer required by the GASearchCV class

  • Renamed the property of the fitted estimator from best_params_ to best_params

  • The verbosity now prints the DEAP log of the fitness function: its standard deviation, max and min values for each generation

  • Removed the GASearchCV._best_solutions variable; it is meant to be replaced by GASearchCV.logbook and GASearchCV.history

  • Changed the default parameters: crossover_probability from 1 to 0.8 and generations from 50 to 40

What’s new in 0.1

Features:

  • GASearchCV for hyperparameter tuning of scikit-learn classification and regression models using a custom genetic algorithm

  • plot_fitness_evolution() function to see the average fitness values over generations (see the sketch below)
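
A minimal end-to-end sketch of plot_fitness_evolution(), written against the current API (param_grid and the space module arrived in later releases); the import paths are assumptions:

    # Hedged sketch: average fitness per generation for a fitted GASearchCV.
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from sklearn_genetic import GASearchCV
    from sklearn_genetic.plots import plot_fitness_evolution
    from sklearn_genetic.space import Integer

    X, y = load_iris(return_X_y=True)
    evolved = GASearchCV(
        estimator=DecisionTreeClassifier(),
        param_grid={"max_depth": Integer(lower=2, upper=10)},
        generations=5,
        cv=3,
    )
    evolved.fit(X, y)

    plot_fitness_evolution(evolved)  # average fitness value per generation
    plt.show()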