Understanding the Evaluation Process
====================================

This tutorial explains how :class:`~sklearn_genetic.GASearchCV` evaluates
candidate hyperparameters and how cross-validation fits into the evolutionary
search process.

Two parameters control most of the evaluation behavior:

``cv``
    The cross-validation strategy. This can be an integer or any compatible
    scikit-learn cross-validator, such as
    :class:`~sklearn.model_selection.KFold`,
    :class:`~sklearn.model_selection.StratifiedKFold`, or
    :class:`~sklearn.model_selection.RepeatedKFold`. See the
    `scikit-learn cross-validation documentation <https://scikit-learn.org/stable/modules/cross_validation.html>`__
    for more details.

``scoring``
    The metric used to evaluate each candidate. For classification, common
    choices include ``"accuracy"``, ``"precision"``, and ``"recall"``. For
    regression, common choices include ``"r2"``, ``"max_error"``, and
    ``"neg_root_mean_squared_error"``. The full list is available in the
    `scikit-learn model evaluation documentation <https://scikit-learn.org/stable/modules/model_evaluation.html>`__.
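
As a rough illustration of what these two parameters mean for a single
candidate, the sketch below scores one fixed set of hyperparameters with
scikit-learn's ``cross_val_score``. It is not the internal code of
``GASearchCV``: the estimator, the ``max_depth`` value, and the use of the mean
fold score as the fitness are assumptions made for this example.

.. code:: python3

    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    X, y = load_diabetes(return_X_y=True)

    # cv decides how the data is split; scoring decides the metric per split.
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    candidate = DecisionTreeRegressor(max_depth=3, random_state=0)

    fold_scores = cross_val_score(candidate, X, y, cv=cv, scoring="r2")
    fitness = fold_scores.mean()  # one number summarizing this candidate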

Evolutionary Algorithm Background
---------------------------------

A genetic algorithm is a metaheuristic optimization method inspired by natural
selection. In sklearn-genetic-opt, the algorithm searches over possible
hyperparameter configurations and uses their cross-validation scores as the
fitness signal.

The main concepts are:

- **Individual:** one candidate solution, such as one set of hyperparameters.
- **Population:** a group of individuals evaluated in the same generation.
- **Generation:** one iteration of the evolutionary process.
- **Fitness value:** the score used to compare individuals, usually a
  cross-validation score.
- **Genetic operators:** operations such as selection, crossover, mutation, and
  elitism that create the next generation.

At a high level, the process is:

1. Build an initial population from the search space. This is generation 0.
2. Evaluate each individual with cross-validation.
3. Use genetic operators to create a new generation.
4. Repeat the evaluation and generation steps until the search reaches its
   generation limit or a callback stops it.
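
The four steps above can be sketched in plain Python. This is a toy loop, not
the implementation used by sklearn-genetic-opt: the one-dimensional search
space, the ``fitness`` function, and the simplified operators (keeping the
better half as parents, averaging two parents, adding Gaussian noise) are all
assumptions chosen to keep the example short.

.. code:: python3

    import random

    rng = random.Random(0)

    def fitness(x):
        # Toy score with its maximum at x = 3; stands in for a CV score.
        return -(x - 3.0) ** 2

    # 1. Generation 0: sample the search space [0, 10].
    population = [rng.uniform(0, 10) for _ in range(8)]
    best = max(population, key=fitness)
    initial_best = best

    for generation in range(20):
        # 2. Evaluate, then keep the better half as parents (selection).
        parents = sorted(population, key=fitness, reverse=True)[:4]
        # 3. Crossover (average two parents) plus mutation (small noise).
        children = [
            (rng.choice(parents) + rng.choice(parents)) / 2 + rng.gauss(0, 0.3)
            for _ in range(7)
        ]
        # Elitism: carry the best parent over unchanged.
        population = [parents[0]] + children
        # 4. Track the best individual seen in any generation.
        best = max(population + [best], key=fitness)

Because the best parent is always carried over, the best score in this sketch
can never decrease between generations.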

Creating the First Generation
-----------------------------

By default, the first generation is built with
``PopulationConfig(initializer="smart")``. For :class:`~sklearn_genetic.GASearchCV`,
this combines valid warm-start candidates, valid estimator defaults, Latin
hypercube samples for numeric hyperparameters, stratified categorical values,
and duplicate avoidance. For :class:`~sklearn_genetic.GAFeatureSelectionCV`, it
creates duplicate-aware feature masks with varied selected-feature counts. Set
``PopulationConfig(initializer="random")`` to use fully random initialization.

Each individual can be represented as a chromosome-like structure. In the
example below, the first generation contains three individuals. Each chromosome
encodes one candidate set of hyperparameters:

.. image:: ../images/understandcv_generation0.png

The red arrow represents the encoding step, where hyperparameter values are
mapped into a chromosome representation. Each block is a gene, and groups of
genes represent hyperparameters. The purple arrow represents scoring: each
candidate is decoded, evaluated with cross-validation, and assigned a fitness
value.
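
A minimal sketch of the encoding and decoding steps follows. It simplifies the
picture above by using one gene per hyperparameter, and the names
``PARAM_ORDER``, ``encode``, and ``decode`` are hypothetical helpers, not part
of the library.

.. code:: python3

    # Hypothetical search space: the order of the names defines gene positions.
    PARAM_ORDER = ["learning_rate", "layers", "optimizer"]

    def encode(params):
        """Map a hyperparameter dict to a flat chromosome (one gene per name)."""
        return [params[name] for name in PARAM_ORDER]

    def decode(chromosome):
        """Map a chromosome back to the dict an estimator would receive."""
        return dict(zip(PARAM_ORDER, chromosome))

    candidate = {"learning_rate": 0.015, "layers": 6, "optimizer": "SGD"}
    chromosome = encode(candidate)
    # chromosome == [0.015, 6, "SGD"] and decode(chromosome) == candidate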

Creating New Generations
------------------------

After the initial population is evaluated, the algorithm creates a new
generation. The exact process depends on the selected
:mod:`~sklearn_genetic.algorithms` strategy, but the most common operations are
crossover, mutation, selection, and elitism.

Crossover
^^^^^^^^^

Crossover combines information from two parent chromosomes to create new
children. Parent selection usually favors individuals with better fitness, so
stronger candidates have a higher chance of contributing to the next generation.

For example, if individuals 1 and 3 are selected as parents, the algorithm can
split their chromosomes and exchange sections:

.. image:: ../images/understandcv_crossover.png

After decoding the child chromosomes, the resulting candidates might look like
this:

.. code:: text

    Child 1: {"learning_rate": 0.015, "layers": 4, "optimizer": "Adam"}
    Child 2: {"learning_rate": 0.4, "layers": 6, "optimizer": "SGD"}
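
One simple way to produce such children is one-point crossover, sketched below.
The parent values are hypothetical and were picked so that the children match
the example above; the library may use a different crossover operator.

.. code:: python3

    def one_point_crossover(parent_a, parent_b, point):
        """Swap the tails of two equal-length chromosomes after ``point``."""
        child_a = parent_a[:point] + parent_b[point:]
        child_b = parent_b[:point] + parent_a[point:]
        return child_a, child_b

    # Hypothetical parents encoded as [learning_rate, layers, optimizer].
    parent_1 = [0.015, 6, "SGD"]
    parent_3 = [0.4, 4, "Adam"]

    child_1, child_2 = one_point_crossover(parent_1, parent_3, point=1)
    # child_1 == [0.015, 4, "Adam"], child_2 == [0.4, 6, "SGD"]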

Mutation
^^^^^^^^

Crossover alone can make the search converge too quickly around similar
solutions. Mutation introduces diversity by randomly changing part of a
chromosome. It can alter a single gene or an entire hyperparameter value.

For example, a single gene in a child chromosome can change:

.. image:: ../images/understandcv_mutantchild.png

Or the mutation can change a complete hyperparameter, such as the optimizer:

.. image:: ../images/understandcv_mutantparameter.png
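
Mutation can be sketched as resampling each gene with a small probability. The
``GENE_CHOICES`` table and the ``mutate`` helper are hypothetical and only
illustrate the idea; they are not the operator used by the library.

.. code:: python3

    import random

    # Hypothetical per-gene choices used when a gene is resampled.
    GENE_CHOICES = [
        [0.001, 0.015, 0.1, 0.4],    # learning_rate
        [2, 4, 6, 8],                # layers
        ["Adam", "SGD", "RMSprop"],  # optimizer
    ]

    def mutate(chromosome, gene_probability, rng):
        """Resample each gene independently with ``gene_probability``."""
        mutant = list(chromosome)  # copy so the original is left untouched
        for index, choices in enumerate(GENE_CHOICES):
            if rng.random() < gene_probability:
                mutant[index] = rng.choice(choices)
        return mutant

    rng = random.Random(42)
    child = [0.015, 4, "Adam"]
    mutant = mutate(child, gene_probability=0.3, rng=rng)

A low ``gene_probability`` leaves most children unchanged, while a high value
makes the search behave more like random sampling.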

Elitism
^^^^^^^

Elitism keeps the best individuals from one generation and copies them into the
next generation. This helps preserve strong candidates while the rest of the
population continues exploring.
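
As a sketch, elitism amounts to ranking the evaluated population and copying
the top entries ahead of the new offspring. The ``apply_elitism`` helper is
hypothetical and assumes that a higher fitness is better.

.. code:: python3

    def apply_elitism(scored_population, offspring, n_elite):
        """Copy the ``n_elite`` best individuals ahead of the new offspring.

        ``scored_population`` is a list of (individual, fitness) pairs.
        """
        ranked = sorted(scored_population, key=lambda pair: pair[1], reverse=True)
        elite = [individual for individual, _ in ranked[:n_elite]]
        # Keep the population size constant by trimming the offspring.
        size = len(scored_population)
        return elite + offspring[: size - n_elite]

    scored = [("a", 0.41), ("b", 0.47), ("c", 0.32)]
    next_generation = apply_elitism(scored, offspring=["d", "e", "f"], n_elite=1)
    # next_generation == ["b", "d", "e"]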

After crossover, mutation, selection, and elitism, a new generation may look
like this:

.. image:: ../images/understandcv_generation1.png

The search repeats this cycle until one of the stopping conditions is met:

- The maximum number of generations is reached.
- The search exceeds a time budget.
- An early-stopping callback detects that the score has reached a threshold or
  stopped improving.
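
The stopping conditions can be combined into a single check, as in the sketch
below. ``should_stop`` and all of its parameters are hypothetical; in the
library these conditions are handled by the ``generations`` setting and by
callbacks. ``best_scores`` is assumed to hold the best score found so far after
each generation, with higher values being better.

.. code:: python3

    import time

    def should_stop(generation, max_generations, started_at, time_budget,
                    best_scores, patience, threshold=None):
        """Return True when any stopping condition holds."""
        if generation >= max_generations:
            return True
        if time.monotonic() - started_at > time_budget:
            return True
        # Threshold: the best score so far is already good enough.
        if threshold is not None and best_scores and best_scores[-1] >= threshold:
            return True
        # No improvement over the last ``patience`` generations.
        if len(best_scores) > patience:
            return best_scores[-1] <= best_scores[-patience - 1]
        return False

    now = time.monotonic()
    improving = should_stop(5, 20, now, 60.0, [0.1, 0.2, 0.3, 0.4, 0.5], 3)
    stagnant = should_stop(5, 20, now, 60.0, [0.1, 0.4, 0.4, 0.4, 0.4], 3)
    # improving is False, stagnant is True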

How GASearchCV Evaluates Candidates
-----------------------------------

In sklearn-genetic-opt, :class:`~sklearn_genetic.GASearchCV` evaluates
candidate hyperparameters as follows:

1. Sample ``population_size`` candidate configurations from ``param_grid``.
2. Fit and score one estimator for each candidate using the configured ``cv``
   and ``scoring`` values.
3. Log generation-level metrics when ``verbose=True``.
4. Create the next generation using the selected evolutionary algorithm.
5. Repeat until ``generations`` is reached or callbacks stop the search.
6. Select the hyperparameters of the individual with the best
   cross-validation score.

If ``use_cache=True`` (the default), candidates that have already been evaluated
reuse their stored fitness values. Duplicate candidates inside the same
generation are also evaluated only once and then recorded for each occurrence.
When ``n_jobs`` enables parallel execution, unique candidates in a generation
are evaluated in parallel, while each candidate's own cross-validation runs
sequentially to avoid nested parallelism. Set ``RuntimeConfig(parallel_backend="cv")`` to keep
candidate evaluation serial and pass ``n_jobs`` to each candidate's
cross-validation instead. After fitting, ``fit_stats_`` exposes counters for
actual cross-validation calls, cache hits, duplicate candidates, skipped invalid
candidates, and population-level parallel batches.
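
The caching behavior can be pictured with a plain dictionary, as in the sketch
below. ``evaluate_generation`` and the counter names are hypothetical; they are
not the keys of ``fit_stats_``, and the real implementation may differ.

.. code:: python3

    def evaluate_generation(population, score_fn, cache, stats):
        """Score a generation, evaluating each distinct candidate at most once."""
        fitnesses = []
        seen_this_generation = set()
        for candidate in population:
            key = tuple(sorted(candidate.items()))  # hashable form of the dict
            if key in cache:
                if key in seen_this_generation:
                    stats["duplicates"] += 1   # repeated inside this generation
                else:
                    stats["cache_hits"] += 1   # scored in an earlier generation
            else:
                cache[key] = score_fn(candidate)
                stats["evaluations"] += 1
            seen_this_generation.add(key)
            fitnesses.append(cache[key])
        return fitnesses

    cache = {}
    stats = {"evaluations": 0, "cache_hits": 0, "duplicates": 0}

    def toy_score(params):
        # Toy stand-in for a cross-validation score.
        return -abs(params["max_depth"] - 5)

    gen0 = [{"max_depth": 3}, {"max_depth": 5}, {"max_depth": 3}]
    gen1 = [{"max_depth": 5}, {"max_depth": 7}]
    fitness_gen0 = evaluate_generation(gen0, toy_score, cache, stats)
    fitness_gen1 = evaluate_generation(gen1, toy_score, cache, stats)
    # stats == {"evaluations": 3, "cache_hits": 1, "duplicates": 1}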

The ``history`` attribute also includes optimizer telemetry for each generation:
``population_size``, ``unique_individuals``, ``unique_individual_ratio``,
``genotype_diversity``, ``fitness_improvement``, ``fitness_improved``,
``stagnation_generations``, ``best_generation``, ``mutation_probability``,
``diversity_control_triggered``, ``random_immigrants``,
``duplicate_replacements``, ``local_refinements``,
``fitness_sharing_applied``, ``mean_niche_count``, and
``max_niche_count``. These fields help diagnose whether the search is still
exploring diverse solutions or has started to converge or stagnate around the
same candidates.
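
To give a feel for a few of these fields, the sketch below computes them by
hand for a tiny population. The formulas are plausible definitions assumed for
illustration, not necessarily the exact ones the library uses.

.. code:: python3

    # Toy population encoded as (max_depth, criterion) tuples.
    population = [
        (3, "squared_error"),
        (5, "squared_error"),
        (3, "squared_error"),
        (8, "absolute_error"),
    ]

    population_size = len(population)
    unique_individuals = len(set(population))
    unique_individual_ratio = unique_individuals / population_size

    # Best score found so far after each generation (higher is better).
    best_so_far = [0.40, 0.44, 0.44, 0.44]

    # Count the trailing generations without an improvement.
    stagnation_generations = 0
    for previous, current in zip(best_so_far, best_so_far[1:]):
        if current > previous:
            stagnation_generations = 0
        else:
            stagnation_generations += 1

    # unique_individual_ratio == 0.75, stagnation_generations == 2

A falling unique ratio together with a rising stagnation count suggests that
the population is collapsing onto a few similar candidates.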

When the search space is noisy or rugged, ``OptimizationConfig(diversity_control=True)`` can help
avoid premature convergence by increasing mutation, replacing duplicate
candidates, and adding random immigrants after low-diversity or stagnant
generations. When the search has found promising regions,
``OptimizationConfig(local_search=True)`` can run a short neighborhood refinement around the
hall-of-fame candidates without increasing the number of GA generations.
``OptimizationConfig(fitness_sharing=True)`` can reduce selection pressure on crowded niches, so
similar high-scoring candidates do not immediately dominate the population.

The generation log contains summary metrics:

``fitness``
    The average score across the individuals in the current generation.

``fitness_std``
    The standard deviation of the individual scores in the current generation.

``fitness_best``
    The best score found so far across all generations. This is the most useful
    metric for convergence plots and early-stopping callbacks because it is
    cumulative: it never gets worse from one generation to the next.

``fitness_max``
    The best individual score in the current generation.

``fitness_min``
    The worst individual score in the current generation.

Except for ``fitness_best``, these values summarize the current population, not
just the final selected model. For example, with
``EvolutionConfig(population_size=10)``, the ``fitness`` value is the average
score of the 10 candidates evaluated in that generation.
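
The difference between the per-generation metrics and the cumulative
``fitness_best`` can be seen in a small hand computation. The scores are made
up, a higher score is assumed to be better, and the use of the population
standard deviation is an assumption.

.. code:: python3

    import statistics

    # Made-up scores for three generations of three candidates each.
    generation_scores = [
        [0.30, 0.35, 0.41],  # generation 0
        [0.38, 0.44, 0.40],  # generation 1
        [0.39, 0.42, 0.43],  # generation 2
    ]

    log = []
    best_so_far = float("-inf")
    for scores in generation_scores:
        best_so_far = max(best_so_far, max(scores))
        log.append(
            {
                "fitness": statistics.fmean(scores),
                "fitness_std": statistics.pstdev(scores),  # population std assumed
                "fitness_max": max(scores),
                "fitness_min": min(scores),
                "fitness_best": best_so_far,
            }
        )

    # In generation 2, fitness_max is 0.43 but fitness_best stays at 0.44.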

The complete flow can be represented like this:

.. image:: ../images/genetic_cv.png

Each candidate is evaluated with cross-validation. For example, a 5-fold
strategy splits the data into five train/validation rotations:

.. image:: ../images/k-folds.png

Image taken from
`scikit-learn <https://scikit-learn.org/stable/modules/cross_validation.html>`__.
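
The rotations can also be inspected directly with scikit-learn's ``KFold``. The
ten-sample array below is a toy input used only to make the index sets easy to
read.

.. code:: python3

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(10).reshape(-1, 1)  # ten toy samples
    cv = KFold(n_splits=5)

    # Each entry is (training indices, validation indices) for one rotation.
    folds = [(train.tolist(), valid.tolist()) for train, valid in cv.split(X)]

    # Without shuffling, the first validation fold holds samples 0 and 1,
    # and the remaining eight samples are used for training.

Every sample appears in exactly one validation fold, so each candidate is
scored on all of the data across the five rotations.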

Example
-------

This example tunes a :class:`~sklearn.tree.DecisionTreeRegressor` inside a
scikit-learn :class:`~sklearn.pipeline.Pipeline` on the diabetes regression
dataset. The search uses 5-fold cross-validation and optimizes the ``"r2"``
metric.

At the end, we print the best hyperparameters and the R-squared score on the
test set.

.. code:: python3

    from sklearn.datasets import load_diabetes
    from sklearn.metrics import r2_score
    from sklearn.model_selection import KFold, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeRegressor

    from sklearn_genetic import EvolutionConfig, GASearchCV, PopulationConfig, RuntimeConfig
    from sklearn_genetic.space import Categorical, Continuous, Integer

    data = load_diabetes()
    X, y = data["data"], data["target"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42
    )

    cv = KFold(n_splits=5, shuffle=True, random_state=42)

    pipe = Pipeline(
        [
            ("scaler", StandardScaler()),
            ("clf", DecisionTreeRegressor(random_state=42)),
        ]
    )

    param_grid = {
        "clf__ccp_alpha": Continuous(0, 1),
        "clf__criterion": Categorical(["squared_error", "absolute_error"]),
        "clf__max_depth": Integer(2, 20),
        "clf__min_samples_split": Integer(2, 30),
    }

    evolved_estimator = GASearchCV(
        estimator=pipe,
        cv=cv,
        scoring="r2",
        param_grid=param_grid,
        evolution_config=EvolutionConfig(
            population_size=15,
            generations=20,
            tournament_size=3,
            elitism=True,
            keep_top_k=4,
            crossover_probability=0.9,
            mutation_probability=0.05,
            criteria="max",
            algorithm="eaMuCommaLambda",
        ),
        population_config=PopulationConfig(initializer="smart"),
        runtime_config=RuntimeConfig(n_jobs=-1),
    )

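    # Run the evolutionary search on the training split.
    evolved_estimator.fit(X_train, y_train)

    # Predict with the best configuration found and score it on held-out data.
    y_predict_ga = evolved_estimator.predict(X_test)
    r_squared = r2_score(y_test, y_predict_ga)

    print(evolved_estimator.best_params_)
    print(f"R-squared: {r_squared:.2f}")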