MLflow 3 Tracking for GASearchCV

This notebook shows how to log a sklearn-genetic-opt hyperparameter search with MLflow 3. It combines the library’s MLflowConfig integration, which logs each candidate as a nested run, with MLflow 3 tracking features such as dataset inputs, logged models, model tags, and searchable run/model metadata.

What Gets Logged

The notebook uses two complementary MLflow logging layers:

MLflowConfig logs each evaluated candidate as a nested run with its parameter values and cross-validation score.
A parent run logs the dataset input, optimizer settings, final holdout metrics, fit_stats_, the best parameters, and the final refitted model.

This layout keeps low-level candidate history available without losing the high-level summary of the search.

Problem Setup

We tune a random forest on the breast cancer dataset. The dataset is small enough to run quickly in a notebook, yet realistic enough to demonstrate classification metrics and model tracking.

[1]:

import warnings
from pprint import pprint

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.mlflow_log import MLflowConfig
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Continuous, Integer


# Hide UserWarning messages so the notebook output stays readable.
warnings.filterwarnings("ignore", category=UserWarning)

RANDOM_STATE = 42
TRACKING_URI = "sqlite:///mlflow3_tracking.db"
EXPERIMENT_NAME = "sklearn-genetic-opt-mlflow3"

[2]:

data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target.rename("target")

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.30,
    stratify=y,
    random_state=RANDOM_STATE,
)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)

print(f"Training shape: {X_train.shape}")
print(f"Test shape: {X_test.shape}")
print(f"Tracking URI: {TRACKING_URI}")

Training shape: (398, 30)
Test shape: (171, 30)
Tracking URI: sqlite:///mlflow3_tracking.db

Create a Local MLflow Experiment

For a local tutorial, a SQLite tracking URI avoids running a separate MLflow server while still supporting the MLflow 3 tracking features used here. The same code works against a remote tracking server once TRACKING_URI is changed.

Datasets are wrapped with mlflow.data.from_pandas and attached to a run with mlflow.log_input. The cell below only builds the dataset objects; they are logged later, inside the parent run, to record the data context behind the search.

[3]:

mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)

train_dataset = mlflow.data.from_pandas(
    pd.concat([X_train, y_train], axis=1),
    targets="target",
    name="breast-cancer-train",
)
test_dataset = mlflow.data.from_pandas(
    pd.concat([X_test, y_test], axis=1),
    targets="target",
    name="breast-cancer-test",
)

Configure the Genetic Search

The search uses optimizer controls that are useful for experiment tracking:

PopulationConfig(initializer="smart") for a better initial population.
warm_start_configs to seed one known reasonable configuration.
Adaptive crossover and mutation schedules (ExponentialAdapter and InverseAdapter).
Diversity control, random immigrants, fitness sharing, and local search.
RuntimeConfig(parallel_backend="auto") and use_cache=True so the library picks the parallel backend and reuses cached scores for repeated candidates.

MLflowConfig is attached through log_config; every candidate evaluation becomes a nested MLflow run.
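
To give a feel for the adaptive schedules in the list above, here is a small pure-Python sketch. The two formulas are assumptions about how ExponentialAdapter and InverseAdapter decay a value from initial_value toward end_value; the function names are hypothetical, and the library documentation remains the reference for the exact behavior.

```python
import math


def exponential_schedule(t, initial_value, end_value, adaptive_rate):
    """Assumed form: (initial - end) * exp(-rate * t) + end."""
    return (initial_value - end_value) * math.exp(-adaptive_rate * t) + end_value


def inverse_schedule(t, initial_value, end_value, adaptive_rate):
    """Assumed form: (initial - end) / (1 + rate * t) + end."""
    return (initial_value - end_value) / (1 + adaptive_rate * t) + end_value


# Same settings as the crossover and mutation schedules used in this notebook.
for t in range(5):
    crossover = exponential_schedule(t, 0.8, 0.4, 0.15)
    mutation = inverse_schedule(t, 0.25, 0.08, 0.25)
    print(f"step {t}: crossover={crossover:.3f} mutation={mutation:.3f}")
```

Under these assumed formulas, the mutation probability at steps 1 and 2 is 0.216 and 0.193. Those values coincide with the mut column printed for generations 2 and 3 in the search log further down.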

[4]:

param_grid = {
    "n_estimators": Integer(40, 120),
    "max_depth": Integer(2, 10),
    "min_samples_split": Integer(2, 12),
    "min_samples_leaf": Integer(1, 8),
    "max_features": Categorical(["sqrt", "log2", None]),
    "ccp_alpha": Continuous(0.0, 0.03),
}

mlflow_config = MLflowConfig(
    tracking_uri=TRACKING_URI,
    experiment=EXPERIMENT_NAME,
    run_name="candidate-random-forest",
    save_models=False,
)

search = GASearchCV(
    estimator=RandomForestClassifier(random_state=RANDOM_STATE, n_jobs=1),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=cv,
    evolution_config=EvolutionConfig(
        population_size=12,
        generations=8,
        crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),
        mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),
        tournament_size=3,
        elitism=True,
        keep_top_k=3,
    ),
    population_config=PopulationConfig(
        initializer="smart",
        warm_start_configs=[
            {
                "n_estimators": 80,
                "max_depth": 6,
                "min_samples_split": 4,
                "min_samples_leaf": 2,
                "max_features": "sqrt",
                "ccp_alpha": 0.0,
            }
        ],
    ),
    runtime_config=RuntimeConfig(
        n_jobs=-1,
        parallel_backend="auto",
        use_cache=True,
        verbose=True,
        return_train_score=False,
    ),
    optimization_config=OptimizationConfig(
        local_search=True,
        local_search_top_k=2,
        local_search_steps=1,
        local_search_radius=0.20,
        diversity_control=True,
        diversity_threshold=0.30,
        diversity_stagnation_generations=3,
        diversity_mutation_boost=1.8,
        random_immigrants_fraction=0.10,
        fitness_sharing=True,
        sharing_radius=0.40,
    ),
    log_config=mlflow_config,
)

Run the Search Inside a Parent MLflow Run

The parent run records the overall experiment. Nested candidate runs are created automatically by MLflowConfig during search.fit.

MLflow 3 model tracking appears here in two steps:

mlflow.initialize_logged_model creates a logged-model record before the fit starts.
mlflow.sklearn.log_model(..., name=..., model_id=...) logs the final refitted estimator and links it to that model record.

[5]:

callbacks = [
    DeltaThreshold(threshold=0.0005, generations=5, metric="fitness_best"),
    ConsecutiveStopping(generations=7, metric="fitness_best"),
    TimerStopping(total_seconds=120),
]

with mlflow.start_run(run_name="ga-random-forest-search") as parent_run:
    mlflow.set_tags(
        {
            "project": "sklearn-genetic-opt",
            "mlflow_version": mlflow.__version__,
            "run_level": "parent",
            "optimizer": "GASearchCV",
        }
    )
    mlflow.log_input(train_dataset, context="training")
    mlflow.log_input(test_dataset, context="holdout")
    mlflow.log_params(
        {
            "population_size": search.population_size,
            "generations": search.generations,
            "population_initializer": search.population_initializer,
            "parallel_backend": search.parallel_backend,
            "local_search": search.local_search,
            "diversity_control": search.diversity_control,
            "fitness_sharing": search.fitness_sharing,
        }
    )

    logged_model = mlflow.initialize_logged_model(
        name="ga-random-forest-best-model",
        source_run_id=parent_run.info.run_id,
        model_type="classifier",
        tags={"stage": "candidate", "owner": "sklearn-genetic-opt"},
    )

    search.fit(X_train, y_train, callbacks=callbacks)

    probabilities = search.predict_proba(X_test)[:, 1]
    predictions = search.predict(X_test)
    holdout_metrics = {
        "holdout_accuracy": accuracy_score(y_test, predictions),
        "holdout_balanced_accuracy": balanced_accuracy_score(y_test, predictions),
        "holdout_roc_auc": roc_auc_score(y_test, probabilities),
    }

    mlflow.log_metrics(holdout_metrics)
    mlflow.log_metric("best_cv_roc_auc", search.best_score_)
    mlflow.log_params({f"best__{key}": value for key, value in search.best_params_.items()})
    mlflow.log_metrics(
        {
            f"fit_stats_{key}": value
            for key, value in search.fit_stats_.items()
            if isinstance(value, (int, float))
        }
    )

    mlflow.sklearn.log_model(
        sk_model=search.best_estimator_,
        name="best_estimator",
        model_id=logged_model.model_id,
        input_example=X_test.head(5),
        params=search.best_params_,
        tags={"optimizer": "GASearchCV", "dataset": "breast_cancer"},
        model_type="classifier",
    )
    mlflow.set_logged_model_tags(
        logged_model.model_id,
        {
            "stage": "validated",
            "best_cv_roc_auc": f"{search.best_score_:.4f}",
            "holdout_roc_auc": f"{holdout_metrics['holdout_roc_auc']:.4f}",
        },
    )
    mlflow.finalize_logged_model(logged_model.model_id, status="READY")

parent_run_id = parent_run.info.run_id
logged_model_id = logged_model.model_id
holdout_metrics

 gen evals           avg          best     div  unique  stag     mut   sel             events
---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------
   0    12       0.98609       0.99130   0.742   1.000     0       -     - -
   1    24       0.98509       0.99130   0.394   0.750     1   0.200     3 share
   2    24       0.98507       0.99130   0.394   0.667     2   0.216     3 dup=9,share
   3    24       0.98486       0.99130   0.242   0.583     3   0.193     3 dup=7,share
   4    24       0.98519       0.99130   0.364   0.750     4   0.319     3 div,imm=3,dup=7,sh
INFO: DeltaThreshold callback met its criteria
INFO: Stopping the algorithm

[5]:

{'holdout_accuracy': 0.9298245614035088,
 'holdout_balanced_accuracy': 0.9250876168224299,
 'holdout_roc_auc': 0.9875876168224299}
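
The log above ends after generation 4 because DeltaThreshold fired. A plausible reading of that criterion is sketched below in pure Python. The rule itself is an assumption (stop when the spread between the largest and smallest tracked value over the last N generations is at most the threshold), and should_stop is a hypothetical helper, not the library's code.

```python
def should_stop(history, threshold, generations):
    """Assumed rule: stop when max - min over the last `generations` values <= threshold."""
    if len(history) < generations:
        return False  # not enough generations recorded yet
    window = history[-generations:]
    return max(window) - min(window) <= threshold


# "best" column of the search log, generations 0 through 4.
best_per_generation = [0.99130, 0.99130, 0.99130, 0.99130, 0.99130]

for n in range(1, len(best_per_generation) + 1):
    decision = should_stop(best_per_generation[:n], threshold=0.0005, generations=5)
    print(f"after {n} generation(s): stop={decision}")
```

Under this assumed rule, the check can only succeed once five generations are recorded, which is consistent with the search stopping right after generation 4.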

Inspect the Best Model and Metrics

The fitted search object still behaves like a scikit-learn estimator, and the parent MLflow run now holds the same summary information for experiment tracking and later comparison.

[6]:

print("Parent run ID:", parent_run_id)
print("Logged model ID:", logged_model_id)
print("Best CV ROC AUC:", round(search.best_score_, 4))
print("Best parameters:")
pprint(search.best_params_)

Parent run ID: edff6735fa1b4eab8b61205969cbf748
Logged model ID: m-874be46c114a4293aed5527a9aabe7fd
Best CV ROC AUC: 0.9915
Best parameters:
{'ccp_alpha': 0.0041418922671775625,
 'max_depth': 4,
 'max_features': 'log2',
 'min_samples_leaf': 5,
 'min_samples_split': 7,
 'n_estimators': 97}

[7]:

pd.DataFrame([holdout_metrics], index=["ga_random_forest"])

[7]:

	holdout_accuracy	holdout_balanced_accuracy	holdout_roc_auc
ga_random_forest	0.929825	0.925088	0.987588

[8]:

search.fit_stats_

[8]:

{'evaluated_candidates': 110,
 'unique_candidates': 109,
 'cross_validate_calls': 109,
 'cache_hits': 1,
 'duplicate_candidates': 0,
 'skipped_invalid_candidates': 0,
 'population_parallel_batches': 0,
 'population_serial_batches': 6,
 'random_immigrants': 3,
 'local_refinement_candidates': 2}
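
Since fit_stats_ is a plain dictionary, it is easy to turn into MLflow-ready metrics or derived ratios. The sketch below copies the values from the output above. The numeric_metrics helper and the two extra entries are hypothetical, and the filter is a stricter variant of the one used in the parent run because it also excludes booleans.

```python
# Values copied from the fit_stats_ output of this notebook.
fit_stats = {
    "evaluated_candidates": 110,
    "unique_candidates": 109,
    "cross_validate_calls": 109,
    "cache_hits": 1,
    "duplicate_candidates": 0,
    "skipped_invalid_candidates": 0,
    "population_parallel_batches": 0,
    "population_serial_batches": 6,
    "random_immigrants": 3,
    "local_refinement_candidates": 2,
}


def numeric_metrics(stats, prefix="fit_stats_"):
    """Keep int/float values only; bool is excluded even though it subclasses int."""
    return {
        f"{prefix}{key}": float(value)
        for key, value in stats.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


# Hypothetical extra entries to show what the filter drops.
extended = {**fit_stats, "label": "not a number", "flag": True}
metrics = numeric_metrics(extended)

cache_hit_rate = fit_stats["cache_hits"] / fit_stats["evaluated_candidates"]
print(len(metrics), "numeric metrics")
print(f"cache hit rate: {cache_hit_rate:.4f}")
```

In this run, cross_validate_calls plus cache_hits equals evaluated_candidates (109 + 1 = 110), so a single evaluation was served from the cache.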

Search Runs and Logged Models

MLflow can query both runs and logged models. The parent run contains the summary. The nested candidate runs contain individual hyperparameter evaluations emitted by MLflowConfig.

[9]:

experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
)

columns = [
    "run_id",
    "tags.mlflow.runName",
    "tags.run_level",
    "metrics.score",
    "metrics.best_cv_roc_auc",
    "metrics.holdout_roc_auc",
]
runs[[column for column in columns if column in runs.columns]].head(10)

[9]:

	run_id	tags.mlflow.runName	tags.run_level	metrics.score	metrics.best_cv_roc_auc	metrics.holdout_roc_auc
0	226f1435fb4d493d9ffdc275e8df8280	candidate-random-forest	None	0.984827	NaN	NaN
1	7b02b3ced1364b18bff67c23239f67ab	candidate-random-forest	None	0.991458	NaN	NaN
2	13f275bfaf8042cd8c9c1d4408846c65	candidate-random-forest	None	0.984292	NaN	NaN
3	e63da585303f467cb86476eec8702bda	candidate-random-forest	None	0.984099	NaN	NaN
4	475925e5ddfd4b9a9708d33c62e080b4	candidate-random-forest	None	0.984838	NaN	NaN
5	b6ac482b478d40919d0ccc4ec6c88dce	candidate-random-forest	None	0.984645	NaN	NaN
6	24a899b7318c4723a678ab87b9356bef	candidate-random-forest	None	0.986772	NaN	NaN
7	acd8ba068251403aaf3ad0a5155d9ec1	candidate-random-forest	None	0.984651	NaN	NaN
8	e86a7aaed7324dc1ad1da67b4cdb0d1b	candidate-random-forest	None	0.985848	NaN	NaN
9	930af0563d3e4b0cbe5fd9a78789fa37	candidate-random-forest	None	0.987692	NaN	NaN

[10]:

logged_models = mlflow.search_logged_models(
    experiment_ids=[experiment.experiment_id],
    order_by=[{"field_name": "creation_time", "ascending": False}],
    output_format="list",
)

[(model.model_id, model.name, model.status) for model in logged_models[:5]]

[10]:

[('m-874be46c114a4293aed5527a9aabe7fd',
  'ga-random-forest-best-model',
  <LoggedModelStatus.READY: 'READY'>),
 ('m-81188119d3614150a4e11cbc425d3ec7',
  'ga-random-forest-best-model',
  <LoggedModelStatus.PENDING: 'PENDING'>)]

Open the MLflow UI

Run the command below in a terminal from the directory that contains mlflow3_tracking.db (the notebook's working directory, since the SQLite URI is relative), then open the printed local URL. The UI must point at the same database the notebook wrote to.

mlflow ui --backend-store-uri sqlite:///mlflow3_tracking.db

Practical Notes

Use a parent run for the overall search and nested runs for candidate-level details.
Log datasets with mlflow.log_input so future readers know which data context produced the model.
Keep save_models=False in MLflowConfig if candidate-level model artifacts are too heavy; log only the final best_estimator_ from the parent run.
Use logged-model tags for lifecycle metadata such as stage, validation metrics, owner, and optimizer settings.
For remote tracking, replace TRACKING_URI with your MLflow tracking server URI and keep the rest of the notebook unchanged.
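
One way to switch between local and remote tracking without editing the notebook is to read the URI from the MLFLOW_TRACKING_URI environment variable, which MLflow itself recognizes. The sketch below is an optional pattern, not part of the original notebook; resolve_tracking_uri is a hypothetical helper, and the server address in the usage note is only an example.

```python
import os

DEFAULT_URI = "sqlite:///mlflow3_tracking.db"


def resolve_tracking_uri(environ=os.environ):
    """Prefer MLFLOW_TRACKING_URI when it is set and non-empty, else the local default."""
    return environ.get("MLFLOW_TRACKING_URI") or DEFAULT_URI


TRACKING_URI = resolve_tracking_uri()
print(TRACKING_URI)
```

With this in place, exporting MLFLOW_TRACKING_URI=http://localhost:5000 (an example address) before starting the notebook would send both the parent run and the MLflowConfig candidate runs to that server, since both read TRACKING_URI.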