MLflow 3 Tracking for GASearchCV

This notebook shows how to log a sklearn-genetic-opt hyperparameter search with MLflow 3. It combines the library’s MLflowConfig integration, which logs each candidate as a nested run, with MLflow 3 tracking features such as dataset inputs, logged models, model tags, and searchable run/model metadata.

What Gets Logged

The notebook uses two complementary MLflow logging layers:

  • MLflowConfig logs each evaluated candidate as a nested run with its parameter values and cross-validation score.

  • A parent run logs the dataset input, optimizer settings, final holdout metrics, fit_stats_, the best parameters, and the final refitted model.

This layout keeps low-level candidate history available without losing the high-level summary of the search.

Problem Setup

We use the breast cancer dataset (569 samples, 30 numeric features) and tune a random forest classifier. The dataset is small enough to search quickly in a notebook, yet realistic enough to demonstrate classification metrics and model tracking.

[1]:
import warnings
from pprint import pprint

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.mlflow_log import MLflowConfig
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Continuous, Integer


warnings.filterwarnings('ignore', category=UserWarning)

RANDOM_STATE = 42
TRACKING_URI = "sqlite:///mlflow3_tracking.db"
EXPERIMENT_NAME = "sklearn-genetic-opt-mlflow3"
[2]:
data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target.rename("target")

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.30,
    stratify=y,
    random_state=RANDOM_STATE,
)

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)

print(f"Training shape: {X_train.shape}")
print(f"Test shape: {X_test.shape}")
print(f"Tracking URI: {TRACKING_URI}")
Training shape: (398, 30)
Test shape: (171, 30)
Tracking URI: sqlite:///mlflow3_tracking.db

Create a Local MLflow Experiment

For a local tutorial, a SQLite tracking URI avoids running a separate MLflow server while still supporting the MLflow 3 tracking features used here. To log to a remote tracking server instead, change only TRACKING_URI.

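mlflow.data.from_pandas wraps a DataFrame as an MLflow dataset, and mlflow.log_input attaches it to the active run together with a context label (here "training" and "holdout"). The parent run therefore records which data produced the model.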

[3]:
mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)

train_dataset = mlflow.data.from_pandas(
    pd.concat([X_train, y_train], axis=1),
    targets="target",
    name="breast-cancer-train",
)
test_dataset = mlflow.data.from_pandas(
    pd.concat([X_test, y_test], axis=1),
    targets="target",
    name="breast-cancer-test",
)

Run the Search Inside a Parent MLflow Run

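The parent run records the overall search, and MLflowConfig creates the nested candidate runs automatically during search.fit. The cell below uses search, the GASearchCV instance configured with an MLflowConfig in a preceding cell that is not included in this section.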

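MLflow 3 model tracking appears here in three steps:

  • mlflow.initialize_logged_model creates a logged-model record, in PENDING status, before the fit starts.

  • mlflow.sklearn.log_model(..., name=..., model_id=...) logs the final refitted estimator and links it to that model record.

  • mlflow.finalize_logged_model(..., status="READY") marks the record as ready once the model artifacts and tags are in place.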

[5]:
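callbacks = [
    # Stop once fitness_best varies by at most 0.0005 over the last 5 generations.
    DeltaThreshold(threshold=0.0005, generations=5, metric="fitness_best"),
    # Stop once fitness_best fails to improve on the last 7 generations.
    ConsecutiveStopping(generations=7, metric="fitness_best"),
    # Hard wall-clock limit for the whole search.
    TimerStopping(total_seconds=120),
]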

with mlflow.start_run(run_name="ga-random-forest-search") as parent_run:
    mlflow.set_tags(
        {
            "project": "sklearn-genetic-opt",
            "mlflow_version": mlflow.__version__,
            "run_level": "parent",
            "optimizer": "GASearchCV",
        }
    )
    mlflow.log_input(train_dataset, context="training")
    mlflow.log_input(test_dataset, context="holdout")
    mlflow.log_params(
        {
            "population_size": search.population_size,
            "generations": search.generations,
            "population_initializer": search.population_initializer,
            "parallel_backend": search.parallel_backend,
            "local_search": search.local_search,
            "diversity_control": search.diversity_control,
            "fitness_sharing": search.fitness_sharing,
        }
    )

    logged_model = mlflow.initialize_logged_model(
        name="ga-random-forest-best-model",
        source_run_id=parent_run.info.run_id,
        model_type="classifier",
        tags={"stage": "candidate", "owner": "sklearn-genetic-opt"},
    )

    search.fit(X_train, y_train, callbacks=callbacks)

    probabilities = search.predict_proba(X_test)[:, 1]
    predictions = search.predict(X_test)
    holdout_metrics = {
        "holdout_accuracy": accuracy_score(y_test, predictions),
        "holdout_balanced_accuracy": balanced_accuracy_score(y_test, predictions),
        "holdout_roc_auc": roc_auc_score(y_test, probabilities),
    }

    mlflow.log_metrics(holdout_metrics)
    mlflow.log_metric("best_cv_roc_auc", search.best_score_)
    mlflow.log_params({f"best__{key}": value for key, value in search.best_params_.items()})
    mlflow.log_metrics(
        {
            f"fit_stats_{key}": value
            for key, value in search.fit_stats_.items()
            if isinstance(value, (int, float))
        }
    )

    mlflow.sklearn.log_model(
        sk_model=search.best_estimator_,
        name="best_estimator",
        model_id=logged_model.model_id,
        input_example=X_test.head(5),
        params=search.best_params_,
        tags={"optimizer": "GASearchCV", "dataset": "breast_cancer"},
        model_type="classifier",
    )
    mlflow.set_logged_model_tags(
        logged_model.model_id,
        {
            "stage": "validated",
            "best_cv_roc_auc": f"{search.best_score_:.4f}",
            "holdout_roc_auc": f"{holdout_metrics['holdout_roc_auc']:.4f}",
        },
    )
    mlflow.finalize_logged_model(logged_model.model_id, status="READY")

parent_run_id = parent_run.info.run_id
logged_model_id = logged_model.model_id
holdout_metrics
 gen evals           avg          best     div  unique  stag     mut   sel             events
---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------
   0    12       0.98609       0.99130   0.742   1.000     0       -     - -
   1    24       0.98509       0.99130   0.394   0.750     1   0.200     3 share
   2    24       0.98507       0.99130   0.394   0.667     2   0.216     3 dup=9,share
   3    24       0.98486       0.99130   0.242   0.583     3   0.193     3 dup=7,share
   4    24       0.98519       0.99130   0.364   0.750     4   0.319     3 div,imm=3,dup=7,sh
INFO: DeltaThreshold callback met its criteria
INFO: Stopping the algorithm
[5]:
{'holdout_accuracy': 0.9298245614035088,
 'holdout_balanced_accuracy': 0.9250876168224299,
 'holdout_roc_auc': 0.9875876168224299}

Inspect the Best Model and Metrics

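The fitted search object behaves like any fitted scikit-learn estimator. predict and predict_proba delegate to best_estimator_, and best_params_ and best_score_ hold the search results. The parent MLflow run stores the same summary for experiment tracking and later comparison.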

[6]:
print("Parent run ID:", parent_run_id)
print("Logged model ID:", logged_model_id)
print("Best CV ROC AUC:", round(search.best_score_, 4))
print("Best parameters:")
pprint(search.best_params_)
Parent run ID: edff6735fa1b4eab8b61205969cbf748
Logged model ID: m-874be46c114a4293aed5527a9aabe7fd
Best CV ROC AUC: 0.9915
Best parameters:
{'ccp_alpha': 0.0041418922671775625,
 'max_depth': 4,
 'max_features': 'log2',
 'min_samples_leaf': 5,
 'min_samples_split': 7,
 'n_estimators': 97}
[7]:
pd.DataFrame([holdout_metrics], index=["ga_random_forest"])
[7]:
                  holdout_accuracy  holdout_balanced_accuracy  holdout_roc_auc
ga_random_forest          0.929825                   0.925088         0.987588
[8]:
search.fit_stats_
[8]:
{'evaluated_candidates': 110,
 'unique_candidates': 109,
 'cross_validate_calls': 109,
 'cache_hits': 1,
 'duplicate_candidates': 0,
 'skipped_invalid_candidates': 0,
 'population_parallel_batches': 0,
 'population_serial_batches': 6,
 'random_immigrants': 3,
 'local_refinement_candidates': 2}
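The parent-run cell filters fit_stats_ with isinstance(value, (int, float)) before calling mlflow.log_metrics. That check has two edge cases. bool is a subclass of int, so boolean flags would pass it. NumPy integer scalars such as numpy.int64 are not int subclasses, so they would be dropped silently if fit_stats_ ever contained them. The sketch below shows a stricter filter based on numbers.Real. The helper name is hypothetical, and the two non-numeric entries are made up to show what gets filtered out.

```python
from numbers import Real


def fit_stats_to_metrics(stats, prefix="fit_stats_"):
    """Keep real-valued entries (Python or NumPy numbers) and drop bools and strings."""
    return {
        f"{prefix}{key}": float(value)
        for key, value in stats.items()
        if isinstance(value, Real) and not isinstance(value, bool)
    }


# Two entries copied from the fit_stats_ output above, plus two hypothetical
# non-numeric entries that should be filtered out.
stats = {
    "evaluated_candidates": 110,
    "cache_hits": 1,
    "converged": True,
    "backend": "serial",
}
print(fit_stats_to_metrics(stats))
# {'fit_stats_evaluated_candidates': 110.0, 'fit_stats_cache_hits': 1.0}
```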

Search Runs and Logged Models

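MLflow can query both runs (mlflow.search_runs, which returns a pandas DataFrame by default) and logged models (mlflow.search_logged_models). The parent run holds the search summary, and the nested candidate runs hold the individual hyperparameter evaluations emitted by MLflowConfig. Because the runs are sorted by start time, newest first, the first rows are candidate runs. Their tags.run_level is None because only the parent run sets that tag. The logged-model query can also return records from earlier executions of the notebook. A record that was initialized but never finalized stays PENDING, which is likely what the second entry reflects.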
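To look at one layer at a time, split the search_runs frame on the tags.mlflow.parentRunId column that MLflow adds for nested runs. The sketch below uses a hypothetical helper and a small hand-built frame with the same column names, where the ids and scores are illustrative. Passing filter_string=f"tags.mlflow.parentRunId = '{parent_run_id}'" to mlflow.search_runs should do the same filtering on the tracking-server side.

```python
import pandas as pd


def split_run_layers(runs, parent_run_id):
    """Hypothetical helper: split a search_runs() frame into the parent row and its children."""
    parent = runs[runs["run_id"] == parent_run_id]
    parent_col = "tags.mlflow.parentRunId"
    if parent_col not in runs.columns:  # no nested runs logged in this experiment yet
        return parent, runs.iloc[0:0]
    return parent, runs[runs[parent_col] == parent_run_id]


# Illustrative frame using the column names search_runs produces.
runs = pd.DataFrame(
    {
        "run_id": ["parent", "cand-a", "cand-b", "cand-c"],
        "tags.mlflow.parentRunId": [None, "parent", "parent", "parent"],
        "metrics.score": [None, 0.9848, 0.9915, 0.9843],
    }
)
parent, candidates = split_run_layers(runs, "parent")
print(candidates.nlargest(2, "metrics.score")["run_id"].tolist())
# ['cand-b', 'cand-a']
```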

[9]:
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)

runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
)

columns = [
    "run_id",
    "tags.mlflow.runName",
    "tags.run_level",
    "metrics.score",
    "metrics.best_cv_roc_auc",
    "metrics.holdout_roc_auc",
]
runs[[column for column in columns if column in runs.columns]].head(10)
[9]:
   run_id                            tags.mlflow.runName      tags.run_level  metrics.score  metrics.best_cv_roc_auc  metrics.holdout_roc_auc
0  226f1435fb4d493d9ffdc275e8df8280  candidate-random-forest  None            0.984827       NaN                      NaN
1  7b02b3ced1364b18bff67c23239f67ab  candidate-random-forest  None            0.991458       NaN                      NaN
2  13f275bfaf8042cd8c9c1d4408846c65  candidate-random-forest  None            0.984292       NaN                      NaN
3  e63da585303f467cb86476eec8702bda  candidate-random-forest  None            0.984099       NaN                      NaN
4  475925e5ddfd4b9a9708d33c62e080b4  candidate-random-forest  None            0.984838       NaN                      NaN
5  b6ac482b478d40919d0ccc4ec6c88dce  candidate-random-forest  None            0.984645       NaN                      NaN
6  24a899b7318c4723a678ab87b9356bef  candidate-random-forest  None            0.986772       NaN                      NaN
7  acd8ba068251403aaf3ad0a5155d9ec1  candidate-random-forest  None            0.984651       NaN                      NaN
8  e86a7aaed7324dc1ad1da67b4cdb0d1b  candidate-random-forest  None            0.985848       NaN                      NaN
9  930af0563d3e4b0cbe5fd9a78789fa37  candidate-random-forest  None            0.987692       NaN                      NaN
[10]:
logged_models = mlflow.search_logged_models(
    experiment_ids=[experiment.experiment_id],
    order_by=[{"field_name": "creation_time", "ascending": False}],
    output_format="list",
)

[(model.model_id, model.name, model.status) for model in logged_models[:5]]
[10]:
[('m-874be46c114a4293aed5527a9aabe7fd',
  'ga-random-forest-best-model',
  <LoggedModelStatus.READY: 'READY'>),
 ('m-81188119d3614150a4e11cbc425d3ec7',
  'ga-random-forest-best-model',
  <LoggedModelStatus.PENDING: 'PENDING'>)]

Open the MLflow UI

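Run the command below in a terminal from the same working directory the notebook ran in, because the SQLite path in TRACKING_URI is relative to that directory. Then open the local URL it prints. Because this notebook uses a local SQLite tracking backend, the UI must point at the same database.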

mlflow ui --backend-store-uri sqlite:///mlflow3_tracking.db

Practical Notes

  • Use a parent run for the overall search and nested runs for candidate-level details.

  • Log datasets with mlflow.log_input so future readers know which data context produced the model.

  • Keep save_models=False in MLflowConfig if candidate-level model artifacts are too heavy; log only the final best_estimator_ from the parent run.

  • Use logged-model tags for lifecycle metadata such as stage, validation metrics, owner, and optimizer settings.

  • For remote tracking, replace TRACKING_URI with your MLflow tracking server URI and keep the rest of the notebook unchanged.
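MLflow itself reads the MLFLOW_TRACKING_URI environment variable when set_tracking_uri is not called. The sketch below uses a hypothetical helper name so the notebook can switch to a remote server without code edits. It prefers that variable and falls back to the local SQLite file.

```python
import os

# Local default used throughout this notebook.
DEFAULT_TRACKING_URI = "sqlite:///mlflow3_tracking.db"


def resolve_tracking_uri(environ=os.environ):
    """Hypothetical helper: prefer MLFLOW_TRACKING_URI, else the local SQLite file."""
    return environ.get("MLFLOW_TRACKING_URI") or DEFAULT_TRACKING_URI


TRACKING_URI = resolve_tracking_uri()
print(TRACKING_URI)
```

Because the notebook calls mlflow.set_tracking_uri(TRACKING_URI) explicitly, resolving the value this way keeps that call while still letting the environment decide where runs go.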