MLflow 3 Tracking for GASearchCV
This notebook shows how to log a sklearn-genetic-opt hyperparameter search with MLflow 3. It combines the library’s MLflowConfig integration, which logs each candidate as a nested run, with MLflow 3 tracking features such as dataset inputs, logged models, model tags, and searchable run/model metadata.
What Gets Logged
The notebook uses two complementary MLflow logging layers:
MLflowConfig logs each evaluated candidate as a nested run with its parameter values and cross-validation score.
A parent run logs the dataset input, optimizer settings, final holdout metrics, fit_stats_, the best parameters, and the final refitted model.
This layout keeps low-level candidate history available without losing the high-level summary of the search.
Problem Setup
We use the breast cancer dataset and tune a random forest. The dataset is small enough for a notebook, but it is realistic enough to demonstrate classification metrics and model tracking.
[1]:
from pprint import pprint
import warnings
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.mlflow_log import MLflowConfig
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Continuous, Integer
warnings.filterwarnings('ignore', category=UserWarning)
RANDOM_STATE = 42
TRACKING_URI = "sqlite:///mlflow3_tracking.db"
EXPERIMENT_NAME = "sklearn-genetic-opt-mlflow3"
[2]:
data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target.rename("target")
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.30,
    stratify=y,
    random_state=RANDOM_STATE,
)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)
print(f"Training shape: {X_train.shape}")
print(f"Test shape: {X_test.shape}")
print(f"Tracking URI: {TRACKING_URI}")
Training shape: (398, 30)
Test shape: (171, 30)
Tracking URI: sqlite:///mlflow3_tracking.db
Create a Local MLflow Experiment
For a local tutorial, a SQLite tracking URI avoids the need to run an MLflow server while still supporting the MLflow 3 tracking features used here. The same code works with a remote tracking server by changing TRACKING_URI.
Datasets are wrapped with mlflow.data.from_pandas and attached to a run with mlflow.log_input. The cell below only builds the dataset objects; they are logged later, inside the parent run, so that the run records which data it used.
[3]:
mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)
train_dataset = mlflow.data.from_pandas(
    pd.concat([X_train, y_train], axis=1),
    targets="target",
    name="breast-cancer-train",
)
test_dataset = mlflow.data.from_pandas(
    pd.concat([X_test, y_test], axis=1),
    targets="target",
    name="breast-cancer-test",
)
Configure the Genetic Search
The search uses optimizer controls that are useful for experiment tracking:
PopulationConfig(initializer="smart") for a better initial population.
warm_start_configs to seed one known reasonable configuration.
Adaptive crossover and mutation schedules.
Diversity control, random immigrants, fitness sharing, and local search.
RuntimeConfig(parallel_backend="auto") and use_cache=True for faster evaluation mechanics.
MLflowConfig is attached through log_config; every candidate evaluation becomes a nested MLflow run.
[4]:
param_grid = {
    "n_estimators": Integer(40, 120),
    "max_depth": Integer(2, 10),
    "min_samples_split": Integer(2, 12),
    "min_samples_leaf": Integer(1, 8),
    "max_features": Categorical(["sqrt", "log2", None]),
    "ccp_alpha": Continuous(0.0, 0.03),
}
mlflow_config = MLflowConfig(
    tracking_uri=TRACKING_URI,
    experiment=EXPERIMENT_NAME,
    run_name="candidate-random-forest",
    save_models=False,
)
search = GASearchCV(
    estimator=RandomForestClassifier(random_state=RANDOM_STATE, n_jobs=1),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=cv,
    evolution_config=EvolutionConfig(
        population_size=12,
        generations=8,
        crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),
        mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),
        tournament_size=3,
        elitism=True,
        keep_top_k=3,
    ),
    population_config=PopulationConfig(
        initializer="smart",
        warm_start_configs=[
            {
                "n_estimators": 80,
                "max_depth": 6,
                "min_samples_split": 4,
                "min_samples_leaf": 2,
                "max_features": "sqrt",
                "ccp_alpha": 0.0,
            }
        ],
    ),
    runtime_config=RuntimeConfig(
        n_jobs=-1,
        parallel_backend="auto",
        use_cache=True,
        verbose=True,
        return_train_score=False,
    ),
    optimization_config=OptimizationConfig(
        local_search=True,
        local_search_top_k=2,
        local_search_steps=1,
        local_search_radius=0.20,
        diversity_control=True,
        diversity_threshold=0.30,
        diversity_stagnation_generations=3,
        diversity_mutation_boost=1.8,
        random_immigrants_fraction=0.10,
        fitness_sharing=True,
        sharing_radius=0.40,
    ),
    log_config=mlflow_config,
)
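The two adaptive schedules passed to EvolutionConfig above can be pictured as decay curves. The sketch below assumes the exponential and inverse decay forms described in the sklearn-genetic-opt adapter documentation; the function names are hypothetical, and the library's own classes may differ in details such as how steps are counted.

```python
import math


def exponential_decay(initial_value, end_value, adaptive_rate, step):
    """Assumed form: (p0 - pf) * exp(-rate * step) + pf."""
    return (initial_value - end_value) * math.exp(-adaptive_rate * step) + end_value


def inverse_decay(initial_value, end_value, adaptive_rate, step):
    """Assumed form: (p0 - pf) / (1 + rate * step) + pf."""
    return (initial_value - end_value) / (1 + adaptive_rate * step) + end_value


# Same settings as the crossover and mutation schedules in the search above.
crossover = [exponential_decay(0.8, 0.4, 0.15, t) for t in range(8)]
mutation = [inverse_decay(0.25, 0.08, 0.25, t) for t in range(8)]

for step, (cx, mut) in enumerate(zip(crossover, mutation)):
    print(f"step={step}  crossover={cx:.3f}  mutation={mut:.3f}")
```

Under these assumed forms, the mutation probability at steps 1 and 2 rounds to 0.216 and 0.193, which agrees with the mut column that the search prints for generations 2 and 3 in its run log.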
Run the Search Inside a Parent MLflow Run
The parent run records the overall experiment. Nested candidate runs are created automatically by MLflowConfig during search.fit.
MLflow 3 model tracking is represented here in two ways:
mlflow.initialize_logged_model creates a logged-model record before the fit starts.
mlflow.sklearn.log_model(..., name=..., model_id=...) logs the final refitted estimator and links it to that model record.
[5]:
callbacks = [
    DeltaThreshold(threshold=0.0005, generations=5, metric="fitness_best"),
    ConsecutiveStopping(generations=7, metric="fitness_best"),
    TimerStopping(total_seconds=120),
]
with mlflow.start_run(run_name="ga-random-forest-search") as parent_run:
    mlflow.set_tags(
        {
            "project": "sklearn-genetic-opt",
            "mlflow_version": mlflow.__version__,
            "run_level": "parent",
            "optimizer": "GASearchCV",
        }
    )
    mlflow.log_input(train_dataset, context="training")
    mlflow.log_input(test_dataset, context="holdout")
    mlflow.log_params(
        {
            "population_size": search.population_size,
            "generations": search.generations,
            "population_initializer": search.population_initializer,
            "parallel_backend": search.parallel_backend,
            "local_search": search.local_search,
            "diversity_control": search.diversity_control,
            "fitness_sharing": search.fitness_sharing,
        }
    )
    logged_model = mlflow.initialize_logged_model(
        name="ga-random-forest-best-model",
        source_run_id=parent_run.info.run_id,
        model_type="classifier",
        tags={"stage": "candidate", "owner": "sklearn-genetic-opt"},
    )
    search.fit(X_train, y_train, callbacks=callbacks)
    probabilities = search.predict_proba(X_test)[:, 1]
    predictions = search.predict(X_test)
    holdout_metrics = {
        "holdout_accuracy": accuracy_score(y_test, predictions),
        "holdout_balanced_accuracy": balanced_accuracy_score(y_test, predictions),
        "holdout_roc_auc": roc_auc_score(y_test, probabilities),
    }
    mlflow.log_metrics(holdout_metrics)
    mlflow.log_metric("best_cv_roc_auc", search.best_score_)
    mlflow.log_params({f"best__{key}": value for key, value in search.best_params_.items()})
    mlflow.log_metrics(
        {
            f"fit_stats_{key}": value
            for key, value in search.fit_stats_.items()
            if isinstance(value, (int, float))
        }
    )
    mlflow.sklearn.log_model(
        sk_model=search.best_estimator_,
        name="best_estimator",
        model_id=logged_model.model_id,
        input_example=X_test.head(5),
        params=search.best_params_,
        tags={"optimizer": "GASearchCV", "dataset": "breast_cancer"},
        model_type="classifier",
    )
    mlflow.set_logged_model_tags(
        logged_model.model_id,
        {
            "stage": "validated",
            "best_cv_roc_auc": f"{search.best_score_:.4f}",
            "holdout_roc_auc": f"{holdout_metrics['holdout_roc_auc']:.4f}",
        },
    )
    mlflow.finalize_logged_model(logged_model.model_id, status="READY")

parent_run_id = parent_run.info.run_id
logged_model_id = logged_model.model_id
holdout_metrics
gen evals avg best div unique stag mut sel events
---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------
0 12 0.98609 0.99130 0.742 1.000 0 - - -
1 24 0.98509 0.99130 0.394 0.750 1 0.200 3 share
2 24 0.98507 0.99130 0.394 0.667 2 0.216 3 dup=9,share
3 24 0.98486 0.99130 0.242 0.583 3 0.193 3 dup=7,share
4 24 0.98519 0.99130 0.364 0.750 4 0.319 3 div,imm=3,dup=7,sh
INFO: DeltaThreshold callback met its criteria
INFO: Stopping the algorithm
[5]:
{'holdout_accuracy': 0.9298245614035088,
'holdout_balanced_accuracy': 0.9250876168224299,
'holdout_roc_auc': 0.9875876168224299}
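The log shows the search stopping after generation 4 because DeltaThreshold met its criteria. A minimal sketch of that kind of rule follows, assuming the criterion is that the spread (maximum minus minimum) of the tracked metric over the last N generations is at most the threshold. The helper should_stop is hypothetical and is not the library's implementation.

```python
def should_stop(history, threshold, generations):
    """Return True when the last `generations` values span at most `threshold`."""
    if len(history) < generations:
        return False  # not enough generations recorded yet
    window = history[-generations:]
    return max(window) - min(window) <= threshold


# Best fitness per generation, copied from the `best` column of the log above.
best_per_generation = [0.99130, 0.99130, 0.99130, 0.99130, 0.99130]

for generation in range(len(best_per_generation)):
    seen = best_per_generation[: generation + 1]
    print(generation, should_stop(seen, threshold=0.0005, generations=5))
```

With the notebook's settings (threshold=0.0005, generations=5), the rule first returns True once five identical best scores have been recorded, which is consistent with the stop after generation 4.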
Inspect the Best Model and Metrics
The fitted search object still behaves like a sklearn estimator. The MLflow run now contains the same summary information for experiment tracking and later comparison.
[6]:
print("Parent run ID:", parent_run_id)
print("Logged model ID:", logged_model_id)
print("Best CV ROC AUC:", round(search.best_score_, 4))
print("Best parameters:")
pprint(search.best_params_)
Parent run ID: edff6735fa1b4eab8b61205969cbf748
Logged model ID: m-874be46c114a4293aed5527a9aabe7fd
Best CV ROC AUC: 0.9915
Best parameters:
{'ccp_alpha': 0.0041418922671775625,
'max_depth': 4,
'max_features': 'log2',
'min_samples_leaf': 5,
'min_samples_split': 7,
'n_estimators': 97}
[7]:
pd.DataFrame([holdout_metrics], index=["ga_random_forest"])
[7]:
| | holdout_accuracy | holdout_balanced_accuracy | holdout_roc_auc |
|---|---|---|---|
| ga_random_forest | 0.929825 | 0.925088 | 0.987588 |
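Accuracy and balanced accuracy can differ when the classes are not equally frequent. As a reminder of the definition, balanced accuracy is the mean of the per-class recalls. The sketch below computes both metrics in plain Python on a small made-up label set; the labels are illustrative and are not taken from the notebook's data.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the recall obtained on each class."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Toy labels: six positives, two negatives, one negative misclassified.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0]

print(accuracy(y_true, y_pred))           # 7 of 8 correct
print(balanced_accuracy(y_true, y_pred))  # mean of recall 1.0 and recall 0.5
```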
[8]:
search.fit_stats_
[8]:
{'evaluated_candidates': 110,
'unique_candidates': 109,
'cross_validate_calls': 109,
'cache_hits': 1,
'duplicate_candidates': 0,
'skipped_invalid_candidates': 0,
'population_parallel_batches': 0,
'population_serial_batches': 6,
'random_immigrants': 3,
'local_refinement_candidates': 2}
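The counters in fit_stats_ are easier to compare across searches as ratios. The sketch below copies four of the values printed above into a plain dictionary and derives a cache-hit rate and a unique-candidate fraction. The derived metric names are hypothetical and are not produced by the library.

```python
fit_stats = {
    "evaluated_candidates": 110,
    "unique_candidates": 109,
    "cross_validate_calls": 109,
    "cache_hits": 1,
}

evaluated = fit_stats["evaluated_candidates"]
derived = {
    "cache_hit_rate": fit_stats["cache_hits"] / evaluated,
    "unique_fraction": fit_stats["unique_candidates"] / evaluated,
}

# In this run the cache hits and cross-validation calls add up to the evaluations.
accounted = fit_stats["cache_hits"] + fit_stats["cross_validate_calls"]
print("all evaluations accounted for:", accounted == evaluated)

for name, value in derived.items():
    print(f"{name}: {value:.4f}")
```

Ratios like these could be passed to mlflow.log_metrics next to the raw counters if they are useful for comparing searches.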
Search Runs and Logged Models
MLflow can query both runs and logged models. The parent run contains the summary. The nested candidate runs contain individual hyperparameter evaluations emitted by MLflowConfig.
[9]:
experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
)
columns = [
    "run_id",
    "tags.mlflow.runName",
    "tags.run_level",
    "metrics.score",
    "metrics.best_cv_roc_auc",
    "metrics.holdout_roc_auc",
]
runs[[column for column in columns if column in runs.columns]].head(10)
[9]:
| | run_id | tags.mlflow.runName | tags.run_level | metrics.score | metrics.best_cv_roc_auc | metrics.holdout_roc_auc |
|---|---|---|---|---|---|---|
| 0 | 226f1435fb4d493d9ffdc275e8df8280 | candidate-random-forest | None | 0.984827 | NaN | NaN |
| 1 | 7b02b3ced1364b18bff67c23239f67ab | candidate-random-forest | None | 0.991458 | NaN | NaN |
| 2 | 13f275bfaf8042cd8c9c1d4408846c65 | candidate-random-forest | None | 0.984292 | NaN | NaN |
| 3 | e63da585303f467cb86476eec8702bda | candidate-random-forest | None | 0.984099 | NaN | NaN |
| 4 | 475925e5ddfd4b9a9708d33c62e080b4 | candidate-random-forest | None | 0.984838 | NaN | NaN |
| 5 | b6ac482b478d40919d0ccc4ec6c88dce | candidate-random-forest | None | 0.984645 | NaN | NaN |
| 6 | 24a899b7318c4723a678ab87b9356bef | candidate-random-forest | None | 0.986772 | NaN | NaN |
| 7 | acd8ba068251403aaf3ad0a5155d9ec1 | candidate-random-forest | None | 0.984651 | NaN | NaN |
| 8 | e86a7aaed7324dc1ad1da67b4cdb0d1b | candidate-random-forest | None | 0.985848 | NaN | NaN |
| 9 | 930af0563d3e4b0cbe5fd9a78789fa37 | candidate-random-forest | None | 0.987692 | NaN | NaN |
[10]:
logged_models = mlflow.search_logged_models(
    experiment_ids=[experiment.experiment_id],
    order_by=[{"field_name": "creation_time", "ascending": False}],
    output_format="list",
)
[(model.model_id, model.name, model.status) for model in logged_models[:5]]
[10]:
[('m-874be46c114a4293aed5527a9aabe7fd',
'ga-random-forest-best-model',
<LoggedModelStatus.READY: 'READY'>),
('m-81188119d3614150a4e11cbc425d3ec7',
'ga-random-forest-best-model',
<LoggedModelStatus.PENDING: 'PENDING'>)]
Open the MLflow UI
Run the command below in a terminal from the directory that contains mlflow3_tracking.db (the repository root, if the notebook was run from there), then open the printed local URL. Because this notebook uses a local SQLite tracking backend, the UI must point at the same database file.
mlflow ui --backend-store-uri sqlite:///mlflow3_tracking.db
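A relative SQLite URI such as the one used here resolves against the current working directory, so the UI only finds the runs when it is started against the same file. The hypothetical helper below shows how the URI maps to a file path, assuming the SQLAlchemy convention that three slashes introduce a relative path and four an absolute one on POSIX systems.

```python
import os
from urllib.parse import urlparse


def sqlite_file_from_uri(uri):
    """Return the database file path encoded in a sqlite:/// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "sqlite":
        raise ValueError(f"not a SQLite URI: {uri}")
    # Drop the single slash that separates the (empty) host from the path.
    return parsed.path[1:]


relative = sqlite_file_from_uri("sqlite:///mlflow3_tracking.db")
absolute = sqlite_file_from_uri("sqlite:////tmp/mlflow3_tracking.db")

print(relative)                   # mlflow3_tracking.db
print(os.path.abspath(relative))  # depends on the current working directory
print(absolute)                   # /tmp/mlflow3_tracking.db
```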
Practical Notes
Use a parent run for the overall search and nested runs for candidate-level details.
Log datasets with mlflow.log_input so future readers know which data context produced the model.
Keep save_models=False in MLflowConfig if candidate-level model artifacts are too heavy; log only the final best_estimator_ from the parent run.
Use logged-model tags for lifecycle metadata such as stage, validation metrics, owner, and optimizer settings.
For remote tracking, replace TRACKING_URI with your MLflow tracking server URI and keep the rest of the notebook unchanged.