{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c1a6f224",
   "metadata": {},
   "source": [
    "# MLflow 3 Tracking for GASearchCV\n",
    "\n",
    "This notebook shows how to log a `sklearn-genetic-opt` hyperparameter search with MLflow 3. It combines the library's `MLflowConfig` integration, which logs each candidate as a nested run, with MLflow 3 tracking features such as dataset inputs, logged models, model tags, and searchable run/model metadata.\n",
    "\n",
    "## Menu\n",
    "\n",
    "1. [What Gets Logged](#what-gets-logged)\n",
    "2. [Problem Setup](#problem-setup)\n",
    "3. [Create a Local MLflow Experiment](#create-a-local-mlflow-experiment)\n",
    "4. [Configure the Genetic Search](#configure-the-genetic-search)\n",
    "5. [Run the Search Inside a Parent MLflow Run](#run-the-search-inside-a-parent-mlflow-run)\n",
    "6. [Inspect the Best Model and Metrics](#inspect-the-best-model-and-metrics)\n",
    "7. [Search Runs and Logged Models](#search-runs-and-logged-models)\n",
    "8. [Open the MLflow UI](#open-the-mlflow-ui)\n",
    "9. [Practical Notes](#practical-notes)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e635b08c",
   "metadata": {},
   "source": [
    "## What Gets Logged\n",
    "\n",
    "The notebook uses two complementary MLflow logging layers:\n",
    "\n",
    "- `MLflowConfig` logs each evaluated candidate as a nested run with its parameter values and cross-validation score.\n",
    "- A parent run logs the dataset input, optimizer settings, final holdout metrics, `fit_stats_`, the best parameters, and the final refitted model.\n",
    "\n",
    "This layout keeps low-level candidate history available without losing the high-level summary of the search."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b01ec86a",
   "metadata": {},
   "source": [
    "## Problem Setup\n",
    "\n",
    "We use the breast cancer dataset and tune a random forest. The dataset is small enough for a notebook, but it is realistic enough to demonstrate classification metrics and model tracking."
   ]
  },
  {
   "cell_type": "code",
   "id": "87e599b8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:29:33.123383Z",
     "iopub.status.busy": "2026-06-20T05:29:33.123087Z",
     "iopub.status.idle": "2026-06-20T05:30:29.111206Z",
     "shell.execute_reply": "2026-06-20T05:30:29.110208Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:51:09.178409300Z",
     "start_time": "2026-06-20T18:51:03.061847Z"
    }
   },
   "source": [
    "from pprint import pprint\n",
    "\n",
    "import warnings\n",
    "import mlflow\n",
    "import mlflow.sklearn\n",
    "import pandas as pd\n",
    "from sklearn.datasets import load_breast_cancer\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score\n",
    "from sklearn.model_selection import StratifiedKFold, train_test_split\n",
    "\n",
    "from sklearn_genetic import (\n",
    "    EvolutionConfig,\n",
    "    GASearchCV,\n",
    "    OptimizationConfig,\n",
    "    PopulationConfig,\n",
    "    RuntimeConfig,\n",
    ")\n",
    "from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping\n",
    "from sklearn_genetic.mlflow_log import MLflowConfig\n",
    "from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter\n",
    "from sklearn_genetic.space import Categorical, Continuous, Integer\n",
    "\n",
    "\n",
    "warnings.filterwarnings('ignore', category=UserWarning)\n",
    "\n",
    "RANDOM_STATE = 42\n",
    "TRACKING_URI = \"sqlite:///mlflow3_tracking.db\"\n",
    "EXPERIMENT_NAME = \"sklearn-genetic-opt-mlflow3\""
   ],
   "outputs": [],
   "execution_count": 1
  },
  {
   "cell_type": "code",
   "id": "d0ee3a0e",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:30:29.114929Z",
     "iopub.status.busy": "2026-06-20T05:30:29.114286Z",
     "iopub.status.idle": "2026-06-20T05:30:29.978990Z",
     "shell.execute_reply": "2026-06-20T05:30:29.978049Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:51:09.242668Z",
     "start_time": "2026-06-20T18:51:09.179408400Z"
    }
   },
   "source": [
    "data = load_breast_cancer(as_frame=True)\n",
    "X = data.data\n",
    "y = data.target.rename(\"target\")\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X,\n",
    "    y,\n",
    "    test_size=0.30,\n",
    "    stratify=y,\n",
    "    random_state=RANDOM_STATE,\n",
    ")\n",
    "\n",
    "cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)\n",
    "\n",
    "print(f\"Training shape: {X_train.shape}\")\n",
    "print(f\"Test shape: {X_test.shape}\")\n",
    "print(f\"Tracking URI: {TRACKING_URI}\")"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training shape: (398, 30)\n",
      "Test shape: (171, 30)\n",
      "Tracking URI: sqlite:///mlflow3_tracking.db\n"
     ]
    }
   ],
   "execution_count": 2
  },
  {
   "cell_type": "markdown",
   "id": "ad00a22c",
   "metadata": {},
   "source": [
    "## Create a Local MLflow Experiment\n",
    "\n",
    "For a local tutorial, a SQLite tracking URI is easier than requiring an MLflow server and supports current MLflow 3 tracking features. The same code works with a remote tracking server by changing `TRACKING_URI`.\n",
    "\n",
    "MLflow 3 datasets can be logged with `mlflow.data.from_pandas` and `mlflow.log_input`. This records the dataset context used by the parent run."
   ]
  },
  {
   "cell_type": "code",
   "id": "747be655",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:30:29.981501Z",
     "iopub.status.busy": "2026-06-20T05:30:29.981275Z",
     "iopub.status.idle": "2026-06-20T05:30:53.957492Z",
     "shell.execute_reply": "2026-06-20T05:30:53.955848Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:51:10.494072900Z",
     "start_time": "2026-06-20T18:51:09.244668700Z"
    }
   },
   "source": [
    "mlflow.set_tracking_uri(TRACKING_URI)\n",
    "mlflow.set_experiment(EXPERIMENT_NAME)\n",
    "\n",
    "train_dataset = mlflow.data.from_pandas(\n",
    "    pd.concat([X_train, y_train], axis=1),\n",
    "    targets=\"target\",\n",
    "    name=\"breast-cancer-train\",\n",
    ")\n",
    "test_dataset = mlflow.data.from_pandas(\n",
    "    pd.concat([X_test, y_test], axis=1),\n",
    "    targets=\"target\",\n",
    "    name=\"breast-cancer-test\",\n",
    ")"
   ],
   "outputs": [],
   "execution_count": 3
  },
  {
   "cell_type": "markdown",
   "id": "8e187927",
   "metadata": {},
   "source": [
    "## Configure the Genetic Search\n",
    "\n",
    "The search uses optimizer controls that are useful for experiment tracking:\n",
    "\n",
    "- `PopulationConfig(initializer=\"smart\")` for a better initial population.\n",
    "- `warm_start_configs` to seed one known reasonable configuration.\n",
    "- adaptive crossover and mutation schedules.\n",
    "- diversity control, random immigrants, fitness sharing, and local search.\n",
    "- `RuntimeConfig(parallel_backend=\"auto\")` and `use_cache=True` for faster evaluation mechanics.\n",
    "\n",
    "`MLflowConfig` is attached through `log_config`; every candidate evaluation becomes a nested MLflow run."
   ]
  },
  {
   "cell_type": "code",
   "id": "31d276dc",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:30:53.961204Z",
     "iopub.status.busy": "2026-06-20T05:30:53.960712Z",
     "iopub.status.idle": "2026-06-20T05:30:53.986321Z",
     "shell.execute_reply": "2026-06-20T05:30:53.984459Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:51:10.516441500Z",
     "start_time": "2026-06-20T18:51:10.495070600Z"
    }
   },
   "source": [
    "param_grid = {\n",
    "    \"n_estimators\": Integer(40, 120),\n",
    "    \"max_depth\": Integer(2, 10),\n",
    "    \"min_samples_split\": Integer(2, 12),\n",
    "    \"min_samples_leaf\": Integer(1, 8),\n",
    "    \"max_features\": Categorical([\"sqrt\", \"log2\", None]),\n",
    "    \"ccp_alpha\": Continuous(0.0, 0.03),\n",
    "}\n",
    "\n",
    "mlflow_config = MLflowConfig(\n",
    "    tracking_uri=TRACKING_URI,\n",
    "    experiment=EXPERIMENT_NAME,\n",
    "    run_name=\"candidate-random-forest\",\n",
    "    save_models=False,\n",
    ")\n",
    "\n",
    "search = GASearchCV(\n",
    "    estimator=RandomForestClassifier(random_state=RANDOM_STATE, n_jobs=1),\n",
    "    param_grid=param_grid,\n",
    "    scoring=\"roc_auc\",\n",
    "    cv=cv,\n",
    "    evolution_config=EvolutionConfig(\n",
    "        population_size=12,\n",
    "        generations=8,\n",
    "        crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),\n",
    "        mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),\n",
    "        tournament_size=3,\n",
    "        elitism=True,\n",
    "        keep_top_k=3,\n",
    "    ),\n",
    "    population_config=PopulationConfig(\n",
    "        initializer=\"smart\",\n",
    "        warm_start_configs=[\n",
    "            {\n",
    "                \"n_estimators\": 80,\n",
    "                \"max_depth\": 6,\n",
    "                \"min_samples_split\": 4,\n",
    "                \"min_samples_leaf\": 2,\n",
    "                \"max_features\": \"sqrt\",\n",
    "                \"ccp_alpha\": 0.0,\n",
    "            }\n",
    "        ],\n",
    "    ),\n",
    "    runtime_config=RuntimeConfig(\n",
    "        n_jobs=-1,\n",
    "        parallel_backend=\"auto\",\n",
    "        use_cache=True,\n",
    "        verbose=True,\n",
    "        return_train_score=False,\n",
    "    ),\n",
    "    optimization_config=OptimizationConfig(\n",
    "        local_search=True,\n",
    "        local_search_top_k=2,\n",
    "        local_search_steps=1,\n",
    "        local_search_radius=0.20,\n",
    "        diversity_control=True,\n",
    "        diversity_threshold=0.30,\n",
    "        diversity_stagnation_generations=3,\n",
    "        diversity_mutation_boost=1.8,\n",
    "        random_immigrants_fraction=0.10,\n",
    "        fitness_sharing=True,\n",
    "        sharing_radius=0.40,\n",
    "    ),\n",
    "    log_config=mlflow_config,\n",
    ")\n"
   ],
   "outputs": [],
   "execution_count": 4
  },
  {
   "cell_type": "markdown",
   "id": "360fac2f",
   "metadata": {},
   "source": [
    "## Run the Search Inside a Parent MLflow Run\n",
    "\n",
    "The parent run records the overall experiment. Nested candidate runs are created automatically by `MLflowConfig` during `search.fit`.\n",
    "\n",
    "MLflow 3 model tracking is represented here in two ways:\n",
    "\n",
    "- `mlflow.initialize_logged_model` creates a logged-model record before the fit starts.\n",
    "- `mlflow.sklearn.log_model(..., name=..., model_id=...)` logs the final refitted estimator and links it to that model record."
   ]
  },
  {
   "cell_type": "code",
   "id": "3cb8e8c4",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:30:53.991301Z",
     "iopub.status.busy": "2026-06-20T05:30:53.990605Z",
     "iopub.status.idle": "2026-06-20T05:33:03.957208Z",
     "shell.execute_reply": "2026-06-20T05:33:03.956077Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:52:54.882752300Z",
     "start_time": "2026-06-20T18:51:10.517442300Z"
    }
   },
   "source": [
    "callbacks = [\n",
    "    DeltaThreshold(threshold=0.0005, generations=5, metric=\"fitness_best\"),\n",
    "    ConsecutiveStopping(generations=7, metric=\"fitness_best\"),\n",
    "    TimerStopping(total_seconds=120),\n",
    "]\n",
    "\n",
    "with mlflow.start_run(run_name=\"ga-random-forest-search\") as parent_run:\n",
    "    mlflow.set_tags(\n",
    "        {\n",
    "            \"project\": \"sklearn-genetic-opt\",\n",
    "            \"mlflow_version\": mlflow.__version__,\n",
    "            \"run_level\": \"parent\",\n",
    "            \"optimizer\": \"GASearchCV\",\n",
    "        }\n",
    "    )\n",
    "    mlflow.log_input(train_dataset, context=\"training\")\n",
    "    mlflow.log_input(test_dataset, context=\"holdout\")\n",
    "    mlflow.log_params(\n",
    "        {\n",
    "            \"population_size\": search.population_size,\n",
    "            \"generations\": search.generations,\n",
    "            \"population_initializer\": search.population_initializer,\n",
    "            \"parallel_backend\": search.parallel_backend,\n",
    "            \"local_search\": search.local_search,\n",
    "            \"diversity_control\": search.diversity_control,\n",
    "            \"fitness_sharing\": search.fitness_sharing,\n",
    "        }\n",
    "    )\n",
    "\n",
    "    logged_model = mlflow.initialize_logged_model(\n",
    "        name=\"ga-random-forest-best-model\",\n",
    "        source_run_id=parent_run.info.run_id,\n",
    "        model_type=\"classifier\",\n",
    "        tags={\"stage\": \"candidate\", \"owner\": \"sklearn-genetic-opt\"},\n",
    "    )\n",
    "\n",
    "    search.fit(X_train, y_train, callbacks=callbacks)\n",
    "\n",
    "    probabilities = search.predict_proba(X_test)[:, 1]\n",
    "    predictions = search.predict(X_test)\n",
    "    holdout_metrics = {\n",
    "        \"holdout_accuracy\": accuracy_score(y_test, predictions),\n",
    "        \"holdout_balanced_accuracy\": balanced_accuracy_score(y_test, predictions),\n",
    "        \"holdout_roc_auc\": roc_auc_score(y_test, probabilities),\n",
    "    }\n",
    "\n",
    "    mlflow.log_metrics(holdout_metrics)\n",
    "    mlflow.log_metric(\"best_cv_roc_auc\", search.best_score_)\n",
    "    mlflow.log_params({f\"best__{key}\": value for key, value in search.best_params_.items()})\n",
    "    mlflow.log_metrics(\n",
    "        {\n",
    "            f\"fit_stats_{key}\": value\n",
    "            for key, value in search.fit_stats_.items()\n",
    "            if isinstance(value, (int, float))\n",
    "        }\n",
    "    )\n",
    "\n",
    "    mlflow.sklearn.log_model(\n",
    "        sk_model=search.best_estimator_,\n",
    "        name=\"best_estimator\",\n",
    "        model_id=logged_model.model_id,\n",
    "        input_example=X_test.head(5),\n",
    "        params=search.best_params_,\n",
    "        tags={\"optimizer\": \"GASearchCV\", \"dataset\": \"breast_cancer\"},\n",
    "        model_type=\"classifier\",\n",
    "    )\n",
    "    mlflow.set_logged_model_tags(\n",
    "        logged_model.model_id,\n",
    "        {\n",
    "            \"stage\": \"validated\",\n",
    "            \"best_cv_roc_auc\": f\"{search.best_score_:.4f}\",\n",
    "            \"holdout_roc_auc\": f\"{holdout_metrics['holdout_roc_auc']:.4f}\",\n",
    "        },\n",
    "    )\n",
    "    mlflow.finalize_logged_model(logged_model.model_id, status=\"READY\")\n",
    "\n",
    "parent_run_id = parent_run.info.run_id\n",
    "logged_model_id = logged_model.model_id\n",
    "holdout_metrics"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " gen evals           avg          best     div  unique  stag     mut   sel             events\n",
      "---- ----- ------------- ------------- ------- ------- ----- ------- ----- ------------------\n",
      "   0    12       0.98609       0.99130   0.742   1.000     0       -     - -                 \n",
      "   1    24       0.98509       0.99130   0.394   0.750     1   0.200     3 share             \n",
      "   2    24       0.98507       0.99130   0.394   0.667     2   0.216     3 dup=9,share       \n",
      "   3    24       0.98486       0.99130   0.242   0.583     3   0.193     3 dup=7,share       \n",
      "   4    24       0.98519       0.99130   0.364   0.750     4   0.319     3 div,imm=3,dup=7,sh\n",
      "INFO: DeltaThreshold callback met its criteria\n",
      "INFO: Stopping the algorithm\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'holdout_accuracy': 0.9298245614035088,\n",
       " 'holdout_balanced_accuracy': 0.9250876168224299,\n",
       " 'holdout_roc_auc': 0.9875876168224299}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 5
  },
  {
   "cell_type": "markdown",
   "id": "8fcb6e6f",
   "metadata": {},
   "source": [
    "## Inspect the Best Model and Metrics\n",
    "\n",
    "The fitted search object still behaves like a sklearn estimator. The MLflow run now contains the same summary information for experiment tracking and later comparison."
   ]
  },
  {
   "cell_type": "code",
   "id": "fc658a0f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:33:03.960295Z",
     "iopub.status.busy": "2026-06-20T05:33:03.959786Z",
     "iopub.status.idle": "2026-06-20T05:33:03.965680Z",
     "shell.execute_reply": "2026-06-20T05:33:03.964645Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:52:54.897702600Z",
     "start_time": "2026-06-20T18:52:54.884752100Z"
    }
   },
   "source": [
    "print(\"Parent run ID:\", parent_run_id)\n",
    "print(\"Logged model ID:\", logged_model_id)\n",
    "print(\"Best CV ROC AUC:\", round(search.best_score_, 4))\n",
    "print(\"Best parameters:\")\n",
    "pprint(search.best_params_)"
   ],
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parent run ID: edff6735fa1b4eab8b61205969cbf748\n",
      "Logged model ID: m-874be46c114a4293aed5527a9aabe7fd\n",
      "Best CV ROC AUC: 0.9915\n",
      "Best parameters:\n",
      "{'ccp_alpha': 0.0041418922671775625,\n",
      " 'max_depth': 4,\n",
      " 'max_features': 'log2',\n",
      " 'min_samples_leaf': 5,\n",
      " 'min_samples_split': 7,\n",
      " 'n_estimators': 97}\n"
     ]
    }
   ],
   "execution_count": 6
  },
  {
   "cell_type": "code",
   "id": "570272d4",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:33:03.967996Z",
     "iopub.status.busy": "2026-06-20T05:33:03.967611Z",
     "iopub.status.idle": "2026-06-20T05:33:03.979294Z",
     "shell.execute_reply": "2026-06-20T05:33:03.978457Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:52:54.936090200Z",
     "start_time": "2026-06-20T18:52:54.899703Z"
    }
   },
   "source": [
    "pd.DataFrame([holdout_metrics], index=[\"ga_random_forest\"])"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "                  holdout_accuracy  holdout_balanced_accuracy  holdout_roc_auc\n",
       "ga_random_forest          0.929825                   0.925088         0.987588"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>holdout_accuracy</th>\n",
       "      <th>holdout_balanced_accuracy</th>\n",
       "      <th>holdout_roc_auc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ga_random_forest</th>\n",
       "      <td>0.929825</td>\n",
       "      <td>0.925088</td>\n",
       "      <td>0.987588</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 7
  },
  {
   "cell_type": "code",
   "id": "d9b9abc0",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:33:03.982493Z",
     "iopub.status.busy": "2026-06-20T05:33:03.982108Z",
     "iopub.status.idle": "2026-06-20T05:33:03.988182Z",
     "shell.execute_reply": "2026-06-20T05:33:03.986975Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:52:54.945598900Z",
     "start_time": "2026-06-20T18:52:54.937086500Z"
    }
   },
   "source": [
    "search.fit_stats_"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'evaluated_candidates': 110,\n",
       " 'unique_candidates': 109,\n",
       " 'cross_validate_calls': 109,\n",
       " 'cache_hits': 1,\n",
       " 'duplicate_candidates': 0,\n",
       " 'skipped_invalid_candidates': 0,\n",
       " 'population_parallel_batches': 0,\n",
       " 'population_serial_batches': 6,\n",
       " 'random_immigrants': 3,\n",
       " 'local_refinement_candidates': 2}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 8
  },
  {
   "cell_type": "markdown",
   "id": "6ea58f06",
   "metadata": {},
   "source": [
    "## Search Runs and Logged Models\n",
    "\n",
    "MLflow can query both runs and logged models. The parent run contains the summary. The nested candidate runs contain individual hyperparameter evaluations emitted by `MLflowConfig`."
   ]
  },
  {
   "cell_type": "code",
   "id": "72da5d0d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:33:03.991028Z",
     "iopub.status.busy": "2026-06-20T05:33:03.990708Z",
     "iopub.status.idle": "2026-06-20T05:33:04.092487Z",
     "shell.execute_reply": "2026-06-20T05:33:04.091419Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:52:55.094568500Z",
     "start_time": "2026-06-20T18:52:54.946596700Z"
    }
   },
   "source": [
    "experiment = mlflow.get_experiment_by_name(EXPERIMENT_NAME)\n",
    "\n",
    "runs = mlflow.search_runs(\n",
    "    experiment_ids=[experiment.experiment_id],\n",
    "    order_by=[\"attributes.start_time DESC\"],\n",
    ")\n",
    "\n",
    "columns = [\n",
    "    \"run_id\",\n",
    "    \"tags.mlflow.runName\",\n",
    "    \"tags.run_level\",\n",
    "    \"metrics.score\",\n",
    "    \"metrics.best_cv_roc_auc\",\n",
    "    \"metrics.holdout_roc_auc\",\n",
    "]\n",
    "runs[[column for column in columns if column in runs.columns]].head(10)"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "                             run_id      tags.mlflow.runName tags.run_level  \\\n",
       "0  226f1435fb4d493d9ffdc275e8df8280  candidate-random-forest           None   \n",
       "1  7b02b3ced1364b18bff67c23239f67ab  candidate-random-forest           None   \n",
       "2  13f275bfaf8042cd8c9c1d4408846c65  candidate-random-forest           None   \n",
       "3  e63da585303f467cb86476eec8702bda  candidate-random-forest           None   \n",
       "4  475925e5ddfd4b9a9708d33c62e080b4  candidate-random-forest           None   \n",
       "5  b6ac482b478d40919d0ccc4ec6c88dce  candidate-random-forest           None   \n",
       "6  24a899b7318c4723a678ab87b9356bef  candidate-random-forest           None   \n",
       "7  acd8ba068251403aaf3ad0a5155d9ec1  candidate-random-forest           None   \n",
       "8  e86a7aaed7324dc1ad1da67b4cdb0d1b  candidate-random-forest           None   \n",
       "9  930af0563d3e4b0cbe5fd9a78789fa37  candidate-random-forest           None   \n",
       "\n",
       "   metrics.score  metrics.best_cv_roc_auc  metrics.holdout_roc_auc  \n",
       "0       0.984827                      NaN                      NaN  \n",
       "1       0.991458                      NaN                      NaN  \n",
       "2       0.984292                      NaN                      NaN  \n",
       "3       0.984099                      NaN                      NaN  \n",
       "4       0.984838                      NaN                      NaN  \n",
       "5       0.984645                      NaN                      NaN  \n",
       "6       0.986772                      NaN                      NaN  \n",
       "7       0.984651                      NaN                      NaN  \n",
       "8       0.985848                      NaN                      NaN  \n",
       "9       0.987692                      NaN                      NaN  "
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>run_id</th>\n",
       "      <th>tags.mlflow.runName</th>\n",
       "      <th>tags.run_level</th>\n",
       "      <th>metrics.score</th>\n",
       "      <th>metrics.best_cv_roc_auc</th>\n",
       "      <th>metrics.holdout_roc_auc</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>226f1435fb4d493d9ffdc275e8df8280</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.984827</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7b02b3ced1364b18bff67c23239f67ab</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.991458</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>13f275bfaf8042cd8c9c1d4408846c65</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.984292</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>e63da585303f467cb86476eec8702bda</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.984099</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>475925e5ddfd4b9a9708d33c62e080b4</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.984838</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>b6ac482b478d40919d0ccc4ec6c88dce</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.984645</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>24a899b7318c4723a678ab87b9356bef</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.986772</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>acd8ba068251403aaf3ad0a5155d9ec1</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.984651</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>e86a7aaed7324dc1ad1da67b4cdb0d1b</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.985848</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>930af0563d3e4b0cbe5fd9a78789fa37</td>\n",
       "      <td>candidate-random-forest</td>\n",
       "      <td>None</td>\n",
       "      <td>0.987692</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 9
  },
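  {
   "cell_type": "markdown",
   "id": "f3a91c07",
   "metadata": {},
   "source": [
    "The table above shows only candidate runs because they started most recently. As a minimal sketch, the next cell separates the two levels with `filter_string`. It reuses `experiment` and `parent_run_id` from earlier cells and assumes that `MLflowConfig` opens candidate runs as MLflow nested runs, so that each one carries the standard `mlflow.parentRunId` tag. If your version tags them differently, adjust the second filter."
   ]
  },
  {
   "cell_type": "code",
   "id": "a4d28e65",
   "metadata": {},
   "source": [
    "# Parent run: matched through the run_level tag set inside the parent run.\n",
    "parent_runs = mlflow.search_runs(\n",
    "    experiment_ids=[experiment.experiment_id],\n",
    "    filter_string=\"tags.run_level = 'parent'\",\n",
    "    order_by=[\"attributes.start_time DESC\"],\n",
    ")\n",
    "\n",
    "# Candidate runs of this search, best CV score first.\n",
    "# Assumption: candidates carry the mlflow.parentRunId tag of the parent run.\n",
    "child_runs = mlflow.search_runs(\n",
    "    experiment_ids=[experiment.experiment_id],\n",
    "    filter_string=f\"tags.mlflow.parentRunId = '{parent_run_id}'\",\n",
    "    order_by=[\"metrics.score DESC\"],\n",
    ")\n",
    "\n",
    "print(f\"Parent runs found: {len(parent_runs)}\")\n",
    "print(f\"Candidate runs under {parent_run_id}: {len(child_runs)}\")\n",
    "\n",
    "# Select only the columns that exist, so an empty result does not raise a KeyError.\n",
    "child_columns = [\"run_id\", \"metrics.score\"]\n",
    "child_runs[[column for column in child_columns if column in child_runs.columns]].head(5)"
   ],
   "outputs": [],
   "execution_count": null
  },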
  {
   "cell_type": "code",
   "id": "adfaffe8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-06-20T05:33:04.095790Z",
     "iopub.status.busy": "2026-06-20T05:33:04.095389Z",
     "iopub.status.idle": "2026-06-20T05:33:04.110798Z",
     "shell.execute_reply": "2026-06-20T05:33:04.109091Z"
    },
    "ExecuteTime": {
     "end_time": "2026-06-20T18:52:55.117313300Z",
     "start_time": "2026-06-20T18:52:55.095553600Z"
    }
   },
   "source": [
    "logged_models = mlflow.search_logged_models(\n",
    "    experiment_ids=[experiment.experiment_id],\n",
    "    order_by=[{\"field_name\": \"creation_time\", \"ascending\": False}],\n",
    "    output_format=\"list\",\n",
    ")\n",
    "\n",
    "[(model.model_id, model.name, model.status) for model in logged_models[:5]]"
   ],
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('m-874be46c114a4293aed5527a9aabe7fd',\n",
       "  'ga-random-forest-best-model',\n",
       "  <LoggedModelStatus.READY: 'READY'>),\n",
       " ('m-81188119d3614150a4e11cbc425d3ec7',\n",
       "  'ga-random-forest-best-model',\n",
       "  <LoggedModelStatus.PENDING: 'PENDING'>)]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "execution_count": 10
  },
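  {
   "cell_type": "markdown",
   "id": "c58e1b3d",
   "metadata": {},
   "source": [
    "A logged model can also be loaded back by its ID. The sketch below assumes the MLflow 3 `models:/<model_id>` URI form and reuses `logged_model_id`, `search`, and `X_test` from earlier cells. It is a quick round-trip check, not a deployment recipe."
   ]
  },
  {
   "cell_type": "code",
   "id": "e09f47b2",
   "metadata": {},
   "source": [
    "# Load the finalized model by its logged-model ID (assumed URI form: models:/<model_id>).\n",
    "loaded_model = mlflow.sklearn.load_model(f\"models:/{logged_model_id}\")\n",
    "\n",
    "# Compare the reloaded estimator with the in-memory search object on the holdout set.\n",
    "loaded_predictions = loaded_model.predict(X_test)\n",
    "print(\"Predictions match:\", (loaded_predictions == search.predict(X_test)).all())"
   ],
   "outputs": [],
   "execution_count": null
  },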
  {
   "cell_type": "markdown",
   "id": "1ccd825e",
   "metadata": {},
   "source": [
    "## Open the MLflow UI\n",
    "\n",
    "From the repository root, run the command below in a terminal and open the printed local URL. Because this notebook uses a local SQLite tracking backend, point the UI at the same database."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6103a4e",
   "metadata": {},
   "source": [
    "```bash\n",
    "mlflow ui --backend-store-uri sqlite:///mlflow3_tracking.db\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68ba7b49",
   "metadata": {},
   "source": [
    "## Practical Notes\n",
    "\n",
    "- Use a parent run for the overall search and nested runs for candidate-level details.\n",
    "- Log datasets with `mlflow.log_input` so future readers know which data context produced the model.\n",
    "- Keep `save_models=False` in `MLflowConfig` if candidate-level model artifacts are too heavy; log only the final `best_estimator_` from the parent run.\n",
    "- Use logged-model tags for lifecycle metadata such as `stage`, validation metrics, owner, and optimizer settings.\n",
    "- For remote tracking, replace `TRACKING_URI` with your MLflow tracking server URI and keep the rest of the notebook unchanged."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.14"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {
     "02e86cb188d94614bea274019cc9bbd2": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "221e0f05b61e4eb1876a0310d308a344": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "StyleView",
       "background": null,
       "description_width": "",
       "font_size": null,
       "text_color": null
      }
     },
     "57a726973944497baf35f7d3c4c7ed33": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "66346c3682994f3d962af1a507240072": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "FloatProgressModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "FloatProgressModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "ProgressView",
       "bar_style": "success",
       "description": "",
       "description_allow_html": false,
       "layout": "IPY_MODEL_b46084eeea934332aac903be3242b960",
       "max": 7.0,
       "min": 0.0,
       "orientation": "horizontal",
       "style": "IPY_MODEL_8cdd005dd3cd4f70aa8054bf63eb158a",
       "tabbable": null,
       "tooltip": null,
       "value": 7.0
      }
     },
     "6a37f62598ce4906b867eacfbfc01fcc": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_allow_html": false,
       "layout": "IPY_MODEL_02e86cb188d94614bea274019cc9bbd2",
       "placeholder": "​",
       "style": "IPY_MODEL_221e0f05b61e4eb1876a0310d308a344",
       "tabbable": null,
       "tooltip": null,
       "value": "Downloading artifacts: 100%"
      }
     },
     "6da268aa30d94809a4d7d790fead6f36": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HBoxModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HBoxModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "HBoxView",
       "box_style": "",
       "children": [
        "IPY_MODEL_6a37f62598ce4906b867eacfbfc01fcc",
        "IPY_MODEL_66346c3682994f3d962af1a507240072",
        "IPY_MODEL_749b2fd31f2343d58d949984beae4603"
       ],
       "layout": "IPY_MODEL_57a726973944497baf35f7d3c4c7ed33",
       "tabbable": null,
       "tooltip": null
      }
     },
     "749b2fd31f2343d58d949984beae4603": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLModel",
      "state": {
       "_dom_classes": [],
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/controls",
       "_view_module_version": "2.0.0",
       "_view_name": "HTMLView",
       "description": "",
       "description_allow_html": false,
       "layout": "IPY_MODEL_d7eec3d95b894385ba96404e5abac2c2",
       "placeholder": "​",
       "style": "IPY_MODEL_fab7b4c582e44f1ebd0b0a5ef1b71d1a",
       "tabbable": null,
       "tooltip": null,
       "value": " 7/7 [00:00&lt;00:00, 494.92it/s]"
      }
     },
     "8cdd005dd3cd4f70aa8054bf63eb158a": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "ProgressStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "ProgressStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "StyleView",
       "bar_color": null,
       "description_width": ""
      }
     },
     "b46084eeea934332aac903be3242b960": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "d7eec3d95b894385ba96404e5abac2c2": {
      "model_module": "@jupyter-widgets/base",
      "model_module_version": "2.0.0",
      "model_name": "LayoutModel",
      "state": {
       "_model_module": "@jupyter-widgets/base",
       "_model_module_version": "2.0.0",
       "_model_name": "LayoutModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "LayoutView",
       "align_content": null,
       "align_items": null,
       "align_self": null,
       "border_bottom": null,
       "border_left": null,
       "border_right": null,
       "border_top": null,
       "bottom": null,
       "display": null,
       "flex": null,
       "flex_flow": null,
       "grid_area": null,
       "grid_auto_columns": null,
       "grid_auto_flow": null,
       "grid_auto_rows": null,
       "grid_column": null,
       "grid_gap": null,
       "grid_row": null,
       "grid_template_areas": null,
       "grid_template_columns": null,
       "grid_template_rows": null,
       "height": null,
       "justify_content": null,
       "justify_items": null,
       "left": null,
       "margin": null,
       "max_height": null,
       "max_width": null,
       "min_height": null,
       "min_width": null,
       "object_fit": null,
       "object_position": null,
       "order": null,
       "overflow": null,
       "padding": null,
       "right": null,
       "top": null,
       "visibility": null,
       "width": null
      }
     },
     "fab7b4c582e44f1ebd0b0a5ef1b71d1a": {
      "model_module": "@jupyter-widgets/controls",
      "model_module_version": "2.0.0",
      "model_name": "HTMLStyleModel",
      "state": {
       "_model_module": "@jupyter-widgets/controls",
       "_model_module_version": "2.0.0",
       "_model_name": "HTMLStyleModel",
       "_view_count": null,
       "_view_module": "@jupyter-widgets/base",
       "_view_module_version": "2.0.0",
       "_view_name": "StyleView",
       "background": null,
       "description_width": "",
       "font_size": null,
       "text_color": null
      }
     }
    },
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}