Integrating with MLflow ======================= In this post, we are going to explain how setup the build-in integration of sklearn-genetic-opt with MLflow. To use this feature, we must set the parameters that will include the tracking server, experiment name, run name, tags and others, the full implementation is here: :class:`~sklearn_genetic.mlflow_log.MLflowConfig` Configuration ------------- The configuration is pretty straightforward, we just need to import the main class and define some parameters, here there is its meaning: * **tracking_uri:** Address of local or remote-tracking server. * **experiment:** Case sensitive name of an experiment to be activated. * **run_name:** Name of new run (stored as a mlflow.runName tag). * **save_models:** If ``True``, it will log the estimator into mlflow artifacts. * **registry_uri:** Address of local or remote model registry server. * **tags:** Dictionary of tags to apply. Example -------- In this example, we are going to log the information into a mlflow server that is running in our localhost, port 5000, we want to save each of the trained models. .. code:: python3 from sklearn_genetic.mlflow_log import MLflowConfig mlflow_config = MLflowConfig( tracking_uri="http://localhost:5000", experiment="Digits-sklearn-genetic-opt", run_name="Decision Tree", save_models=True, tags={"team": "sklearn-genetic-opt", "version": "0.5.0"}) Now, this config is passed to the :class:`~sklearn_genetic.GASearchCV` class in the parameter named `log_config`, for example: .. code:: python3 from sklearn_genetic import GASearchCV from sklearn_genetic.space import Categorical, Integer, Continuous from sklearn.model_selection import train_test_split, StratifiedKFold from sklearn.tree import DecisionTreeClassifier from sklearn.datasets import load_digits from sklearn.metrics import accuracy_score from sklearn_genetic.mlflow import MLflowConfig data = load_digits() label_names = data["target_names"] y = data["target"] X = data["data"] X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.33, random_state=42) clf = DecisionTreeClassifier() params_grid = { "min_weight_fraction_leaf": Continuous(0, 0.5), "criterion": Categorical(["gini", "entropy"]), "max_depth": Integer(2, 20), "max_leaf_nodes": Integer(2, 30)} cv = StratifiedKFold(n_splits=3, shuffle=True) evolved_estimator = GASearchCV( clf, cv=cv, scoring="accuracy", population_size=3, generations=5, tournament_size=3, elitism=True, crossover_probability=0.9, mutation_probability=0.05, param_grid=params_grid, algorithm="eaMuPlusLambda", n_jobs=-1, verbose=True, log_config=mlflow_config) evolved_estimator.fit(X_train, y_train) y_predict_ga = evolved_estimator.predict(X_test) accuracy = accuracy_score(y_test, y_predict_ga) print(evolved_estimator.best_params_) Notice that we choose small generations and population_size, just to be able to see the results without much verbosity. If you go to your mlflow UI and click the experiment named "Digits-sklearn-genetic-opt" we should see something like this (I've hidden some columns to give a better look): .. image:: ../images/mlflow_experiment_0.png There we can see the user that ran the experiment, the name of the file which contained the source code, our tags and other metadata. Notice that there is a "plus" symbol that will show us each of our iterations, this is because sklearn-genetic-opt will log each `GASearchCV.fit()` call in a nested way, think it like a parent run, and each child is one of the hyperparameters that were tested, for example, if we run the same code again, now we see two parents run: .. image:: ../images/mlflow_nested_run_1.png Now click on any of the "plus" symbols to see all the children, now they look like this (again edited the columns to display): .. image:: ../images/mlflow_children_2.png From there we can see the hyperparameters and the score (cross-validation) that we got in each run, from there we can use the regular mlflow functionalities like comparing runs, download the CSV, register a model, etc. You can see more on https://mlflow.org/docs/latest/index.html Now, as we set ``save_model=True``, you can see that the column "Model" has a file attached as an artifact, if we click on one of those, we see a resume of that particular execution and some utils to use right away the model: .. image:: ../images/mlflow_artifacts_4.png