Experiment tracking

Warning

Experiment tracking in Kedro is launched as beta functionality. We encourage everyone to try it out and give us feedback so that we can settle on the final implementation of the feature.

Experiment tracking is a way to record all information that you would need to recreate and analyse a data science experiment. We think of it as logging for parameters, metrics, models and other dataset types. Kedro currently supports parts of this functionality. For example, it’s possible to log parameters as part of your codebase and snapshot models and other artefacts like plots with Kedro’s versioning capabilities for datasets. However, Kedro was missing a way to log metrics and capture all this logged data as a timestamped run of an experiment. It was also missing a way for users to visualise, discover and compare this logged data.

Experiment tracking in Kedro adds in the missing pieces and will be developed incrementally.

The following section outlines the setup within your Kedro project to enable experiment tracking. You can also refer to the Kedro-Viz documentation about experiment tracking for a step-by-step process to access your tracking datasets on Kedro-Viz.

Enable experiment tracking

Set up the session store

In the domain of experiment tracking, each pipeline run is considered a session. A session store records all related metadata for each pipeline run, from logged metrics to other run-related data such as timestamp, git username and branch. The session store is a SQLite database that is generated during your first pipeline run after it has been set up in your project.

To set up the session store, go to the src/&lt;project_name&gt;/settings.py file and add the following:

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore
from pathlib import Path

SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}

This specifies that the session store, a SQLite database, will be created in the data subfolder of your project, using the SQLiteStore class from your installed Kedro-Viz plugin.

Please ensure that your installed version of Kedro-Viz is at least 4.1.1. This step is crucial to enable experiment tracking features on Kedro-Viz, because the session store is the database used to serve all run data to the Kedro-Viz front-end.
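To confirm this requirement is met, you can query the installed version programmatically. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is our own, not part of Kedro.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Experiment tracking requires kedro-viz >= 4.1.1
    print(installed_version("kedro-viz") or "kedro-viz is not installed")
```

Alternatively, `pip show kedro-viz` reports the same information from the command line.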

Set up tracking datasets

Use either tracking.MetricsDataSet or tracking.JSONDataSet in your data catalog. These datasets are versioned by default to ensure a historical record is kept of the logged data. The tracking.MetricsDataSet should be used for tracking numerical metrics, and the tracking.JSONDataSet can be used for tracking any other JSON-compatible data. In Kedro-Viz these datasets will be visualised in the metadata side panel.

Below is an example of how to add experiment tracking to your pipeline. Add a tracking.MetricsDataSet and/or tracking.JSONDataSet to your catalog.yml:

metrics:
  type: tracking.MetricsDataSet
  filepath: data/09_tracking/metrics.json
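For non-numeric, JSON-compatible data, a tracking.JSONDataSet entry follows the same pattern. The dataset name and filepath below are illustrative:

```yaml
run_details:
  type: tracking.JSONDataSet
  filepath: data/09_tracking/run_details.json
```

Because these datasets are versioned by default, each pipeline run saves its file under a timestamped directory, for example data/09_tracking/metrics.json/&lt;timestamp&gt;/metrics.json.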

Set up your nodes and pipelines to log metrics

Add a node that returns the data to be tracked. The report_accuracy node below returns metrics.

# nodes.py
from sklearn.metrics import accuracy_score


def report_accuracy():
    """Node for reporting the accuracy of the predictions."""
    test_y = [0, 2, 1, 3]
    predictions = [0, 1, 2, 3]

    accuracy = accuracy_score(test_y, predictions)
    # Return the accuracy of the model
    return {"accuracy": accuracy}

Add the node to your pipeline and ensure that the output name matches the name of the dataset added to your catalog.

# pipeline.py
from kedro.pipeline import Pipeline, node, pipeline
from .nodes import report_accuracy


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                report_accuracy,
                [],
                "metrics",
                name="report",
            ),
        ]
    )

Community solutions

You can find more solutions for experiment tracking developed by the Kedro community on the plugins page.