Advanced configuration

The documentation on configuration describes how to satisfy most common requirements of standard Kedro project configuration:

By default, Kedro is set up to use the ConfigLoader class. Kedro also provides two additional configuration loaders with more advanced functionality: the TemplatedConfigLoader and the OmegaConfigLoader. Each of these classes are alternatives for the default ConfigLoader and have different features. The following sections describe each of these classes and their specific functionality in more detail.

TemplatedConfigLoader

Kedro provides an extension TemplatedConfigLoader class that allows you to template values in configuration files. To apply templating in your project, set the CONFIG_LOADER_CLASS constant in your src/<package_name>/settings.py:

from kedro.config import TemplatedConfigLoader  # new import

CONFIG_LOADER_CLASS = TemplatedConfigLoader

Provide template values through globals

When using the TemplatedConfigLoader you can provide values in the configuration template through a globals file or dictionary.

Let’s assume the project contains a conf/base/globals.yml file with the following contents:

bucket_name: "my_s3_bucket"
key_prefix: "my/key/prefix/"

datasets:
    csv: "pandas.CSVDataSet"
    spark: "spark.SparkDataSet"

folders:
    raw: "01_raw"
    int: "02_intermediate"
    pri: "03_primary"
    fea: "04_feature"

To point your TemplatedConfigLoader to the globals file, add it to the the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py:

CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}

Now the templating can be applied to the configuration. Here is an example of a templated conf/base/catalog.yml file:

raw_boat_data:
    type: "${datasets.spark}"  # nested paths into global dict are allowed
    filepath: "s3a://${bucket_name}/${key_prefix}/${folders.raw}/boats.csv"
    file_format: parquet

raw_car_data:
    type: "${datasets.csv}"
    filepath: "s3://${bucket_name}/data/${key_prefix}/${folders.raw}/${filename|cars.csv}"  # default to 'cars.csv' if the 'filename' key is not found in the global dict

Under the hood, TemplatedConfigLoader uses JMESPath syntax to extract elements from the globals dictionary.

Alternatively, you can declare which values to fill in the template through a dictionary. This dictionary could look like the following:

{
    "bucket_name": "another_bucket_name",
    "non_string_key": 10,
    "key_prefix": "my/key/prefix",
    "datasets": {"csv": "pandas.CSVDataSet", "spark": "spark.SparkDataSet"},
    "folders": {
        "raw": "01_raw",
        "int": "02_intermediate",
        "pri": "03_primary",
        "fea": "04_feature",
    },
}

To point your TemplatedConfigLoader to the globals dictionary, add it to the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py:

CONFIG_LOADER_ARGS = {
    "globals_dict": {
        "bucket_name": "another_bucket_name",
        "non_string_key": 10,
        "key_prefix": "my/key/prefix",
        "datasets": {"csv": "pandas.CSVDataSet", "spark": "spark.SparkDataSet"},
        "folders": {
            "raw": "01_raw",
            "int": "02_intermediate",
            "pri": "03_primary",
            "fea": "04_feature",
        },
    }
}

If you specify both globals_pattern and globals_dict in CONFIG_LOADER_ARGS, the contents of the dictionary resulting from globals_pattern are merged with the globals_dict dictionary. In case of conflicts, the keys from the globals_dict dictionary take precedence.

OmegaConfigLoader

OmegaConf is a Python library designed for configuration. It is a YAML-based hierarchical configuration system with support for merging configurations from multiple sources.

From Kedro 0.18.5 you can use the OmegaConfigLoader which uses OmegaConf under the hood to load data.

Note

OmegaConfigLoader is under active development. It was first available from Kedro 0.18.5 with additional features due in later releases. Let us know if you have any feedback about the OmegaConfigLoader.

OmegaConfigLoader can load YAML and JSON files. Acceptable file extensions are .yml, .yaml, and .json. By default, any configuration files used by the config loaders in Kedro are .yml files.

To use OmegaConfigLoader in your project, set the CONFIG_LOADER_CLASS constant in your src/<package_name>/settings.py:

from kedro.config import OmegaConfigLoader  # new import

CONFIG_LOADER_CLASS = OmegaConfigLoader

Advanced Kedro configuration

This section contains a set of guidance for advanced configuration requirements of standard Kedro projects:

How to change which configuration files are loaded

If you want to change the patterns that the configuration loader uses to find the files to load you need to set the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py. For example, if your parameters files are using a params naming convention instead of parameters (e.g. params.yml) you need to update CONFIG_LOADER_ARGS as follows:

CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "parameters": ["params*", "params*/**", "**/params*"],
    }
}

By changing this setting, the default behaviour for loading parameters will be replaced, while the other configuration patterns will remain in their default state.

How to ensure non default configuration files get loaded

You can add configuration patterns to match files other than parameters, credentials, logging, and catalog by setting the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py. For example, if you want to load Spark configuration files you need to update CONFIG_LOADER_ARGS as follows:

CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "spark": ["spark*/"],
    }
}

How to bypass the configuration loading rules

You can bypass the configuration patterns and set configuration directly on the instance of a config loader class. You can bypass the default configuration (catalog, parameters, credentials, and logging) as well as additional configuration.

10from kedro.config import ConfigLoader
11from kedro.framework.project import settings
12
13conf_path = str(project_path / settings.CONF_SOURCE)
14conf_loader = ConfigLoader(conf_source=conf_path)
15
16# Bypass configuration patterns by setting the key and values directly on the config loader instance.
17conf_loader["catalog"] = {"catalog_config": "something_new"}

How to use Jinja2 syntax in configuration

From version 0.17.0, TemplatedConfigLoader also supports the Jinja2 template engine alongside the original template syntax. Below is an example of a catalog.yml file that uses both features:

{% for speed in ['fast', 'slow'] %}
{{ speed }}-trains:
    type: MemoryDataSet

{{ speed }}-cars:
    type: pandas.CSVDataSet
    filepath: s3://${bucket_name}/{{ speed }}-cars.csv
    save_args:
        index: true

{% endfor %}

When parsing this configuration file, TemplatedConfigLoader will:

  1. Read the catalog.yml and compile it using Jinja2

  2. Use a YAML parser to parse the compiled config into a Python dictionary

  3. Expand ${bucket_name} in filepath using the globals_pattern and globals_dict arguments for the TemplatedConfigLoader instance, as in the previous examples

The output Python dictionary will look as follows:

{
    "fast-trains": {"type": "MemoryDataSet"},
    "fast-cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "s3://my_s3_bucket/fast-cars.csv",
        "save_args": {"index": True},
    },
    "slow-trains": {"type": "MemoryDataSet"},
    "slow-cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "s3://my_s3_bucket/slow-cars.csv",
        "save_args": {"index": True},
    },
}

Warning

Although Jinja2 is a very powerful and extremely flexible template engine, which comes with a wide range of features, we do not recommend using it to template your configuration unless absolutely necessary. The flexibility of dynamic configuration comes at a cost of significantly reduced readability and much higher maintenance overhead. We believe that, for the majority of analytics projects, dynamically compiled configuration does more harm than good.

How to do templating with the OmegaConfigLoader

Parameters

Templating or variable interpolation, as it’s called in OmegaConf, for parameters works out of the box if the template values are within the parameter files or the name of the file that contains the template values follows the same config pattern specified for parameters. By default, the config pattern for parameters is: ["parameters*", "parameters*/**", "**/parameters*"]. Suppose you have one parameters file called parameters.yml containing parameters with omegaconf placeholders like this:

model_options:
  test_size: ${data.size}
  random_state: 3

and a file containing the template values called parameters_globals.yml:

data:
  size: 0.2

Since both of the file names (parameters.yml and parameters_globals.yml) match the config pattern for parameters, the OmegaConfigLoader will load the files and resolve the placeholders correctly.

Catalog

From Kedro 0.18.10 templating also works for catalog files. To enable templating in the catalog you need to ensure that the template values are within the catalog files or the name of the file that contains the template values follows the same config pattern specified for catalogs. By default, the config pattern for catalogs is: ["catalog*", "catalog*/**", "**/catalog*"].

Additionally, any template values in the catalog need to start with an underscore _. This is because of how catalog entries are validated. Templated values will neither trigger a key duplication error nor appear in the resulting configuration dictionary.

Suppose you have one catalog file called catalog.yml containing entries with omegaconf placeholders like this:

companies:
  type: ${_pandas.type}
  filepath: data/01_raw/companies.csv

and a file containing the template values called catalog_globals.yml:

_pandas:
  type: pandas.CSVDataSet

Since both of the file names (catalog.yml and catalog_globals.yml) match the config pattern for catalogs, the OmegaConfigLoader will load the files and resolve the placeholders correctly.

Other configuration files

It’s also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it’s not mandatory like it is for catalog files.

How to use custom resolvers in the OmegaConfigLoader

Omegaconf provides functionality to register custom resolvers for templated values. You can use these custom resolves within Kedro by extending the OmegaConfigLoader class. The example below illustrates this:

from kedro.config import OmegaConfigLoader
from omegaconf import OmegaConf
from typing import Any, Dict


class CustomOmegaConfigLoader(OmegaConfigLoader):
    def __init__(
        self,
        conf_source: str,
        env: str = None,
        runtime_params: Dict[str, Any] = None,
    ):
        super().__init__(
            conf_source=conf_source, env=env, runtime_params=runtime_params
        )

        # Register a customer resolver that adds up numbers.
        self.register_custom_resolver("add", lambda *numbers: sum(numbers))

    @staticmethod
    def register_custom_resolver(name, function):
        """
        Helper method that checks if the resolver has already been registered and registers the
        resolver if it's new. The check is needed, because omegaconf will throw an error
        if a resolver with the same name is registered twice.
        Alternatively, you can call `register_new_resolver()` with  `replace=True`.
        """
        if not OmegaConf.has_resolver(name):
            OmegaConf.register_new_resolver(name, function)

In order to use this custom configuration loader, you will need to set it as the project configuration loader in src/<package_name>/settings.py:

from package_name.custom_configloader import CustomOmegaConfigLoader

CONFIG_LOADER_CLASS = CustomOmegaConfigLoader

You can then use the custom “add” resolver in your parameters.yml as follows:

model_options:
  test_size: ${add:1,2,3}
  random_state: 3

How to load credentials through environment variables

The OmegaConfigLoader enables you to load credentials from environment variables. To achieve this you have to use the OmegaConfigLoader and the omegaconf oc.env resolver. To use the OmegaConfigLoader in your project, set the CONFIG_LOADER_CLASS constant in your src/<package_name>/settings.py:

from kedro.config import OmegaConfigLoader  # new import

CONFIG_LOADER_CLASS = OmegaConfigLoader

Now you can use the oc.env resolver to access credentials from environment variables in your credentials.yml, as demonstrated in the following example:

dev_s3:
  client_kwargs:
    aws_access_key_id: ${oc.env:AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${oc.env:AWS_SECRET_ACCESS_KEY}

Note

Note that you can only use the resolver in credentials.yml and not in catalog or parameter files. This is because we do not encourage the usage of environment variables for anything other than credentials.