Advanced configuration¶

The documentation on configuration describes how to satisfy the most common requirements of standard Kedro project configuration.

By default, Kedro is set up to use the kedro.config.OmegaConfigLoader class.

Advanced configuration for Kedro projects¶

This page also contains guidance for the advanced configuration requirements of standard Kedro projects:

How to use a custom config loader
How to change which configuration files are loaded
How to ensure non default configuration files get loaded
How to bypass the configuration loading rules
How to load a data catalog with templating in code?
How to use global variables with the OmegaConfigLoader
How to override configuration with runtime parameters with the OmegaConfigLoader
How to use resolvers in the OmegaConfigLoader
How to load credentials through environment variables with OmegaConfigLoader
How to change the merge strategy used by OmegaConfigLoader

How to use a custom configuration loader¶

You can build a custom configuration loader by extending the kedro.config.AbstractConfigLoader class:

from typing import Any, Dict

from kedro.config import AbstractConfigLoader


class CustomConfigLoader(AbstractConfigLoader):
    def __init__(
        self,
        conf_source: str,
        env: str = None,
        runtime_params: Dict[str, Any] = None,
    ):
        super().__init__(
            conf_source=conf_source, env=env, runtime_params=runtime_params
        )

        # Custom implementation

To use this custom configuration loader, set it as the project configuration loader in src/<package_name>/settings.py as follows:

from package_name.custom_configloader import CustomConfigLoader

CONFIG_LOADER_CLASS = CustomConfigLoader

Custom configuration loaders that do not subclass OmegaConfigLoader will not include OmegaConf-specific functionalities such as interpolation, globals, runtime parameters or custom resolvers. To access these features, your loader must subclass OmegaConfigLoader.

How to change which configuration files are loaded¶

If you want to change the patterns that the configuration loader uses to find the files to load, set the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py. For example, if your parameters files use a params naming convention instead of parameters (for example, params.yml), update CONFIG_LOADER_ARGS as follows:

CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "parameters": ["params*", "params*/**", "**/params*"],
    }
}

By changing this setting, the default behaviour for loading parameters will be replaced, while the other configuration patterns will remain in their default state.
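To get a feel for which files a set of patterns like the one above would pick up, the following sketch applies them with Python's standard glob module to a throwaway directory. It is only an illustration: Kedro performs its own matching internally and may apply additional rules, so edge cases can differ. The helpers match_config_files and build_example_conf, and all file names, are hypothetical.

```python
import glob
import os
import tempfile

PARAMS_PATTERNS = ["params*", "params*/**", "**/params*"]

# Hypothetical example files; only some of them match the patterns above.
EXAMPLE_FILES = [
    "params.yml",
    "params_model/train.yml",
    "nested/params_extra.yml",
    "parameters.yml",
    "catalog.yml",
]


def build_example_conf(conf_dir):
    """Create the empty example files under conf_dir."""
    for relative in EXAMPLE_FILES:
        path = os.path.join(conf_dir, relative)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write("")


def match_config_files(conf_dir, patterns):
    """Return the files under conf_dir matched by any of the glob patterns."""
    matched = set()
    for pattern in patterns:
        full_pattern = os.path.join(conf_dir, pattern)
        for path in glob.glob(full_pattern, recursive=True):
            if os.path.isfile(path):  # skip directories matched by the glob
                relative = os.path.relpath(path, conf_dir)
                matched.add(relative.replace(os.sep, "/"))
    return sorted(matched)


with tempfile.TemporaryDirectory() as conf_dir:
    build_example_conf(conf_dir)
    print(match_config_files(conf_dir, PARAMS_PATTERNS))
```

In this sketch, parameters.yml is no longer matched, because its name does not start with params.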

How to ensure non default configuration files get loaded¶

You can add configuration patterns to match files other than parameters, credentials, and catalog by setting the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py. For example, if you want to load Spark configuration files you need to update CONFIG_LOADER_ARGS as follows:

CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "spark": ["spark*/"],
    }
}

How to bypass the configuration loading rules¶

You can bypass the configuration patterns and set configuration directly on the instance of a config loader class. You can bypass the default configuration (catalog, parameters, and credentials) as well as additional configuration.

For example, you can use hooks to load external credentials.

If you are using a config loader as a standalone component, you can override configuration as follows:

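from pathlib import Path

from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings

# Assumption: this code runs from the root directory of your Kedro project.
project_path = Path.cwd()

conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = OmegaConfigLoader(conf_source=conf_path)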

# Bypass configuration patterns by setting the key and values directly on the config loader instance.
conf_loader["catalog"] = {"catalog_config": "something_new"}

How to do templating with the `OmegaConfigLoader`¶

Parameters¶

Templating, or variable interpolation as OmegaConf calls it, works out of the box for parameters. The template values must be within the parameter files, or the name of the file that contains them must follow the same config pattern specified for parameters. By default, the config pattern for parameters is: ["parameters*", "parameters*/**", "**/parameters*"]. Suppose you have one parameters file called parameters.yml containing parameters with omegaconf placeholders like this:

model_options:
  test_size: ${data.size}
  random_state: 3

and a file containing the template values called parameters_variables.yml. The file name can be anything as long as it matches the config pattern for parameters:

data:
  size: 0.2

Since both of the file names (parameters.yml and parameters_variables.yml) match the config pattern for parameters, the OmegaConfigLoader will load the files and resolve the placeholders as expected.
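The mechanics can be illustrated with a few lines of plain Python. The sketch below is not how OmegaConf is implemented; it only mimics the observable behaviour for this example. The contents of both files are combined into one dictionary, and each ${dotted.key} placeholder is replaced by the value found at that path. The names lookup and resolve_placeholders are hypothetical.

```python
import re

# Matches a string that consists of exactly one ${dotted.key} placeholder.
PLACEHOLDER = re.compile(r"^\$\{([\w.]+)\}$")


def lookup(config, dotted_key):
    """Follow a dotted key such as 'data.size' through nested dictionaries."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


def resolve_placeholders(node, root):
    """Return a copy of node with every placeholder replaced by its value."""
    if isinstance(node, dict):
        return {key: resolve_placeholders(value, root) for key, value in node.items()}
    if isinstance(node, str):
        match = PLACEHOLDER.match(node)
        if match:
            return resolve_placeholders(lookup(root, match.group(1)), root)
    return node


# Contents of parameters.yml and parameters_variables.yml as dictionaries.
parameters = {"model_options": {"test_size": "${data.size}", "random_state": 3}}
parameters_variables = {"data": {"size": 0.2}}

merged = {**parameters, **parameters_variables}
resolved = resolve_placeholders(merged, merged)
print(resolved["model_options"])
```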

Catalog¶

From Kedro 0.18.10, templating also works for catalog files. To enable it, ensure that the template values are within the catalog files, or that the name of the file that contains them follows the same config pattern specified for catalogs. By default, the config pattern for catalogs is: ["catalog*", "catalog*/**", "**/catalog*"].

Any template values in the catalog need to start with an underscore _. This is because of how catalog entries are validated. Templated values will neither trigger a key duplication error nor appear in the resulting configuration dictionary.

Suppose you have one catalog file called catalog.yml containing entries with omegaconf placeholders like this:

companies:
  type: ${_pandas.type}
  filepath: data/01_raw/companies.csv

and a file containing the template values called catalog_variables.yml:

_pandas:
  type: pandas.CSVDataset

Since both of the file names (catalog.yml and catalog_variables.yml) match the config pattern for catalogs, the OmegaConfigLoader will load the files and resolve the placeholders as expected.
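The effect of the underscore convention can be sketched in plain Python. This is an illustration of the behaviour described above, not Kedro's actual code: once the placeholders are resolved, top-level keys that start with an underscore are left out of the resulting dictionary. The function name drop_template_keys is hypothetical.

```python
def drop_template_keys(resolved_catalog):
    """Keep only the entries whose top-level key does not start with '_'."""
    return {
        name: entry
        for name, entry in resolved_catalog.items()
        if not name.startswith("_")
    }


# The combined contents of catalog.yml and catalog_variables.yml after resolution.
resolved_catalog = {
    "companies": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/companies.csv",
    },
    "_pandas": {"type": "pandas.CSVDataset"},
}

catalog = drop_template_keys(resolved_catalog)
print(sorted(catalog))
```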

Other configuration files¶

It's also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom Spark or MLflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it's not mandatory like it is for catalog files.

How to load a data catalog with templating in code?¶

You can use the OmegaConfigLoader to directly load a data catalog that contains templating in code. Internally the OmegaConfigLoader resolves any templates, so no further steps are required to load catalog entries properly.

# Example catalog with templating
companies:
  type: ${_dataset_type}
  filepath: data/01_raw/companies.csv

_dataset_type: pandas.CSVDataset

from pathlib import Path

from kedro.config import OmegaConfigLoader
from kedro.framework.project import settings

# Assumption: this code runs from the root directory of your Kedro project.
project_path = Path.cwd()

# Instantiate an `OmegaConfigLoader` instance with the location of your project configuration.
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = OmegaConfigLoader(conf_source=conf_path)

conf_catalog = conf_loader["catalog"]
# conf_catalog["companies"]
# Will result in: {'type': 'pandas.CSVDataset', 'filepath': 'data/01_raw/companies.csv'}

How to use global variables with the `OmegaConfigLoader`¶

From Kedro 0.18.13, you can use variable interpolation in your configurations using "globals" with OmegaConfigLoader. The benefit of using globals over regular variable interpolation is that the global variables are shared across different configuration types, such as catalog and parameters. By default, these global variables are assumed to be in files called globals.yml in any of your environments. If you want to configure the naming patterns for the files that contain your global variables, you can do so by overwriting the globals key in config_patterns. You can also bypass the configuration loading to directly set the global variables in OmegaConfigLoader.

Suppose you have global variables located in the file conf/base/globals.yml:

my_global_value: 45
dataset_type:
  csv: pandas.CSVDataset

You can access these global variables in your catalog or parameters config files with a globals resolver like this:

conf/base/parameters.yml:

my_param: "${globals:my_global_value}"

conf/base/catalog.yml:

companies:
  filepath: data/01_raw/companies.csv
  type: "${globals:dataset_type.csv}"

You can also provide a default value to be used in case the global variable does not exist:

my_param: "${globals: nonexistent_global, 23}"

If there are duplicate keys in the globals files in your base and runtime environments, the values in the runtime environment overwrite the values in your base environment.
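The lookup rules described in this section can be sketched in plain Python. This is an illustration under stated assumptions, not Kedro's implementation: the runtime environment's globals are laid over the base globals key by key, a dotted key is followed through nested dictionaries, and an optional default is returned when the key is missing. The function get_global and the value in runtime_globals are hypothetical, and raising KeyError for a missing key without a default is a choice made for this sketch only.

```python
_MISSING = object()


def get_global(global_values, dotted_key, default=_MISSING):
    """Look up a dotted key, returning default if the key does not exist."""
    value = global_values
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            if default is _MISSING:
                raise KeyError(dotted_key)
            return default
        value = value[part]
    return value


# conf/base/globals.yml from the example above.
base_globals = {"my_global_value": 45, "dataset_type": {"csv": "pandas.CSVDataset"}}
# Hypothetical globals file in the runtime environment with a duplicate key.
runtime_globals = {"my_global_value": 50}

# Duplicate keys: the runtime environment wins.
global_values = {**base_globals, **runtime_globals}

print(get_global(global_values, "my_global_value"))
print(get_global(global_values, "dataset_type.csv"))
print(get_global(global_values, "nonexistent_global", 23))
```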

How to override configuration with runtime parameters with the `OmegaConfigLoader`¶

Kedro allows you to specify runtime parameters for the kedro run command with the --params CLI option. These runtime parameters are added to the KedroContext and merged with parameters from the configuration files to be used in your project's pipelines and nodes. From Kedro 0.18.14, you can use the runtime_params resolver to show that you want to override values of certain keys in your configuration with runtime parameters provided through the CLI option. This resolver can be used across different configuration types, such as parameters, catalog, and more, except for "globals".

Consider this parameters.yml file:

model_options:
  random_state: "${runtime_params:random}"

This will allow you to pass a runtime parameter named random through the CLI to specify the value of model_options.random_state in your project's parameters:

kedro run --params random=3

You can also specify a default value to be used in case the runtime parameter is not specified with the kedro run command. Consider this catalog entry:

companies:
  type: pandas.CSVDataset
  filepath: "${runtime_params:folder, 'data/01_raw'}/companies.csv"

If the folder parameter is not passed through the CLI --params option with kedro run, the default value 'data/01_raw' is used, so the filepath resolves to data/01_raw/companies.csv.

Note

When manually instantiating OmegaConfigLoader in code, runtime parameters passed through the CLI --params option will not be available to the resolver. This occurs because the manually created config loader instance doesn't have access to the runtime parameters provided through the CLI. If you need to access runtime parameters in code that manually instantiates OmegaConfigLoader, you should instead use the Kedro context to access parameters.

How to use `globals` and `runtime_params`¶

As mentioned above, runtime_params are not designed to override globals configuration. This avoids implicit overrides and simplifies parameter resolution. Thus, globals has a single entry point: the YAML file.

You can still use globals and runtime_params by specifying globals as a default value to be used in case the runtime parameter is not passed.

Consider this parameters.yml:

model_options:
  random_state: "${runtime_params:random, ${globals:my_global_value}}"

and this globals.yml file:

my_global_value: 4

This will allow you to pass a runtime parameter named random through the CLI to specify the value of model_options.random_state in your project's parameters:

kedro run --params random=3

If the random parameter is not passed through the CLI --params option with kedro run, then my_global_value from globals.yml is used for the model_options.random_state.
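The resulting order of precedence can be written out as a small Python function. This is only a sketch of the behaviour described above, not of how the resolvers are implemented; the function name resolve_random_state is hypothetical.

```python
def resolve_random_state(runtime_params, global_values):
    """Prefer the runtime parameter `random`; otherwise fall back to the global."""
    if "random" in runtime_params:
        return runtime_params["random"]
    return global_values["my_global_value"]


global_values = {"my_global_value": 4}  # contents of globals.yml

# Equivalent of `kedro run --params random=3`
print(resolve_random_state({"random": 3}, global_values))
# Equivalent of `kedro run` without --params
print(resolve_random_state({}, global_values))
```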

How to use resolvers in the `OmegaConfigLoader`¶

Instead of hard-coding values in your configuration files, you can also dynamically compute them using OmegaConf's resolvers functionality. You use resolvers to define custom logic to calculate values of parameters or catalog entries, or inject these values from elsewhere. To use this feature with Kedro, pass a dict of custom resolvers to OmegaConfigLoader through CONFIG_LOADER_ARGS in your project's src/<package_name>/settings.py. The example below illustrates this:

import polars as pl
from datetime import date


def date_today():
    return date.today()


CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "add": lambda *my_list: sum(my_list),
        "polars": lambda x: getattr(pl, x),
        "today": date_today,
    }
}

These custom resolvers are then registered using OmegaConf.register_new_resolver() internally and can be used in any of the configuration files in your project. For example, you can use the add or the today resolver defined above in your parameters.yml like this:

model_options:
  test_size: "${add:1,2,3}"
  random_state: 3

date: "${today:}"

The values of these parameters will be computed at access time and will be passed on to your nodes. Resolvers can also be used in your catalog.yml. In the example below, we use the polars resolver defined above to pass non-primitive types to the catalog entry.

my_polars_dataset:
  type: polars.CSVDataset
  filepath: data/01_raw/my_dataset.csv
  load_args:
    dtypes:
      product_age: "${polars:Float64}"
      group_identifier: "${polars:Utf8}"
    try_parse_dates: true

OmegaConf also comes with some built-in resolvers that you can use with the OmegaConfigLoader in Kedro. All built-in resolvers except for oc.env are enabled by default. oc.env is enabled solely for loading credentials. You can also turn this on for all configurations through your project's src/<package_name>/settings.py in a similar way:

Note

This is an advanced feature and should be used with caution. We do not recommend using environment variables for configurations other than credentials.

from omegaconf.resolvers import oc

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "oc.env": oc.env,
    }
}

How to load credentials through environment variables¶

The kedro.config.OmegaConfigLoader enables you to load credentials from environment variables through the OmegaConf oc.env resolver. You can use the oc.env resolver to access credentials from environment variables in your credentials.yml:

dev_s3:
  client_kwargs:
    aws_access_key_id: ${oc.env:AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${oc.env:AWS_SECRET_ACCESS_KEY}

Note

You can use this resolver solely in credentials.yml, not in catalog or parameter files. This restriction discourages using environment variables for anything other than credentials.
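In plain Python terms, the credentials.yml above amounts to reading the two variables from the process environment when the configuration is resolved. The sketch below shows that idea using only the standard library; it is not how the resolver is implemented, and the function load_s3_credentials and the placeholder values are hypothetical.

```python
import os


def load_s3_credentials(environ=os.environ):
    """Build the dev_s3 credentials entry from environment variables.

    Raises KeyError if one of the variables is not set.
    """
    return {
        "dev_s3": {
            "client_kwargs": {
                "aws_access_key_id": environ["AWS_ACCESS_KEY_ID"],
                "aws_secret_access_key": environ["AWS_SECRET_ACCESS_KEY"],
            }
        }
    }


# A stand-in for the real environment, with placeholder values.
fake_environ = {
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
}
credentials = load_s3_credentials(fake_environ)
print(sorted(credentials["dev_s3"]["client_kwargs"]))
```

Passing the environment as an argument keeps the sketch easy to test without touching real secrets.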

How to change the merge strategy used by `OmegaConfigLoader`¶

By default, OmegaConfigLoader merges configuration in different environments as well as runtime parameters in a destructive way. This means that whatever configuration resides in your overriding environment (local by default) takes precedence when the same top-level key is present in the base and overriding environment. Any configuration for that key besides that given in the overriding environment is discarded. The same behaviour applies to runtime parameters overriding any configuration in the base environment. You can change the merge strategy for each configuration type in your project's src/<package_name>/settings.py. The accepted merging strategies are soft and destructive.

from kedro.config import OmegaConfigLoader

CONFIG_LOADER_CLASS = OmegaConfigLoader

CONFIG_LOADER_ARGS = {
    "merge_strategy": {
        "parameters": "soft",
        "spark": "destructive",
        "mlflow": "soft",
    }
}

If no merge strategy is defined, the default destructive strategy will be applied. Note: this merge strategy setting applies when configuration files are located in different environments. When files are part of the same environment, they are always merged in a soft way. An error is thrown when files in the same environment contain the same top-level keys.
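The difference between the two strategies is easiest to see on a small example. The sketch below uses plain dictionaries and two hypothetical helper functions, destructive_merge and soft_merge; it illustrates the behaviour described above and is not Kedro's implementation.

```python
from copy import deepcopy


def destructive_merge(base, override):
    """Replace each top-level key of base that also appears in override."""
    merged = deepcopy(base)
    merged.update(deepcopy(override))
    return merged


def soft_merge(base, override):
    """Merge nested dictionaries key by key, keeping base values not overridden."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = soft_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parameters in the base and the overriding environment.
base_params = {"model_options": {"test_size": 0.2, "random_state": 3}}
local_params = {"model_options": {"test_size": 0.3}}

print(destructive_merge(base_params, local_params))
print(soft_merge(base_params, local_params))
```

With the destructive strategy, random_state is lost because the whole model_options key is replaced; with the soft strategy, it is kept alongside the overridden test_size.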

Advanced configuration without a full Kedro project¶

In some cases, you may want to use the OmegaConfigLoader without a Kedro project. By default, a Kedro project has a base and local environment. When you use the OmegaConfigLoader directly, it assumes no environment. You may find it useful to add Kedro to your existing notebooks.

Read configuration¶

The config loader can work without a Kedro project structure.

tree .
.
└── parameters.yml

Consider the following parameters.yml file and example Python script:

learning_rate: 0.01
train_test_ratio: 0.7

from kedro.config import OmegaConfigLoader
config_loader = OmegaConfigLoader(conf_source=".")

# Optionally, you can also use environments
# config_loader = OmegaConfigLoader(conf_source=".", base_env="base", default_run_env="local")

print(config_loader["parameters"])

If you run it from the directory that contains parameters.yml, it gives the following output:

{'learning_rate': 0.01, 'train_test_ratio': 0.7}

For the full list of features, see Configuration basics and Advanced configuration.

How to use custom resolvers with `OmegaConfigLoader`¶

You can register custom resolvers to use non-primitive types for parameters.

Consider the following parameters.yml file and an example Python script that registers custom resolvers:

polars_float64: "${polars: Float64}"
today: "${today:}"

import polars as pl
from datetime import date

from kedro.config import OmegaConfigLoader

custom_resolvers = {"polars": lambda x: getattr(pl, x),
                    "today": lambda: date.today()}

# Register custom resolvers
config_loader = OmegaConfigLoader(conf_source=".", custom_resolvers=custom_resolvers)

print(config_loader["parameters"])

If you run it from the directory that contains parameters.yml, it gives output similar to the following (the date reflects the day you run it):

{'polars_float64': Float64, 'today': datetime.date(2023, 11, 23)}

How to ignore hidden files and directories with `OmegaConfigLoader`¶

The OmegaConfigLoader provides an option to ignore hidden files and directories (those starting with a dot, for example, .hidden_file or .hidden_folder) when loading configuration files. This behaviour is controlled by the ignore_hidden parameter, which is set to True by default.

If you want to include hidden files and directories in your configuration loading process, you can set ignore_hidden to False when instantiating the OmegaConfigLoader:

from kedro.config import OmegaConfigLoader

conf_loader = OmegaConfigLoader(conf_source="conf", ignore_hidden=False)
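As a rough picture of what counts as hidden, the sketch below treats a configuration file as hidden when any part of its path below the configuration source starts with a dot. This rule is an assumption made for illustration and may not match Kedro's internal check exactly; the function is_hidden and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_hidden(path, conf_source):
    """Return True if any path component below conf_source starts with a dot."""
    relative = PurePosixPath(path).relative_to(conf_source)
    return any(part.startswith(".") for part in relative.parts)


candidates = [
    "conf/base/parameters.yml",
    "conf/base/.ipynb_checkpoints/parameters.yml",
    "conf/local/.hidden_file.yml",
]

# With ignore_hidden=True, only the non-hidden files would be considered.
visible = [path for path in candidates if not is_hidden(path, "conf")]
print(visible)
```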
