Advanced configuration¶
The documentation on configuration describes how to satisfy the most common requirements of standard Kedro project configuration.
By default, Kedro is set up to use the ConfigLoader class. Kedro also provides two additional configuration loaders with more advanced functionality: the TemplatedConfigLoader and the OmegaConfigLoader.
Each of these classes is an alternative to the default ConfigLoader and has different features. The following sections describe each class and its specific functionality in more detail.
TemplatedConfigLoader¶
Kedro provides an extension TemplatedConfigLoader class that allows you to template values in configuration files. To apply templating in your project, set the CONFIG_LOADER_CLASS constant in your src/<package_name>/settings.py:
from kedro.config import TemplatedConfigLoader # new import
CONFIG_LOADER_CLASS = TemplatedConfigLoader
Provide template values through globals¶
When using the TemplatedConfigLoader, you can provide values in the configuration template through a globals file or dictionary.
Let’s assume the project contains a conf/base/globals.yml file with the following contents:
bucket_name: "my_s3_bucket"
key_prefix: "my/key/prefix/"
datasets:
  csv: "pandas.CSVDataSet"
  spark: "spark.SparkDataSet"
folders:
  raw: "01_raw"
  int: "02_intermediate"
  pri: "03_primary"
  fea: "04_feature"
To point your TemplatedConfigLoader to the globals file, add it to the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py:
CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
Now the templating can be applied to the configuration. Here is an example of a templated conf/base/catalog.yml file:
raw_boat_data:
  type: "${datasets.spark}" # nested paths into global dict are allowed
  filepath: "s3a://${bucket_name}/${key_prefix}/${folders.raw}/boats.csv"
  file_format: parquet
raw_car_data:
  type: "${datasets.csv}"
  filepath: "s3://${bucket_name}/data/${key_prefix}/${folders.raw}/${filename|cars.csv}" # default to 'cars.csv' if the 'filename' key is not found in the global dict
Under the hood, TemplatedConfigLoader uses JMESPath syntax to extract elements from the globals dictionary.
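To make the placeholder syntax concrete, here is a minimal, standard-library-only sketch of how a nested path such as `${folders.raw}` and a default such as `${filename|cars.csv}` could be filled from a globals dictionary. It is an illustration, not Kedro's implementation: the regular expression and the helper names `lookup` and `fill` are assumptions made for this example, and real JMESPath supports far more than dotted paths.

```python
import re
from typing import Any

# Matches "${path}" or "${path|default}". This pattern is an assumption for the sketch.
PLACEHOLDER = re.compile(r"\$\{([^}|]+)(?:\|([^}]*))?\}")


def lookup(globals_dict: dict, dotted_path: str) -> Any:
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    current: Any = globals_dict
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def fill(template: str, globals_dict: dict) -> str:
    """Replace every placeholder in `template` with its value or its default."""

    def substitute(match: re.Match) -> str:
        value = lookup(globals_dict, match.group(1))
        if value is None:
            default = match.group(2)
            # Leave the placeholder untouched when there is neither a value nor a default.
            return match.group(0) if default is None else default
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


example_globals = {
    "bucket_name": "my_s3_bucket",
    "folders": {"raw": "01_raw"},
}
path = fill("s3://${bucket_name}/${folders.raw}/${filename|cars.csv}", example_globals)
print(path)
```

Because `filename` is absent from `example_globals`, the sketch falls back to the default given after the `|` separator.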
Alternatively, you can declare which values to fill in the template through a dictionary. This dictionary could look like the following:
{
    "bucket_name": "another_bucket_name",
    "non_string_key": 10,
    "key_prefix": "my/key/prefix",
    "datasets": {"csv": "pandas.CSVDataSet", "spark": "spark.SparkDataSet"},
    "folders": {
        "raw": "01_raw",
        "int": "02_intermediate",
        "pri": "03_primary",
        "fea": "04_feature",
    },
}
To point your TemplatedConfigLoader to the globals dictionary, add it to the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py:
CONFIG_LOADER_ARGS = {
    "globals_dict": {
        "bucket_name": "another_bucket_name",
        "non_string_key": 10,
        "key_prefix": "my/key/prefix",
        "datasets": {"csv": "pandas.CSVDataSet", "spark": "spark.SparkDataSet"},
        "folders": {
            "raw": "01_raw",
            "int": "02_intermediate",
            "pri": "03_primary",
            "fea": "04_feature",
        },
    }
}
If you specify both globals_pattern and globals_dict in CONFIG_LOADER_ARGS, the contents of the dictionary resulting from globals_pattern are merged with the globals_dict dictionary. In case of conflicts, the keys from the globals_dict dictionary take precedence.
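The precedence rule can be pictured with a small sketch. It assumes a shallow, top-level merge (a nested dictionary in `globals_dict` would replace the whole nested dictionary from the files); the function name `merge_globals` and the sample values are hypothetical and chosen only for illustration.

```python
from copy import deepcopy


def merge_globals(from_files: dict, globals_dict: dict) -> dict:
    """Combine both sources; on a key conflict the globals_dict value wins."""
    merged = deepcopy(from_files)
    # Assumption: a shallow update, so only top-level keys are compared.
    merged.update(deepcopy(globals_dict))
    return merged


# Values that would come from files matched by globals_pattern.
from_files = {"bucket_name": "my_s3_bucket", "key_prefix": "my/key/prefix/"}
# Values passed directly as globals_dict.
from_dict = {"bucket_name": "another_bucket_name", "non_string_key": 10}

merged = merge_globals(from_files, from_dict)
print(merged["bucket_name"])
```

Here `bucket_name` appears in both sources, so the value from the dictionary is the one that ends up in the merged result, while `key_prefix` is carried over from the file.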
OmegaConfigLoader¶
OmegaConf is a Python library designed for configuration. It is a YAML-based hierarchical configuration system with support for merging configurations from multiple sources.
From Kedro 0.18.5 you can use the OmegaConfigLoader, which uses OmegaConf under the hood to load data.
Note
OmegaConfigLoader is under active development. It was first available from Kedro 0.18.5 with additional features due in later releases. Let us know if you have any feedback about the OmegaConfigLoader.
OmegaConfigLoader can load YAML and JSON files. Acceptable file extensions are .yml, .yaml, and .json. By default, any configuration files used by the config loaders in Kedro are .yml files.
To use OmegaConfigLoader in your project, set the CONFIG_LOADER_CLASS constant in your src/<package_name>/settings.py:
from kedro.config import OmegaConfigLoader # new import
CONFIG_LOADER_CLASS = OmegaConfigLoader
Advanced Kedro configuration¶
This section provides guidance on advanced configuration requirements of standard Kedro projects.
How to change which configuration files are loaded¶
If you want to change the patterns that the configuration loader uses to find the files to load, you need to set the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py.
For example, if your parameters files are using a params naming convention instead of parameters (e.g. params.yml), you need to update CONFIG_LOADER_ARGS as follows:
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "parameters": ["params*", "params*/**", "**/params*"],
    }
}
By changing this setting, the default behaviour for loading parameters will be replaced, while the other configuration patterns will remain in their default state.
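One way to picture this behaviour is as a dictionary update in which only the overridden key changes. The sketch below is an assumption about the effect, not Kedro's code: `DEFAULT_PATTERNS` lists only the catalog and parameters defaults quoted elsewhere on this page, and `effective_patterns` is a hypothetical helper.

```python
# Default patterns for two configuration types, as quoted in this documentation.
DEFAULT_PATTERNS = {
    "catalog": ["catalog*", "catalog*/**", "**/catalog*"],
    "parameters": ["parameters*", "parameters*/**", "**/parameters*"],
}


def effective_patterns(defaults: dict, overrides: dict) -> dict:
    """Return the defaults with every overridden key replaced wholesale."""
    return {**defaults, **overrides}


user_patterns = {"parameters": ["params*", "params*/**", "**/params*"]}
patterns = effective_patterns(DEFAULT_PATTERNS, user_patterns)
print(patterns["parameters"])
print(patterns["catalog"])
```

The parameters patterns are replaced rather than extended, so files named parameters.yml would no longer match under this override, while the catalog patterns are unchanged.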
How to ensure non-default configuration files get loaded¶
You can add configuration patterns to match files other than parameters, credentials, logging, and catalog by setting the CONFIG_LOADER_ARGS variable in src/<package_name>/settings.py.
For example, if you want to load Spark configuration files, you need to update CONFIG_LOADER_ARGS as follows:
CONFIG_LOADER_ARGS = {
    "config_patterns": {
        "spark": ["spark*/"],
    }
}
How to bypass the configuration loading rules¶
You can bypass the configuration patterns and set configuration directly on the instance of a config loader class. You can bypass the default configuration (catalog, parameters, credentials, and logging) as well as additional configuration.
from pathlib import Path

from kedro.config import ConfigLoader
from kedro.framework.project import settings

# Assumption: this code runs from the project root; adjust project_path otherwise.
project_path = Path.cwd()
conf_path = str(project_path / settings.CONF_SOURCE)
conf_loader = ConfigLoader(conf_source=conf_path)

# Bypass configuration patterns by setting the key and values directly on the config loader instance.
conf_loader["catalog"] = {"catalog_config": "something_new"}
How to use Jinja2 syntax in configuration¶
From version 0.17.0, TemplatedConfigLoader also supports the Jinja2 template engine alongside the original template syntax. Below is an example of a catalog.yml file that uses both features:
{% for speed in ['fast', 'slow'] %}
{{ speed }}-trains:
  type: MemoryDataset
{{ speed }}-cars:
  type: pandas.CSVDataSet
  filepath: s3://${bucket_name}/{{ speed }}-cars.csv
  save_args:
    index: true
{% endfor %}
When parsing this configuration file, TemplatedConfigLoader will:
1. Read the catalog.yml and compile it using Jinja2
2. Use a YAML parser to parse the compiled config into a Python dictionary
3. Expand ${bucket_name} in filepath using the globals_pattern and globals_dict arguments for the TemplatedConfigLoader instance, as in the previous examples
The output Python dictionary will look as follows:
{
    "fast-trains": {"type": "MemoryDataset"},
    "fast-cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "s3://my_s3_bucket/fast-cars.csv",
        "save_args": {"index": True},
    },
    "slow-trains": {"type": "MemoryDataset"},
    "slow-cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "s3://my_s3_bucket/slow-cars.csv",
        "save_args": {"index": True},
    },
}
Warning
Although Jinja2 is a very powerful and extremely flexible template engine, which comes with a wide range of features, we do not recommend using it to template your configuration unless absolutely necessary. The flexibility of dynamic configuration comes at a cost of significantly reduced readability and much higher maintenance overhead. We believe that, for the majority of analytics projects, dynamically compiled configuration does more harm than good.
How to do templating with the OmegaConfigLoader¶
Parameters¶
Templating (or variable interpolation, as it is called in OmegaConf) for parameters works out of the box if the template values are within the parameter files, or if the name of the file that contains the template values follows the config pattern specified for parameters.
By default, the config pattern for parameters is: ["parameters*", "parameters*/**", "**/parameters*"].
Suppose you have one parameters file called parameters.yml containing parameters with omegaconf placeholders like this:
model_options:
  test_size: ${data.size}
  random_state: 3
and a file containing the template values called parameters_globals.yml:
data:
  size: 0.2
Since both of the file names (parameters.yml and parameters_globals.yml) match the config pattern for parameters, the OmegaConfigLoader will load the files and resolve the placeholders correctly.
Catalog¶
From Kedro 0.18.10, templating also works for catalog files. To enable templating in the catalog, ensure that the template values are within the catalog files, or that the name of the file that contains the template values follows the config pattern specified for catalogs.
By default, the config pattern for catalogs is: ["catalog*", "catalog*/**", "**/catalog*"].
Additionally, any template values in the catalog need to start with an underscore (_). This is because of how catalog entries are validated. Templated values will neither trigger a key duplication error nor appear in the resulting configuration dictionary.
Suppose you have one catalog file called catalog.yml containing entries with omegaconf placeholders like this:
companies:
  type: ${_pandas.type}
  filepath: data/01_raw/companies.csv
and a file containing the template values called catalog_globals.yml:
_pandas:
  type: pandas.CSVDataSet
Since both of the file names (catalog.yml and catalog_globals.yml) match the config pattern for catalogs, the OmegaConfigLoader will load the files and resolve the placeholders correctly.
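The statement that underscore-prefixed template values do not appear in the resulting configuration can be sketched as a simple filter applied after the placeholders have been resolved. This is an illustration under that assumption; `drop_template_keys` is a hypothetical helper, not a Kedro function.

```python
def drop_template_keys(resolved_catalog: dict) -> dict:
    """Keep only real catalog entries, discarding keys that start with an underscore."""
    return {
        name: entry
        for name, entry in resolved_catalog.items()
        if not name.startswith("_")
    }


# The merged catalog after ${_pandas.type} has been resolved.
resolved = {
    "_pandas": {"type": "pandas.CSVDataSet"},
    "companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/companies.csv",
    },
}

catalog = drop_template_keys(resolved)
print(sorted(catalog))
```

Only `companies` remains, so the `_pandas` helper entry is never treated as a dataset definition.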
Other configuration files¶
It’s also possible to use variable interpolation in configuration files other than parameters and catalog, such as custom spark or mlflow configuration. This works in the same way as variable interpolation in parameter files. You can still use the underscore for the templated values if you want, but it’s not mandatory like it is for catalog files.
How to use global variables with the OmegaConfigLoader¶
From Kedro 0.18.13, you can use variable interpolation in your configurations using “globals” with OmegaConfigLoader.
The benefit of using globals over regular variable interpolation is that the global variables are shared across different configuration types, such as catalog and parameters.
By default, these global variables are assumed to be in files called globals.yml in any of your environments. If you want to configure the naming patterns for the files that contain your global variables, you can do so by overwriting the globals key in config_patterns. You can also bypass the configuration loading to directly set the global variables in OmegaConfigLoader.
Suppose you have global variables located in the file conf/base/globals.yml:
my_global_value: 45
dataset_type:
  csv: pandas.CSVDataSet
You can access these global variables in your catalog or parameters config files with a globals resolver like this:
conf/base/parameters.yml:
my_param: "${globals:my_global_value}"
conf/base/catalog.yml:
companies:
  filepath: data/01_raw/companies.csv
  type: "${globals:dataset_type.csv}"
You can also provide a default value to be used in case the global variable does not exist:
my_param: "${globals: nonexistent_global, 23}"
If there are duplicate keys in the globals files in your base and run time environments, the values in the run time environment will overwrite the values in your base environment.
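The lookup, default, and environment-override rules described above can be sketched in plain Python. This is a simplified model rather than the resolver Kedro registers: `get_global` is a hypothetical helper, the run time value `99` is invented for the example, and the top-level dictionary merge is an assumption about how duplicate keys are resolved.

```python
from typing import Any

_MISSING = object()  # sentinel so that None can still be used as a real default


def get_global(globals_config: dict, dotted_key: str, default: Any = _MISSING) -> Any:
    """Look up a dotted key in the globals; fall back to `default` if it is absent."""
    current: Any = globals_config
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            if default is _MISSING:
                raise KeyError(f"Global variable '{dotted_key}' not found")
            return default
        current = current[part]
    return current


# Contents of conf/base/globals.yml from the example above.
base_globals = {"my_global_value": 45, "dataset_type": {"csv": "pandas.CSVDataSet"}}
# Hypothetical globals from the run time environment.
run_env_globals = {"my_global_value": 99}

# Assumption: duplicate top-level keys are overwritten by the run time environment.
effective_globals = {**base_globals, **run_env_globals}

print(get_global(effective_globals, "my_global_value"))
print(get_global(effective_globals, "dataset_type.csv"))
print(get_global(effective_globals, "nonexistent_global", 23))
```

The third call mirrors `${globals: nonexistent_global, 23}`: the key is missing, so the supplied default is returned instead of raising an error.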
How to use resolvers in the OmegaConfigLoader¶
Instead of hard-coding values in your configuration files, you can also dynamically compute them using OmegaConf’s resolvers functionality. You use resolvers to define custom logic to calculate values of parameters or catalog entries, or inject these values from elsewhere. To use this feature with Kedro, pass a dict of custom resolvers to OmegaConfigLoader through CONFIG_LOADER_ARGS in your project’s src/<package_name>/settings.py.
The example below illustrates this:
import polars as pl
from datetime import date
from kedro.config import OmegaConfigLoader
CONFIG_LOADER_CLASS = OmegaConfigLoader
def date_today():
    return date.today()
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "add": lambda *my_list: sum(my_list),
        "polars": lambda x: getattr(pl, x),
        "today": lambda: date_today(),
    }
}
These custom resolvers are then registered using OmegaConf.register_new_resolver() under the hood and can be used in any of the configuration files in your project. For example, you can use the add or the today resolver defined above in your parameters.yml like this:
model_options:
  test_size: "${add:1,2,3}"
  random_state: 3
  date: "${today:}"
The values of these parameters will be computed at access time and will be passed on to your nodes.
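Since custom resolvers are ordinary Python callables, one option is to exercise them on their own before wiring them into settings.py. The sketch below does this for the `add` and `today` resolvers only, without importing Kedro or OmegaConf; it assumes that the arguments in `${add:1,2,3}` reach the callable as the integers 1, 2 and 3.

```python
from datetime import date

# The same callables as in the settings.py example, minus the polars resolver.
custom_resolvers = {
    "add": lambda *my_list: sum(my_list),
    "today": lambda: date.today(),
}

# Call the resolvers directly, as a stand-in for "${add:1,2,3}" and "${today:}".
total = custom_resolvers["add"](1, 2, 3)
today = custom_resolvers["today"]()

print(total)
print(today.isoformat())
```

A quick check like this can catch mistakes in the resolver logic itself, separately from any problems with configuration syntax.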
Resolvers can also be used in your catalog.yml. In the example below, we use the polars resolver defined above to pass non-primitive types to the catalog entry.
my_polars_dataset:
  type: polars.CSVDataSet
  filepath: data/01_raw/my_dataset.csv
  load_args:
    dtypes:
      product_age: "${polars:Float64}"
      group_identifier: "${polars:Utf8}"
    try_parse_dates: true
OmegaConf also comes with some built-in resolvers that you can use with the OmegaConfigLoader in Kedro. All built-in resolvers except for oc.env are enabled by default. oc.env is only turned on for loading credentials. You can, however, turn this on for all configurations through your project’s src/<package_name>/settings.py in a similar way:
Note
This is an advanced feature and should be used with caution. We do not recommend using environment variables for configurations other than credentials.
from omegaconf.resolvers import oc
from kedro.config import OmegaConfigLoader
CONFIG_LOADER_CLASS = OmegaConfigLoader
CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        "oc.env": oc.env,
    }
}
How to load credentials through environment variables¶
The OmegaConfigLoader enables you to load credentials from environment variables. To achieve this you have to use the OmegaConfigLoader and the omegaconf oc.env resolver.
To use the OmegaConfigLoader in your project, set the CONFIG_LOADER_CLASS constant in your src/<package_name>/settings.py:
from kedro.config import OmegaConfigLoader # new import
CONFIG_LOADER_CLASS = OmegaConfigLoader
Now you can use the oc.env resolver to access credentials from environment variables in your credentials.yml, as demonstrated in the following example:
dev_s3:
  client_kwargs:
    aws_access_key_id: ${oc.env:AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${oc.env:AWS_SECRET_ACCESS_KEY}
Note
By default, you can only use the oc.env resolver in credentials.yml and not in catalog or parameter files. This is because we do not encourage the usage of environment variables for anything other than credentials.
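As a rough mental model of the credentials.yml example above, the sketch below builds the same dictionary shape by reading named environment variables and failing loudly when one is missing. It uses only the standard library and is not how OmegaConf implements oc.env; `env_credential` and the placeholder values in `fake_environ` are hypothetical.

```python
import os
from typing import Mapping, Optional


def env_credential(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, raising a clear error if unset."""
    environ = os.environ if environ is None else environ
    try:
        return environ[name]
    except KeyError:
        raise KeyError(f"Environment variable '{name}' is not set") from None


# A stand-in for the real environment so the example does not depend on your shell.
fake_environ = {
    "AWS_ACCESS_KEY_ID": "example-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
}

dev_s3 = {
    "client_kwargs": {
        "aws_access_key_id": env_credential("AWS_ACCESS_KEY_ID", fake_environ),
        "aws_secret_access_key": env_credential("AWS_SECRET_ACCESS_KEY", fake_environ),
    }
}
print(sorted(dev_s3["client_kwargs"]))
```

Keeping the secrets in the environment means credentials.yml itself contains only variable names, never the secret values.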