Kedro architecture overview

There are different ways to use Kedro in your work. You can:

  • Commit to using Kedro end to end (framework, project, starters and library); this approach gives you the full benefit of Kedro
  • Use selected Kedro components, such as the DataCatalog (I/O), OmegaConfigLoader, pipelines and runners, by importing the library into an existing codebase; this best suits a workflow where you do not want to adopt the Kedro project template
  • Develop extensions for Kedro, such as custom starters, plugins and hooks

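At a high level, Kedro consists of five main parts: the Kedro project, the Kedro framework, Kedro starters, the Kedro library and Kedro extensions. Each part is described in its own section below.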

Kedro architecture diagram

Kedro project

As a data pipeline developer, you will interact with a Kedro project, which consists of:

  • The conf/ directory, which contains configuration for the project, such as data catalog configuration and parameters.

  • The src/ directory, which contains the source code for the project, including:

    • The pipelines directory, which contains the source code for your pipelines.
    • The settings.py file, which contains the settings for the project, such as library component registration and custom hook registration. All the available settings are listed and explained in the project settings chapter.
    • The pipeline_registry.py file, which defines the project pipelines that can be run using kedro run --pipeline.
    • The __main__.py file, which serves as the main entry point of the project in package mode.
  • The pyproject.toml file, which identifies the project root by providing project metadata in its [tool.kedro] section, including:

    • package_name: A valid Python package name for your project package.
    • project_name: A human-readable name for your project.
    • kedro_init_version: The Kedro version with which the project was generated.

Kedro framework

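The Kedro framework serves as the interface between a Kedro project and the Kedro library components. The major building blocks of the Kedro framework include:

  • kedro.framework.session manages the lifecycle of a Kedro run.
  • kedro.framework.context holds the project configuration and serves as the main entry point for interacting with the library components.
  • kedro.framework.hooks defines the hook specifications available to extend Kedro.
  • kedro.framework.cli defines the built-in Kedro CLI commands and the utilities that load custom CLI commands from plugins.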

Kedro starter

You can use a Kedro starter to generate a Kedro project that contains boilerplate code. We maintain a set of official starters but you can also use a custom starter of your choice.

Kedro library

The Kedro library consists of independent units, each responsible for one aspect of computation in a data pipeline:

  • kedro.config.OmegaConfigLoader provides utilities to parse and load the configuration defined in a Kedro project.
  • kedro.pipeline provides a collection of abstractions to model data pipelines.
  • kedro.runner provides an abstraction for different execution strategies of a data pipeline.
  • kedro.io provides a collection of abstractions to handle I/O in a project, including DataCatalog and several Dataset implementations.

Kedro extension

You can also extend Kedro's behaviour in your project using a Kedro extension. An extension can be a custom starter, a Python library with extra hook implementations, a plugin that adds CLI commands (such as Kedro-Viz), or a custom library component implementation.

If you create a Kedro extension, consider contributing it: we welcome all kinds of contributions. Check out our guide to contributing to Kedro. Dataset contributions to kedro-datasets are the most frequently accepted, since they do not require any changes to the framework itself. We also welcome contributions to any of the other kedro-plugins.