TL;DR

This page summarises what you’ve learned about Kedro so far.

Logistics

  • Kedro can be used on Windows, macOS or Linux

  • Installation prerequisites include Python 3.7+, git and conda

  • You should install Kedro using pip install kedro or conda install -c conda-forge kedro

Kedro concepts

  • Kedro nodes are the building blocks of pipelines. A node is a wrapper for a Python function that names the inputs and outputs of that function.

  • A pipeline organises the dependencies and execution order of a collection of nodes.

  • The Data Catalog is Kedro's registry of all the data sources a project can use. It has built-in support for various file types and file systems.

  • Kedro projects follow a default template that uses specific folders to store datasets, notebooks, configuration and source code.
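As a rough illustration of how nodes, pipelines and a catalog fit together, the sketch below uses plain Python from the standard library rather than Kedro's own API. ToyNode, run_toy_pipeline, the two functions and the dataset names are all hypothetical, and a plain dict stands in for the Data Catalog. Kedro's real classes do considerably more.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node: a function plus its named inputs and output."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_toy_pipeline(nodes: List[ToyNode], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run nodes in dependency order, reading from and writing to a dict of datasets."""
    producers = {n.output: n for n in nodes}
    # An output depends on every input that is itself produced by another node.
    graph = {n.output: {i for i in n.inputs if i in producers} for n in nodes}
    data = dict(catalog)  # copy so the caller's catalog is left untouched
    for name in TopologicalSorter(graph).static_order():
        n = producers[name]
        data[name] = n.func(*(data[i] for i in n.inputs))
    return data


def clean(raw):
    """Drop missing values."""
    return [x for x in raw if x is not None]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# The nodes are deliberately listed out of order: the execution order is
# derived from the input and output names, not from the list position.
nodes = [
    ToyNode(mean, ["clean_values"], "average"),
    ToyNode(clean, ["raw_values"], "clean_values"),
]
catalog = {"raw_values": [1, None, 2, 3]}  # toy "registry" of data sources

result = run_toy_pipeline(nodes, catalog)
print(result["average"])  # 2.0
```

Because each node names its inputs and outputs, the runner can work out that clean must run before mean, which is the role the pipeline plays in the description above.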

Kedro project creation

  • You can create a Kedro project:

    • with just the basic code: kedro new

    • or with pre-built code from one of a range of starter projects, e.g. kedro new --starter=pandas-iris

  • Once you’ve created a project, navigate to its project folder and install its dependencies: pip install -r src/requirements.txt

  • To run the project: kedro run

  • To visualise the project: kedro viz

What’s next?