Kedro’s command line interface

Kedro’s command line interface (CLI) is used to give commands to Kedro via a terminal shell (such as the terminal app on macOS, or cmd.exe or PowerShell on Windows). You need to use the CLI to set up a new Kedro project, and to run it.

Autocompletion (optional)

If you are using macOS or Linux, you can set up your shell to autocomplete kedro commands. If you don’t know the type of shell you are using, first type the following:

echo $0

If you are using Bash

Add the following to your ~/.bashrc (or just run it on the command line):

eval "$(_KEDRO_COMPLETE=source kedro)"

If you are using Z shell (Zsh)

Add the following to ~/.zshrc:

eval "$(_KEDRO_COMPLETE=source_zsh kedro)"

If you are using Fish

Add the following to ~/.config/fish/completions/kedro.fish:

eval (env _KEDRO_COMPLETE=source_fish kedro)

Invoke Kedro CLI from Python (optional)

You can invoke the Kedro CLI as a Python module:

python -m kedro
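Because the CLI is available as a module, you can also drive it from a Python script. The following is a minimal sketch that uses the standard-library subprocess module. The helper names kedro_argv and run_kedro are hypothetical. The sketch assumes Kedro is installed in the same interpreter that runs the script.

```python
import subprocess
import sys
from typing import List


def kedro_argv(*args: str) -> List[str]:
    """Build the argument list for `python -m kedro <args>` with the current interpreter."""
    return [sys.executable, "-m", "kedro", *args]


def run_kedro(*args: str) -> "subprocess.CompletedProcess[str]":
    """Run a Kedro CLI command, capture its text output and raise if it fails."""
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(
        kedro_argv(*args), check=True, capture_output=True, text=True
    )
```

For example, run_kedro("--version").stdout should hold the same text you would see after typing python -m kedro --version in a terminal.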

Kedro commands

Kedro CLI commands fall into two groups, described below. Project-specific commands are called from within a project directory and apply to that particular project. Global commands can be run anywhere and don’t apply to any particular project.

Global Kedro commands

The following are Kedro commands that apply globally and can be run from any directory location.

Note

You only need to use one of the equivalent forms given below (e.g. specify kedro -V OR kedro --version).

Get help on Kedro commands

kedro
kedro -h
kedro --help

Confirm the Kedro version

kedro -V
kedro --version

Confirm Kedro information

kedro info

Returns output similar to the following, depending on the version of Kedro used and plugins installed.

 _            _
| | _____  __| |_ __ ___
| |/ / _ \/ _` | '__/ _ \
|   <  __/ (_| | | | (_) |
|_|\_\___|\__,_|_|  \___/
v0.17.7

Kedro is a Python framework for
creating reproducible, maintainable
and modular data science code.

Installed plugins:
kedro_viz: 3.4.0 (hooks:global,line_magic)

Create a new Kedro project

kedro new

Open the Kedro documentation in your browser

kedro docs

Project-specific Kedro commands

Note

All project-related CLI commands should be run from the project’s root directory.

Kedro’s command line interface (CLI) allows you to associate a set of commands and dependencies with a target, which you can then execute from inside the project directory.

The commands a project supports are specified in its cli.py file, which can be extended, either by modifying the file or by injecting commands into it via the plugin framework.

Project setup

Build the project’s dependency tree

kedro build-reqs

This command runs pip-compile on the project’s src/requirements.in file. If the file doesn’t exist, Kedro will create it by copying from src/requirements.txt.

kedro build-reqs also accepts and passes through CLI options accepted by pip-compile. For example, kedro build-reqs --generate-hashes will call pip-compile --generate-hashes src/requirements.in.
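The behaviour described above can be modelled roughly in Python. The sketch copies requirements.txt to requirements.in when the latter is missing, then places any extra options in front of the input file for pip-compile. It is an illustrative sketch, not Kedro's implementation. The function names are hypothetical, and the sketch only builds the command rather than running it.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_requirements_in(src_dir: Path) -> Path:
    """Create src/requirements.in from src/requirements.txt if it does not exist yet."""
    req_in = src_dir / "requirements.in"
    if not req_in.exists():
        shutil.copyfile(src_dir / "requirements.txt", req_in)
    return req_in


def pip_compile_argv(src_dir: Path, *extra_options: str) -> List[str]:
    """Build a pip-compile command, passing extra options through unchanged."""
    req_in = prepare_requirements_in(src_dir)
    return ["pip-compile", *extra_options, str(req_in)]
```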

Install all package dependencies

The following runs pip to install all package dependencies specified in src/requirements.txt:

kedro install

For further information, see the kedro install documentation.

Run the project

The following calls the run() method of the KedroSession defined in kedro.framework.session:

kedro run

You can extend KedroContext with a custom context class. To use the extended KedroContext, set CONTEXT_CLASS in your project’s settings.py file.

Modifying a `kedro run`

Kedro has options to modify pipeline runs. Here is a list of CLI arguments supported out of the box:

CLI command	Description	Multiple instances allowed?
`kedro run --from-inputs dataset1,dataset2`	A list of dataset names which should be used as a starting point	No
`kedro run --from-nodes node1,node2`	A list of node names which should be used as a starting point	No
`kedro run --to-nodes node3,node4`	A list of node names which should be used as an end point	No
`kedro run --node debug_me,debug_me_too`	Run only nodes with specified names	Yes
`kedro run --runner runner_name`	Run the pipeline with a specific runner. Cannot be used together with `--parallel`	No
`kedro run --parallel`	Run the pipeline using the `ParallelRunner`. If not specified, use the `SequentialRunner`. Cannot be used together with `--runner`	No
`kedro run --env env_name`	Run the pipeline in the env_name environment. Defaults to local if not provided	No
`kedro run --tag some_tag1,some_tag2`	Run only nodes which have any of these tags attached	Yes
`kedro run --load-version="some_dataset:YYYY-MM-DDThh.mm.ss.sssZ"`	Specify a particular dataset version (timestamp) for loading	Yes
`kedro run --pipeline de`	Run a specific pipeline, selected by its name	No
`kedro run --config config.yml`	Specify all command line options in a configuration file called config.yml	No
`kedro run --params param_key1:value1,param_key2:2.0`	Does a parametrised kedro run with `{"param_key1": "value1", "param_key2": 2}`. These parameters take precedence over parameters defined in the `conf` directory. Dot (`.`) syntax can be used to address nested keys, for example `parent.child:value`.	Yes
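If you launch runs from scripts, it can help to assemble these options programmatically. The sketch below builds the argument list for a subset of the options in the table and rejects the --runner/--parallel combination that the table forbids. It also formats --load-version timestamps in the YYYY-MM-DDThh.mm.ss.sssZ form shown above. The helper names are hypothetical, and the timestamp formatting is an approximation of that documented format. You can pass the resulting list to subprocess.run to execute it.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Sequence


def load_version_timestamp(moment: datetime) -> str:
    """Format a datetime as YYYY-MM-DDThh.mm.ss.sssZ in UTC, with millisecond precision."""
    text = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%f")
    return text[:-3] + "Z"  # trim microseconds (6 digits) down to milliseconds (3 digits)


def kedro_run_argv(
    pipeline: Optional[str] = None,
    env: Optional[str] = None,
    tags: Sequence[str] = (),
    runner: Optional[str] = None,
    parallel: bool = False,
    load_versions: Optional[Dict[str, datetime]] = None,
) -> List[str]:
    """Assemble a `kedro run` command from a subset of the documented options."""
    if runner and parallel:
        # The table states that --runner cannot be combined with --parallel.
        raise ValueError("--runner cannot be used together with --parallel")
    argv = ["kedro", "run"]
    if pipeline:
        argv += ["--pipeline", pipeline]
    if env:
        argv += ["--env", env]
    if tags:
        argv += ["--tag", ",".join(tags)]
    if runner:
        argv += ["--runner", runner]
    if parallel:
        argv.append("--parallel")
    # --load-version may be given several times, once per dataset.
    for dataset, moment in (load_versions or {}).items():
        argv += ["--load-version", f"{dataset}:{load_version_timestamp(moment)}"]
    return argv
```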

You can also combine these options together, so the following command runs all the nodes from split to predict and report:

kedro run --from-nodes split --to-nodes predict,report
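As a mental model, --from-nodes keeps the named nodes and everything downstream of them, and --to-nodes keeps the named nodes and everything upstream of them. Combining the two keeps only the nodes that satisfy both conditions. The toy sketch below applies that idea to a small graph. The graph and node names are hypothetical, and Kedro derives dependencies from node inputs and outputs rather than from an explicit edge list like this one.

```python
from typing import Dict, Iterable, List, Set

# Toy pipeline (hypothetical node names): each node maps to the nodes that consume its outputs.
EDGES: Dict[str, List[str]] = {
    "load": ["split"],
    "split": ["train", "evaluate_baseline"],
    "train": ["predict"],
    "evaluate_baseline": [],
    "predict": ["report"],
    "report": [],
}


def reachable(start: Iterable[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Return the start nodes plus every node reachable from them by following `graph`."""
    seen: Set[str] = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def select_nodes(
    graph: Dict[str, List[str]], from_nodes: Iterable[str], to_nodes: Iterable[str]
) -> Set[str]:
    """Keep nodes downstream of `from_nodes` that are also upstream of `to_nodes`."""
    # Reverse the edges so that walking them moves upstream instead of downstream.
    reverse: Dict[str, List[str]] = {node: [] for node in graph}
    for node, consumers in graph.items():
        for consumer in consumers:
            reverse.setdefault(consumer, []).append(node)
    downstream = reachable(from_nodes, graph)
    upstream = reachable(to_nodes, reverse)
    return downstream & upstream
```

In this toy graph, "load" is excluded because it sits upstream of split, and "evaluate_baseline" is excluded because it feeds neither predict nor report.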

You can also specify these run options in a configuration file and pass it with the kedro run --config config.yml command.

A parameterised run is best used for dynamic parameters, i.e. running the same pipeline with different inputs. For static parameters that do not change, we recommend following the Kedro project setup methodology.
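The --params row in the table above packs several key/value pairs into one string. The parser below illustrates those documented semantics: pairs are comma-separated, a colon separates key from value, dots address nested keys, and numbers are converted so that 2.0 becomes 2. It is a sketch, not Kedro's own parsing code. In particular, it does not handle commas inside values.

```python
from typing import Any, Dict, Union


def _convert(value: str) -> Union[int, float, str]:
    """Turn numeric strings into numbers; whole-number floats become ints (2.0 -> 2)."""
    try:
        number = float(value)
    except ValueError:
        return value  # not numeric: keep the string as-is
    return int(number) if number.is_integer() else number


def parse_params(spec: str) -> Dict[str, Any]:
    """Parse 'key1:value1,parent.child:value' into a (possibly nested) dict."""
    params: Dict[str, Any] = {}
    for item in spec.split(","):
        key, sep, raw = item.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"Expected 'key:value', got {item!r}")
        *parents, leaf = key.strip().split(".")
        target = params
        for part in parents:
            target = target.setdefault(part, {})  # create nested dicts on demand
        target[leaf] = _convert(raw.strip())
    return params
```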

Deploy the project

The following packages your application as one .egg file and one .whl file within the src/dist/ folder of your project:

kedro package

See the Python documentation for further information about packaging.

Pull a micro-package

Since Kedro 0.17.7 you can pull a micro-package into your Kedro project as follows:

kedro micropkg pull <link-to-micro-package-wheel-file>

The above command will take the bundled .whl file and do the following:

Place source code in src/<package_name>/pipelines/<pipeline_name>
Place parameters in conf/base/parameters/<pipeline_name>.yml
Pull out tests and place in src/tests/pipelines/<pipeline_name>
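The three destinations above follow a fixed pattern. This small, hypothetical helper spells that pattern out for a given package and pipeline name, using made-up example names. It only computes the paths listed above and does not reproduce any other logic of kedro micropkg pull.

```python
from pathlib import PurePosixPath
from typing import Dict


def micropkg_destinations(package_name: str, pipeline_name: str) -> Dict[str, PurePosixPath]:
    """Return where pulled source code, parameters and tests land, relative to the project root."""
    return {
        "source": PurePosixPath("src", package_name, "pipelines", pipeline_name),
        "parameters": PurePosixPath("conf", "base", "parameters", f"{pipeline_name}.yml"),
        "tests": PurePosixPath("src", "tests", "pipelines", pipeline_name),
    }
```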

kedro micropkg pull works with PyPI, local and cloud storage:

PyPI: kedro micropkg pull <my-pipeline>, where <my-pipeline> is a package on PyPI
Local storage: kedro micropkg pull <path-to-your-project-root>/src/dist/<my-pipeline>-0.1-py3-none-any.whl
Cloud storage: kedro micropkg pull s3://<my-bucket>/<my-pipeline>-0.1-py3-none-any.whl
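To make the three kinds of source concrete, here is a rough heuristic that tells them apart by looking at the argument string. This is a hypothetical illustration, not how Kedro itself decides where to fetch from. URL schemes such as s3:// count as cloud storage, wheel files or existing paths count as local storage, and anything else is treated as a PyPI package name.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_micropkg_source(source: str) -> str:
    """Roughly classify a `kedro micropkg pull` argument as 'cloud', 'local' or 'pypi'."""
    scheme = urlparse(source).scheme
    # Single-letter schemes are Windows drive letters (e.g. C:\...), not URLs.
    if scheme and len(scheme) > 1 and scheme != "file":
        return "cloud"
    if source.endswith(".whl") or Path(source).exists():
        return "local"
    return "pypi"
```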

Project quality

Build the project documentation

kedro build-docs

The build-docs command builds project documentation using the Sphinx framework. To further customise your documentation, please refer to docs/source/conf.py and the Sphinx documentation.

Lint your project

kedro lint

Your project is linted with black, flake8 and isort. See our documentation about kedro lint for further details.

Test your project

The following runs all pytest unit tests found in src/tests, including coverage (see the file .coveragerc):

kedro test

Project development

Modular pipelines

Create a new modular pipeline in your project

kedro pipeline create <pipeline_name>

Package a micro-package

The following command packages all the files related to a modular pipeline into a wheel file:

kedro micropkg package <pipeline_name>

Further information is available in the pipeline documentation.

Pull a micro-package in your project

The following command pulls all the files related to a modular pipeline from either PyPI or the storage location of a wheel file:

kedro micropkg pull <package_name> (or path to a wheel file)

Further information is available in the micro-packaging documentation.

Delete a modular pipeline

The following command deletes all the files related to a modular pipeline in your Kedro project.

kedro pipeline delete <pipeline_name>

Further information is available in the micro-packaging documentation.

Describe a pipeline

kedro pipeline describe <pipeline_name>

The output includes all the nodes in the pipeline. If no pipeline name is provided, this command returns all nodes in the __default__ pipeline.

List all pipelines in your project

kedro pipeline list

Datasets

List datasets per pipeline per type

kedro catalog list

For each pipeline, the results list the datasets that the pipeline uses and those it doesn’t use, grouped by dataset type.

The command also accepts an optional --pipeline argument that takes one or more pipeline names as comma-separated values, so that the output covers only those pipelines. For example:

kedro catalog list --pipeline "ds,de"
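The following toy model shows the shape of this report. Catalog datasets are split into "used" and "not used" for each selected pipeline and grouped by type. The comma-separated selection mimics the --pipeline value. It is a hypothetical sketch operating on plain dictionaries, not on a real Kedro project, and the exact layout of Kedro's output may differ.

```python
from typing import Dict, List, Optional, Set

Report = Dict[str, Dict[str, Dict[str, List[str]]]]


def catalog_report(
    catalog: Dict[str, str],
    pipelines: Dict[str, Set[str]],
    selected: Optional[str] = None,
) -> Report:
    """Group catalog datasets into 'used' / 'not used' per pipeline, keyed by dataset type.

    `catalog` maps dataset names to type names; `pipelines` maps pipeline names to the
    dataset names they use; `selected` mimics the comma-separated --pipeline value.
    """
    names = [n.strip() for n in selected.split(",")] if selected else list(pipelines)
    report: Report = {}
    for name in names:
        used = pipelines[name]
        groups: Dict[str, Dict[str, List[str]]] = {"used": {}, "not used": {}}
        for dataset, dataset_type in catalog.items():
            bucket = "used" if dataset in used else "not used"
            groups[bucket].setdefault(dataset_type, []).append(dataset)
        report[name] = groups
    return report
```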

Data Catalog

Create a Data Catalog YAML configuration file

For each dataset used in a registered pipeline but missing from the DataCatalog, the following command adds a MemoryDataSet entry to a Data Catalog YAML configuration file:

kedro catalog create --pipeline <pipeline_name>

The command also accepts an optional --env argument that allows you to specify a configuration environment (defaults to base).

The command creates the following file: <conf_root>/<env>/catalog/<pipeline_name>.yml
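The following rough sketch shows what the generated file contains: one MemoryDataSet entry for each pipeline dataset that is not yet in the catalog, written to the <conf_root>/<env>/catalog/<pipeline_name>.yml path described above. The helper names are hypothetical, and the sketch renders the YAML text by hand, sorted by dataset name. Kedro's actual output ordering and formatting may differ.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def missing_catalog_yaml(pipeline_datasets: Iterable[str], catalog_entries: Set[str]) -> str:
    """Render MemoryDataSet YAML entries for datasets that are not yet in the catalog."""
    lines = []
    for name in sorted(set(pipeline_datasets) - catalog_entries):
        lines.append(f"{name}:")
        lines.append("  type: MemoryDataSet")
    return "\n".join(lines) + "\n" if lines else ""


def catalog_create_path(conf_root: str, pipeline_name: str, env: str = "base") -> PurePosixPath:
    """Return <conf_root>/<env>/catalog/<pipeline_name>.yml."""
    return PurePosixPath(conf_root, env, "catalog", f"{pipeline_name}.yml")
```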

Notebooks

To start a Jupyter Notebook:

kedro jupyter notebook

To start JupyterLab:

kedro jupyter lab

To start an IPython shell:

kedro ipython

Every time you start or restart a notebook kernel, a startup script (<project-root>/.ipython/profile_default/startup/00-kedro-init.py) will add the following variables in scope:

context: An instance of kedro.framework.context.KedroContext class or custom context class extending KedroContext if one was set to CONTEXT_CLASS in settings.py file (further details of how to use context can be found in the IPython documentation)
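startup_error (Exception): the error raised while the startup script was loading these variables, if any
catalog: the project’s DataCatalog, i.e. context.catalog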

To reload these variables at any point in your notebook (e.g. if you updated catalog.yml) use the line magic %reload_kedro, which can be also used to see the error message if any of the variables above are undefined.

If you get the error message "Module `<module_name>` not found. Make sure to install required project dependencies by running `kedro install` command first." when running any of those commands, some Jupyter or IPython dependencies are not installed in your environment. To resolve this, do the following:

Make sure the corresponding dependency is present in src/requirements.in (src/requirements.txt if not compiled)
Run kedro install command from your terminal

Copy tagged cells

To copy the code from cells tagged with the node tag into Python files under src/<package_name>/nodes/ in a Kedro project:

kedro jupyter convert --all
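At the file level, a tagged cell is a code cell whose metadata carries a tags list. The simplified sketch below finds such cells in a notebook's JSON and returns their source. It is a hypothetical illustration of the selection step only, not the implementation of kedro jupyter convert, which also writes the resulting Python files.

```python
import json
from typing import List


def node_cell_sources(notebook_json: str, tag: str = "node") -> List[str]:
    """Return the source of every code cell whose metadata tags include `tag`."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and tag in tags:
            source = cell.get("source", "")
            # The notebook format stores source either as one string or as a list of lines.
            sources.append("".join(source) if isinstance(source, list) else source)
    return sources
```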

Strip output cells

Output cells of Jupyter notebooks should not be tracked by git, especially if they contain sensitive information. To strip them out:

kedro activate-nbstripout

This command adds a git hook which clears all notebook output cells before committing anything to git. It needs to run only once per local repository.
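At the file level, "clearing output cells" means emptying each code cell's outputs in the notebook JSON. The simplified illustration below also resets execution counts. It is not nbstripout's implementation, which handles more metadata and has configuration options. It only shows the kind of change the git hook applies before a commit.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return the notebook JSON with all code-cell outputs and execution counts cleared."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return json.dumps(notebook, indent=1)
```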