Set up the spaceflights tutorial project

This section covers project setup, which is the first phase of the standard development workflow. The setup steps are as follows:

  • Create a new project with kedro new

  • Install project dependencies with pip install -r src/requirements.txt

  • Configure the following in the conf folder:

    • Credentials and any other sensitive information

    • Logging

Note

Don’t forget to check the tutorial FAQ if you run into problems, or ask the community for help if you need it!

Create a new project

If you have not yet set up Kedro, do so by following the guidelines to install Kedro.

Important

We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.18.4).
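
As a quick way to act on this recommendation, the following minimal sketch reports whether the Kedro version installed in the active environment matches the tested one. It relies only on the standard library; the helper names `installed_version` and `describe_kedro_version` are hypothetical and are not part of Kedro.

```python
from importlib.metadata import PackageNotFoundError, version

TESTED_KEDRO_VERSION = "0.18.4"  # the version named in this tutorial


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def describe_kedro_version(found, expected=TESTED_KEDRO_VERSION):
    """Build a short status message from the detected version string."""
    if found is None:
        return "Kedro is not installed in this environment."
    if found == expected:
        return f"Kedro {found} matches the tested version."
    return f"Kedro {found} is installed; the tutorial was tested with {expected}."


if __name__ == "__main__":
    print(describe_kedro_version(installed_version("kedro")))
```

A different version is not necessarily a problem, but the file contents and version numbers shown later in this section may then differ from what you see.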

In your terminal window, navigate to the folder where you want to store the project and type the following to create an empty project:

kedro new

Alternatively, if you want to include a complete set of working example code within the project, generate the project from the Kedro starter for the spaceflights tutorial:

kedro new --starter=spaceflights

For either option, when prompted for a project name, enter Kedro Tutorial. When Kedro has created your project, you can navigate to the project root directory:

cd kedro-tutorial
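
The folder name kedro-tutorial follows from the project name Kedro Tutorial. The sketch below approximates that relationship. It is an illustration, not Kedro's own code: both function names are hypothetical, and the package-name rule is an assumption about the project template's defaults. If you accept different values at the prompts, your folder name will differ.

```python
def default_folder_name(project_name):
    """Approximate the default project folder name for a project name."""
    # Assumption: spaces and underscores become hyphens, and letters are lowercased.
    return project_name.strip().replace(" ", "-").replace("_", "-").lower()


def default_package_name(project_name):
    """Approximate the default Python package name for a project name."""
    # Assumption: spaces and hyphens become underscores, and letters are lowercased.
    return project_name.strip().replace(" ", "_").replace("-", "_").lower()


print(default_folder_name("Kedro Tutorial"))   # kedro-tutorial
print(default_package_name("Kedro Tutorial"))  # kedro_tutorial
```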

Project dependencies

Kedro projects use a requirements.txt file to specify their dependencies. Recording the packages and their versions in this file keeps environments consistent, which makes projects easier to share.

The generic project template bundles some typical dependencies in src/requirements.txt. Here’s a typical example, although you may find that the version numbers differ slightly depending on your version of Kedro:

# code quality packages
black==22.1.0 # Used for formatting code with `kedro lint`
flake8>=3.7.9, <5.0 # Used for linting code with `kedro lint`
ipython==7.0 # Used for an IPython session with `kedro ipython`
isort~=5.0 # Used for linting code with `kedro lint`
nbstripout~=0.4 # Strips the output of a Jupyter Notebook and writes the outputless version to the original file

# notebook tooling
jupyter~=1.0 # Used to open a Kedro-session in Jupyter Notebook & Lab
jupyterlab~=3.0 # Used to open a Kedro-session in Jupyter Lab

# Pytest + useful extensions
pytest-cov~=3.0 # Produces test coverage reports
pytest-mock>=1.7.1, <2.0 # Wrapper around the mock package for easier use with pytest
pytest~=6.2 # Testing framework for Python code

You can learn more about project dependencies in the Kedro documentation.
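
Several entries in the listing above use the `~=` (compatible release) specifier. For example, `isort~=5.0` accepts any 5.x release at or above 5.0 but not 6.0. The following sketch illustrates that rule for plain numeric versions only. It is a simplification for reading the file, not the algorithm pip uses, and the function names are hypothetical. It does not handle pre-releases or other suffixes.

```python
def parse_version(text):
    """Turn a purely numeric version such as '5.10.1' into (5, 10, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_compatible_release(spec, candidate):
    """Return True if `candidate` satisfies `~=spec` for numeric versions."""
    spec_parts = parse_version(spec)
    cand_parts = parse_version(candidate)
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two version components")

    # Pad both versions with zeros so that, for example, 5 and 5.0 compare equal.
    width = max(len(spec_parts), len(cand_parts))
    spec_padded = spec_parts + (0,) * (width - len(spec_parts))
    cand_padded = cand_parts + (0,) * (width - len(cand_parts))

    # All but the last component of the spec must match exactly,
    # and the candidate must not be older than the spec.
    prefix_len = len(spec_parts) - 1
    same_series = cand_padded[:prefix_len] == spec_parts[:prefix_len]
    return same_series and cand_padded >= spec_padded


print(is_compatible_release("5.0", "5.10.1"))  # True
print(is_compatible_release("6.2", "7.0"))     # False
```

By contrast, an entry pinned with `==`, such as `black==22.1.0`, accepts exactly one version.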

Add dependencies to the project

The dependencies above might be sufficient for some projects, but for this tutorial, you must add some extra requirements. These requirements enable you to work with different data formats (including CSV, Excel, and Parquet) and to visualise the pipeline.

If you generated the project from the spaceflights starter, you can skip the copy and paste step, but it’s still worth opening src/requirements.txt to inspect its contents.

Add the following lines to your src/requirements.txt file:

kedro[pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet]==0.18.4   # Specify optional Kedro dependencies
kedro-viz~=5.0                                                                 # Visualise your pipelines
scikit-learn~=1.0                                                              # For modelling in the data science pipeline

Install the dependencies

To install all the project-specific dependencies, run the following from the project root directory:

pip install -r src/requirements.txt
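
To confirm that the installation picked up everything listed in the file, a sketch like the following can help. It reads the requirement names and reports any that have no installed distribution. It checks presence only, not whether the installed version satisfies the specifier, and it assumes simple `name<specifier>` lines like those shown in this section. The function names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# A distribution name starts the line and ends before extras or a specifier.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text):
    """Extract distribution names from requirements text, ignoring comments."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        match = NAME_PATTERN.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("src/requirements.txt")  # run from the project root directory
    if path.exists():
        names = requirement_names(path.read_text())
        print("Not installed:", missing_packages(names))
```

An empty list means that every named package was found in the active environment.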

Optional: configuration and logging

If the data sources used by your project require credentials, such as usernames and passwords, you can store them in the project configuration.

To do this, add them to conf/local/credentials.yml (that file includes some examples for illustration).

You can find additional information in the advanced documentation on configuration.

You might also want to set up logging at this stage of the workflow, but we do not use it in this tutorial.