Set up the spaceflights tutorial project
In this section, we discuss the project set-up phase, which is the first part of the standard development workflow. The setup steps are as follows:
1. Create a new project with kedro new
2. Install project dependencies with pip install -r src/requirements.txt
3. Configure the following in the conf folder:
   Credentials and any other sensitive information
   Logging
Note
Don’t forget to check the tutorial FAQ if you run into problems, or ask the community for help if you need it!
Create a new project
If you have not yet set up Kedro, do so by following the guidelines to install Kedro.
Important
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.18.4).
In your terminal window, navigate to the folder where you want to store the project and type the following to create an empty project:
kedro new
Alternatively, if you want to include a complete set of working example code within the project, generate the project from the Kedro starter for the spaceflights tutorial:
kedro new --starter=spaceflights
For either option, when prompted for a project name, enter Kedro Tutorial. When Kedro has created your project, you can navigate to the project root directory:
cd kedro-tutorial
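The folder name used in the cd command is derived from the project name you entered. As a rough illustration only, the sketch below assumes a simple lowercase-and-hyphenate rule that reproduces the example above; the function name to_repo_name is hypothetical, and Kedro's own normalisation may differ in edge cases.

```python
import re


def to_repo_name(project_name: str) -> str:
    """Approximate the folder name for a project name (illustrative only)."""
    # Assumption: each run of characters other than letters, digits,
    # underscores, or hyphens becomes a single hyphen, and the result
    # is lowercased. This is not taken from Kedro's source code.
    slug = re.sub(r"[^\w-]+", "-", project_name.strip())
    return slug.lower().strip("-")


print(to_repo_name("Kedro Tutorial"))  # kedro-tutorial
```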
Project dependencies
Kedro projects have a requirements.txt file to specify their dependencies and enable sharable projects by ensuring consistency across Python packages and versions.
The generic project template bundles some typical dependencies in src/requirements.txt. Here’s a typical example, although you may find that the version numbers differ slightly depending on your version of Kedro:
# code quality packages
black==22.1.0 # Used for formatting code with `kedro lint`
flake8>=3.7.9, <5.0 # Used for linting code with `kedro lint`
ipython==7.0 # Used for an IPython session with `kedro ipython`
isort~=5.0 # Used for linting code with `kedro lint`
nbstripout~=0.4 # Strips the output of a Jupyter Notebook and writes the outputless version to the original file
# notebook tooling
jupyter~=1.0 # Used to open a Kedro-session in Jupyter Notebook & Lab
jupyterlab~=3.0 # Used to open a Kedro-session in Jupyter Lab
# Pytest + useful extensions
pytest-cov~=3.0 # Produces test coverage reports
pytest-mock>=1.7.1, <2.0 # Wrapper around the mock package for easier use with pytest
pytest~=6.2 # Testing framework for Python code
You can learn more about project dependencies in the Kedro documentation.
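Several of the pins above use the ~= ("compatible release") operator defined in PEP 440. As a reading aid, the following sketch expands such a pin into its equivalent pair of clauses. The helper name expand_compatible_release is hypothetical, and it only handles plain numeric versions like the ones listed above.

```python
def expand_compatible_release(requirement: str) -> str:
    """Rewrite 'name~=X.Y' as the equivalent 'name>=X.Y, ==X.*' (PEP 440)."""
    # Drop any trailing comment, as found in requirements.txt lines.
    spec = requirement.split("#", 1)[0].strip()
    name, sep, version = spec.partition("~=")
    if not sep:
        return spec  # not a compatible-release pin; return it unchanged
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError("~= requires at least two version components")
    # The final component is replaced by a wildcard in the == clause.
    prefix = ".".join(parts[:-1])
    return f"{name.strip()}>={version.strip()}, =={prefix}.*"


print(expand_compatible_release("isort~=5.0 # Used for linting code"))
# isort>=5.0, ==5.*
print(expand_compatible_release("pytest~=6.2"))
# pytest>=6.2, ==6.*
```

In other words, pytest~=6.2 accepts any 6.x release from 6.2 onwards but not 7.0, whereas an exact pin such as black==22.1.0 accepts only that one version.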
Add dependencies to the project
The dependencies above might be sufficient for some projects, but for this tutorial, you must add some extra requirements. These requirements will enable us to work with different data formats (including CSV, Excel, and Parquet) and to visualise the pipeline.
If you generated the project from the spaceflights starter, you can skip the copy/paste step, but it’s worth opening src/requirements.txt to inspect the contents.
Add the following lines to your src/requirements.txt file:
kedro[pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet]==0.18.4 # Specify optional Kedro dependencies
kedro-viz~=5.0 # Visualise your pipelines
scikit-learn~=1.0 # For modelling in the data science pipeline
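If you would rather script this step than edit the file by hand, a minimal sketch along the following lines appends only the entries that are missing, so running it twice does not duplicate them. The helper add_requirements is hypothetical and not part of Kedro; it compares lines by exact text after stripping comments.

```python
from pathlib import Path
from typing import List

# The three requirement lines from the tutorial text above.
EXTRA_REQUIREMENTS = [
    "kedro[pandas.CSVDataSet, pandas.ExcelDataSet, pandas.ParquetDataSet]==0.18.4",
    "kedro-viz~=5.0",
    "scikit-learn~=1.0",
]


def add_requirements(path: Path, lines: List[str]) -> List[str]:
    """Append any of `lines` not already in the file; return those added."""
    existing_text = path.read_text() if path.exists() else ""
    # Compare against existing lines with comments and whitespace removed.
    existing = {ln.split("#", 1)[0].strip() for ln in existing_text.splitlines()}
    missing = [ln for ln in lines if ln not in existing]
    if missing:
        # Make sure the appended lines start on a fresh line.
        if existing_text and not existing_text.endswith("\n"):
            existing_text += "\n"
        path.write_text(existing_text + "\n".join(missing) + "\n")
    return missing
```

From the project root directory, calling add_requirements(Path("src/requirements.txt"), EXTRA_REQUIREMENTS) would return the list of lines it added, or an empty list if they were all present already.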
Install the dependencies
To install all the project-specific dependencies, run the following from the project root directory:
pip install -r src/requirements.txt
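Once the install finishes, you may want to confirm that the key packages are present in the active environment. The sketch below is one way to do that with the standard library; it assumes Python 3.8 or later (for importlib.metadata), and the helper name installed_versions is hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# Report on the packages added to src/requirements.txt for this tutorial.
for name, version in installed_versions(["kedro", "kedro-viz", "scikit-learn"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any package is reported as not installed, check that you ran the pip command from the project root directory and in the intended virtual environment.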
Optional: configuration and logging
You may want to store credentials such as usernames and passwords if they are needed for specific data sources used by the project.
To do this, add them to conf/local/credentials.yml (some examples are included in that file for illustration).
You can find additional information in the advanced documentation on configuration.
You might also want to set up logging at this stage of the workflow, but we do not use it in this tutorial.