Set up the spaceflights project¶
This section shows how to create a new project (with
kedro new using the Kedro spaceflights starter) and install project dependencies (with
pip install -r requirements.txt).
Create a new project¶
Set up Kedro if you have not already done so.
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type
kedro -V in your terminal window.
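If you prefer to check from Python rather than the terminal, the comparison can be sketched as follows. This is a minimal illustration that uses only the standard library; the helper name installed_version and the constant TESTED_VERSION are hypothetical and not part of Kedro.

```python
from importlib.metadata import PackageNotFoundError, version

# Version this tutorial was most recently tested against (from the text above).
TESTED_VERSION = "0.19.0"


def installed_version(package="kedro"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("Kedro is not installed in this environment.")
elif found != TESTED_VERSION:
    print(f"Kedro {found} is installed; the tutorial was tested with {TESTED_VERSION}.")
else:
    print(f"Kedro {found} matches the tested version.")
```

A mismatch is not necessarily a problem, but small differences in generated files and dependency versions are more likely.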
Navigate to the folder where you want to store the project. Type the following to generate the project from the Kedro spaceflights starter. The project will be populated with a complete set of working example code:
kedro new --starter=spaceflights-pandas
When prompted for a project name, you should accept the default choice (
Spaceflights) as the rest of this tutorial assumes that project name.
After Kedro has created the project, navigate to the project root directory (named spaceflights if you accepted the default project name):
cd spaceflights
Install project dependencies¶
Kedro projects have a
requirements.txt file that specifies their dependencies. It makes projects shareable by keeping Python packages and versions consistent across environments.
The spaceflights project dependencies are stored in
requirements.txt (you may find that the versions differ slightly depending on the version of Kedro):
# code quality packages
ipython>=7.31.1, <8.0; python_version < '3.8'
ipython~=8.10; python_version >= '3.8'
# notebook tooling
# Pytest + useful extensions
# Kedro dependencies and datasets to work with different data formats (including CSV, Excel, and Parquet)
kedro-datasets[pandas.CSVDataset, pandas.ExcelDataset, pandas.ParquetDataset]>=1.1
kedro-viz~=6.0 # Visualise pipelines
# For modeling in the data science pipeline
Install the dependencies¶
To install all the project-specific dependencies, run the following from the project root directory:
pip install -r requirements.txt
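To confirm that the installation succeeded, one option is to check that every package named in requirements.txt can be found in the current environment. The sketch below is a simplified, standard-library-only illustration: the helpers requirement_names and missing are hypothetical, and the parser deliberately ignores version specifiers, extras, and environment markers, so it only tests for presence, not for the correct version.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def requirement_names(text):
    """Extract distribution names from the text of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and match.group(0) not in names:
            names.append(match.group(0))
    return names


def missing(names):
    """Return the names that are not installed in the current environment."""
    absent = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            absent.append(name)
    return absent


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # assumes you run this from the project root
    if req_file.exists():
        absent = missing(requirement_names(req_file.read_text()))
        print("Missing packages:", ", ".join(absent) if absent else "none")
```

If any packages are reported as missing, rerunning the pip command from the project root is usually the first thing to try.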
Optional: logging and configuration¶
You might want to set up logging at this stage of the workflow, but we do not use it in this tutorial.
You may also want to store credentials such as usernames and passwords if they are needed for specific data sources used by the project.
To do this, add them to
conf/local/credentials.yml (some examples are included in that file for illustration).
Configuration best practice to avoid leaking confidential data¶
Do not commit data to version control.
Do not commit notebook output cells (data can easily sneak into notebooks when you don’t delete output cells).
Do not commit credentials in
conf/. Use only the
conf/local/ folder for sensitive information like access credentials.
You can find additional information in the documentation on configuration.