Next steps: Tutorial¶
In this tutorial, we construct nodes and pipelines for a price-prediction model to illustrate the steps of a typical Kedro workflow.
The tutorial takes approximately 30 minutes to complete. You will work in the terminal and by inspecting project files in an IDE or text editor. There is no Jupyter notebook for the project.
It is 2160, and the space tourism industry is booming. Globally, thousands of space shuttle companies take tourists to the Moon and back. You have been able to source data that lists the amenities offered in each space shuttle, customer reviews, and company information.
Project: You want to construct a model that predicts the price for each trip to the Moon and the corresponding return flight.
Photo by Ivan Diaz on Unsplash
Get help¶
If you encounter an issue with the tutorial:
Check the spaceflights tutorial FAQ to see if we have answered the question already.
Use Kedro-Viz to visualise your project to better understand how the datasets, nodes and pipelines fit together.
Use the #questions channel on our Slack channel to ask the community for help.
Search the searchable archive of Slack discussions.
Terminology¶
We explain any Kedro-specific terminology as we introduce it, and further information can be found in the glossary. Some additional terminology may not be familiar to some readers, such as the concepts below.
Project root directory¶
Also known as the “root directory”, this is the parent folder for the entire project. It is the top-level folder that contains all other files and directories associated with the project.
Dependencies¶
These are Python packages or libraries that an individual project depends upon to complete a task. For example, the Spaceflights tutorial project depends on the scikit-learn library.
Standard development workflow¶
When you build a Kedro project, you will typically follow a standard development workflow:
Set up the project template
Create a new project and install project dependencies.
Configure credentials and any other sensitive/personal content, and logging
Set up the data
Add data to the
data
folderReference all datasets for the project
Create the pipeline
Construct nodes to make up the pipeline
Choose how to run the pipeline: sequentially or in parallel
Package the project
Build the project documentation
Package the project for distribution