
Welcome to Kedro’s documentation!
Introduction
First steps
Next steps: Tutorial
Visualisation with Kedro-Viz
- Get started with Kedro-Viz
- Visualise charts in Kedro-Viz
- Experiment tracking in Kedro-Viz
- Experiment tracking demonstration using Kedro-Viz
- Kedro versions supporting experiment tracking
- When should I use experiment tracking in Kedro?
- Set up a project
- Set up the session store
- Set up experiment tracking datasets
- Modify your nodes and pipelines to log metrics
- Generate the run data
- Access run data and compare runs
- View and compare plots
- View and compare metrics data
Notebooks & IPython users
Kedro project setup
Data Catalog
- The Data Catalog
- Use the Data Catalog within Kedro configuration
- Specify the location of the dataset
- Data Catalog *_args parameters
- Use the Data Catalog with the YAML API
- Create a Data Catalog YAML configuration file via CLI
- Adding parameters
- Feeding in credentials
- Load multiple datasets with similar configuration
- Transcode datasets
- Version datasets and ML models
- Use the Data Catalog with the Code API
- Kedro IO
Nodes and pipelines
Extend Kedro
- Common use cases
- Custom datasets
- Scenario
- Project setup
- The anatomy of a dataset
- Implement the _load method with fsspec
- Implement the _save method with fsspec
- Implement the _describe method
- The complete example
- Integration with PartitionedDataSet
- Versioning
- Thread-safety
- How to handle credentials and different filesystems
- How to contribute a custom dataset implementation
- Kedro plugins
Logging
Development
Deployment
- Deployment guide
- Single-machine deployment
- Distributed deployment
- Deployment with Argo Workflows
- Deployment with Prefect
- Deployment with Kubeflow Pipelines
- Deployment with AWS Batch
- Deployment to a Databricks cluster
- How to integrate Amazon SageMaker into your Kedro pipeline
- How to deploy your Kedro pipeline with AWS Step Functions
- How to deploy your Kedro pipeline on Apache Airflow with Astronomer
- Deployment to a Dask cluster
PySpark integration
- Build a Kedro pipeline with PySpark
- Centralise Spark configuration in conf/base/spark.yml
- Initialise a SparkSession using a hook
- Use Kedro’s built-in Spark datasets to load and save raw data
- Spark and Delta Lake interaction
- Use MemoryDataSet for intermediary DataFrame
- Use MemoryDataSet with copy_mode="assign" for non-DataFrame Spark objects
- Tips for maximising concurrency using ThreadRunner
Resources
API documentation
Kedro is a framework that makes it easy to build robust and scalable data pipelines by providing uniform project templates, data abstraction, configuration and pipeline assembly.
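As a quick illustration of what “data abstraction” and “pipeline assembly” mean in practice, the following is a minimal, self-contained sketch rather than an excerpt from the documentation sections listed above; the function `double` and the dataset name `input_number` are illustrative placeholders, and the dataset class naming (`MemoryDataSet`) assumes a Kedro release from the same era as this table of contents.

```python
# Minimal sketch of Kedro's core building blocks (illustrative names only):
# a node wraps a plain Python function, a Pipeline assembles nodes, and the
# DataCatalog abstracts where the data actually lives.
from kedro.io import DataCatalog, MemoryDataSet
from kedro.pipeline import Pipeline, node
from kedro.runner import SequentialRunner


def double(x: int) -> int:
    """An ordinary function; Kedro nodes wrap plain callables."""
    return x * 2


# Register inputs in the catalog; in a real project this usually lives in catalog.yml.
catalog = DataCatalog({"input_number": MemoryDataSet(21)})

# Assemble the pipeline from nodes that declare their inputs and outputs by name.
pipeline = Pipeline([node(double, inputs="input_number", outputs="doubled")])

# Run the pipeline; outputs not registered in the catalog are returned as a dict.
print(SequentialRunner().run(pipeline, catalog))  # {'doubled': 42}
```

In a full project the catalog entries, parameters and pipeline registration are driven by the project template and configuration files covered in the “Kedro project setup”, “Data Catalog” and “Nodes and pipelines” sections above.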