
Welcome to Kedro’s award-winning documentation!
Learn about Kedro
Tutorial and basic Kedro usage
Kedro projects
Integrations
- PySpark integration
- Centralise Spark configuration in `conf/base/spark.yml`
- Initialise a `SparkSession` using a hook
- Use Kedro’s built-in Spark datasets to load and save raw data
- Spark and Delta Lake interaction
- Use `MemoryDataset` for intermediary `DataFrame`
- Use `MemoryDataset` with `copy_mode="assign"` for non-`DataFrame` Spark objects
- Tips for maximising concurrency using `ThreadRunner`
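As a rough illustration of the centralised Spark configuration mentioned above, a `conf/base/spark.yml` file simply maps Spark property names to values; the specific keys below are illustrative examples, not Kedro requirements:

```yaml
# conf/base/spark.yml -- illustrative Spark settings; adjust for your cluster
spark.driver.maxResultSize: 3g
spark.scheduler.mode: FAIR
```

A project hook can then read this file via the config loader when building the `SparkSession`, so all Spark settings live in one place alongside the rest of the project configuration.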
- How to add MLflow to your Kedro workflow
- Data and pipeline versioning with Kedro and DVC
- Data versioning with Delta Lake
- Data versioning with Iceberg
Development
Advanced usage
- Project setup
- Extend Kedro
- Hooks
- Logging
- Default logging configuration
- How to perform logging in your Kedro project
- How to customise Kedro logging
- Custom `CONF_SOURCE` with logging
- Advanced logging
- How to customise the `rich` handler
- How to enable file-based logging
- How to use plain console logging
- How to enable rich logging in a dumb terminal
- How to enable rich logging in Jupyter
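To give a flavour of the logging customisation entries above, Kedro logging is configured with a standard Python `dictConfig`-style YAML file. The sketch below is a minimal assumed example; the handler class name and file location are assumptions that may differ across Kedro versions:

```yaml
# logging.yml -- minimal sketch of a custom Kedro logging configuration
version: 1
disable_existing_loggers: False
handlers:
  rich:
    class: kedro.logging.RichHandler  # assumes Kedro's built-in rich handler
    rich_tracebacks: True
root:
  handlers: [rich]
```

Pointing Kedro at a file like this (rather than relying on the default configuration) is how the file-based, plain-console, and `rich` variants listed above are switched between.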
- Deployment
Contribute to Kedro
API documentation
Kedro is a framework that makes it easy to build robust and scalable data pipelines by providing uniform project templates, data abstraction, configuration and pipeline assembly.