Welcome to Kedro’s award-winning documentation!
- PySpark integration
  - Centralise Spark configuration in `conf/base/spark.yml`
  - Initialise a `SparkSession` using a hook (see the sketch after this list)
  - Use Kedro’s built-in Spark datasets to load and save raw data
  - Spark and Delta Lake interaction
  - Use `MemoryDataset` for intermediary `DataFrame`
  - Use `MemoryDataset` with `copy_mode="assign"` for non-`DataFrame` Spark objects (example after this list)
  - Tips for maximising concurrency using `ThreadRunner`
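The hook-based initialisation above is commonly implemented along these lines — a minimal sketch, assuming the `spark` config is resolvable through the project's config loader and the hook is registered in `settings.py` under `HOOKS`:

```python
# src/<package_name>/hooks.py  (path is illustrative)
from kedro.framework.hooks import hook_impl
from pyspark import SparkConf
from pyspark.sql import SparkSession


class SparkHooks:
    @hook_impl
    def after_context_created(self, context) -> None:
        """Initialise a SparkSession from the options in conf/base/spark.yml."""
        # Read the centralised Spark options via the project's config loader.
        parameters = context.config_loader["spark"]
        spark_conf = SparkConf().setAll(parameters.items())

        # Build (or reuse) a session so every node shares the same SparkSession.
        spark_session = (
            SparkSession.builder.appName(context.project_path.name)
            .config(conf=spark_conf)
            .getOrCreate()
        )
        spark_session.sparkContext.setLogLevel("WARN")
```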
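For the in-memory items above, a sketch of `MemoryDataset` with `copy_mode="assign"` — the dataset name is hypothetical:

```python
from kedro.io import DataCatalog, MemoryDataset

# copy_mode="assign" stores the object by reference instead of deep-copying it,
# which is what non-serialisable Spark objects (e.g. fitted Spark ML models) need.
catalog = DataCatalog({"spark_model": MemoryDataset(copy_mode="assign")})
```

Because Spark performs the heavy lifting on its own workers, independent nodes can then be dispatched concurrently with `kedro run --runner=ThreadRunner`.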
- How to add MLflow to your Kedro workflow
- Project setup
- Extend Kedro
- Hooks
- Logging
  - Default logging configuration
  - How to perform logging in your Kedro project (see the sketch after this list)
  - How to customise Kedro logging
  - Custom `CONF_SOURCE` with logging
  - Advanced logging
  - How to customise the `rich` handler
  - How to enable file-based logging
  - How to use plain console logging
  - How to enable rich logging in a dumb terminal
  - How to enable rich logging in Jupyter
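As a minimal sketch of performing logging in a project: Kedro routes standard-library loggers through its logging configuration, so plain `logging` calls inside a node are enough (the node function and its argument are illustrative):

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # Standard-library logging; Kedro's configuration decides where it goes.
    logger.info("Dropping %d duplicate rows", df.duplicated().sum())
    return df.drop_duplicates()
```

A custom logging configuration file can be supplied by pointing the `KEDRO_LOGGING_CONFIG` environment variable at it.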
- Development
- Deployment
API documentation
Kedro is a framework that makes it easy to build robust and scalable data pipelines by providing uniform project templates, data abstraction, configuration and pipeline assembly.
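To make that description concrete, a minimal sketch of the pipeline assembly Kedro provides — the functions and dataset names here are hypothetical:

```python
from kedro.pipeline import node, pipeline


def preprocess(raw):
    # Nodes are plain Python functions; datasets are referenced by name.
    return raw.dropna()


def count_rows(clean):
    return len(clean)


data_pipeline = pipeline(
    [
        node(preprocess, inputs="raw", outputs="clean"),
        node(count_rows, inputs="clean", outputs="row_count"),
    ]
)
```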