Welcome to Kedro’s award-winning documentation!
- Project setup
- Extend Kedro
- Hooks
- Logging
  - Default logging configuration
- PySpark integration
  - Centralise Spark configuration in `conf/base/spark.yml`
  - Initialise a `SparkSession` using a hook
  - Use Kedro’s built-in Spark datasets to load and save raw data
  - Spark and Delta Lake interaction
  - Use `MemoryDataset` for intermediary `DataFrame`
  - Use `MemoryDataset` with `copy_mode="assign"` for non-`DataFrame` Spark objects
  - Tips for maximising concurrency using `ThreadRunner`
- Development
- Deployment
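
The `conf/base/spark.yml` entry above refers to a project-level file that centralises Spark configuration. A minimal sketch of what such a file might contain — the keys shown are standard Spark properties, and the values are illustrative choices, not Kedro defaults:

```yaml
# conf/base/spark.yml — illustrative Spark settings, loaded when the
# SparkSession is initialised (e.g. from a project hook).
# Key names are ordinary Spark properties; values here are examples only.
spark.driver.memory: 4g
spark.executor.memory: 4g
spark.sql.shuffle.partitions: 200
```

Keeping these properties in configuration rather than in code means they can be overridden per environment (e.g. `conf/local/spark.yml`) without touching the pipeline itself.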
API documentation
Kedro is a framework that makes it easy to build robust and scalable data pipelines by providing uniform project templates, data abstraction, configuration, and pipeline assembly.