kedro_datasets

kedro_datasets is where you can find all of Kedro's data connectors.

KedroDeprecationWarning

Bases: DeprecationWarning

Custom class for warnings about deprecated Kedro features.
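
Because the class derives from DeprecationWarning, the standard warnings module can record or escalate it like any other warning category. The following is a minimal sketch, not Kedro's own code: it defines a local stand-in class so that it runs without Kedro installed (in a real project you would import KedroDeprecationWarning from kedro_datasets), and the function old_feature is hypothetical.

```python
import warnings


class KedroDeprecationWarning(DeprecationWarning):
    """Local stand-in mirroring the documented base class (assumption)."""


def old_feature() -> str:
    # Hypothetical deprecated function, for illustration only.
    warnings.warn("old_feature is deprecated", KedroDeprecationWarning, stacklevel=2)
    return "result"


# Record the warning instead of printing it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_feature()

# Escalate the warning to an error, for example in a test suite.
with warnings.catch_warnings():
    warnings.simplefilter("error", category=KedroDeprecationWarning)
    try:
        old_feature()
        raised = False
    except KedroDeprecationWarning:
        raised = True
```

Filtering on the specific category leaves other DeprecationWarning subclasses untouched, which is useful when only Kedro deprecations should fail a build.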

Dataset Classes

Name Description
api.APIDataset APIDataset loads/saves data from/to HTTP(S) APIs. It uses the python requests library: https://requests.readthedocs.io/en/latest/
biosequence.BioSequenceDataset BioSequenceDataset loads and saves data to a sequence file.
dask.CSVDataset CSVDataset loads and saves data to comma-separated value file(s). It uses Dask remote data services to handle the corresponding load and save operations.
dask.ParquetDataset ParquetDataset loads and saves data to parquet file(s). It uses Dask remote data services to handle the corresponding load and save operations.
databricks.ManagedTableDataset ManagedTableDataset loads and saves data into managed delta tables in Databricks.
email.EmailMessageDataset EmailMessageDataset loads/saves an email message from/to a file using an underlying filesystem (e.g.: local, S3, GCS). It uses the email package in the standard library to manage email messages.
geopandas.GenericDataset GenericDataset loads/saves data to a file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by geopandas, so it supports all allowed geopandas (pandas) options for loading and saving files. fiona (a dependency of geopandas) does not currently support Python 3.14.
holoviews.HoloviewsWriter HoloviewsWriter saves Holoviews objects to image file(s) in an underlying filesystem (e.g. local, S3, GCS).
huggingface.HFDataset HFDataset loads Hugging Face datasets using the datasets library.
huggingface.HFTransformerPipelineDataset HFTransformerPipelineDataset loads pretrained Hugging Face transformers using the transformers library.
ibis.FileDataset FileDataset loads/saves data from/to a specified file format.
ibis.TableDataset TableDataset loads/saves data from/to Ibis table expressions.
json.JSONDataset JSONDataset loads/saves data from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It uses native json to handle the JSON file.
langchain.ChatAnthropicDataset ChatAnthropicDataset loads a ChatAnthropic langchain model.
langchain.ChatCohereDataset ChatCohereDataset loads a ChatCohere langchain model.
langchain.ChatOpenAIDataset OpenAI dataset used to access credentials at runtime.
langchain.OpenAIEmbeddingsDataset OpenAIEmbeddingsDataset loads an OpenAIEmbeddings langchain model.
matlab.MatlabDataset MatlabDataset loads and saves data from/to a MATLAB file using scipy.io.
matplotlib.MatplotlibDataset MatplotlibDataset saves one or more Matplotlib objects as image files to an underlying filesystem (e.g. local, S3, GCS).
networkx.GMLDataset GMLDataset loads and saves graphs to a GML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GML data.
networkx.GraphMLDataset GraphMLDataset loads and saves graphs to a GraphML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GraphML data.
networkx.JSONDataset NetworkX JSONDataset loads and saves graphs to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create JSON data.
openxml.DocxDataset DocxDataset loads/saves data from/to a .docx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-docx to handle the .docx file.
openxml.PptxDataset PptxDataset loads/saves data from/to a .pptx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-pptx to handle the .pptx file.
pandas.CSVDataset A dataset that loads and saves data to/from CSV files using pandas.
pandas.DeltaTableDataset DeltaTableDataset loads/saves delta tables from/to a filesystem (e.g.: local, S3, GCS), Databricks Unity Catalog, or AWS Glue catalog. It handles load and save using a pandas dataframe.
pandas.ExcelDataset ExcelDataset loads/saves data from/to an Excel file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the Excel file.
pandas.FeatherDataset A dataset that loads and saves data to/from Feather files using pandas.
pandas.GBQQueryDataset A dataset that loads data from a provided SQL query in Google BigQuery using pandas-gbq. It is read-only.
pandas.GBQTableDataset A dataset that loads and saves data to/from Google BigQuery tables using pandas-gbq.
pandas.GenericDataset GenericDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the type of read/write target.
pandas.HDFDataset A dataset that loads and saves data to/from HDF files using pandas.
pandas.JSONDataset A dataset that loads and saves data to/from JSON files using pandas.
pandas.ParquetDataset A dataset that loads and saves data to/from Parquet files using pandas.
pandas.SQLQueryDataset A dataset that loads data from a provided SQL query using pandas. It is read-only.
pandas.SQLTableDataset A dataset that loads data from a SQL table and saves a pandas DataFrame to a table.
pandas.XMLDataset A dataset that loads and saves data to/from XML files using pandas.
partitions.IncrementalDataset IncrementalDataset inherits from PartitionedDataset, which loads and saves partitioned file-like data using the underlying dataset definition.
partitions.PartitionedDataset PartitionedDataset loads and saves partitioned file-like data using the underlying dataset definition. It also uses fsspec for filesystem level operations.
pickle.PickleDataset PickleDataset loads/saves data from/to a Pickle file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by the specified backend library passed in (defaults to the pickle library), so it supports all allowed options for loading and saving pickle files.
pillow.ImageDataset ImageDataset loads/saves image data as numpy from an underlying filesystem (e.g.: local, S3, GCS). It uses Pillow to handle the image file.
plotly.HTMLDataset HTMLDataset saves a plotly figure to an HTML file using an underlying filesystem (e.g.: local, S3, GCS).
plotly.JSONDataset JSONDataset loads/saves a plotly figure from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS).
plotly.PlotlyDataset PlotlyDataset generates a plot from a pandas DataFrame and saves it to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It loads the JSON into a plotly figure.
polars.CSVDataset CSVDataset loads/saves data from/to a CSV file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the CSV file.
polars.EagerPolarsDataset EagerPolarsDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target.
polars.LazyPolarsDataset LazyPolarsDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target.
redis.PickleDataset PickleDataset loads/saves data from/to a Redis database. The underlying functionality is supported by the redis library, so it supports all allowed options for instantiating the redis app from_url and setting a value.
snowflake.SnowparkTableDataset SnowparkTableDataset loads and saves Snowpark DataFrames. As of October 2024, the Snowpark connector works with Python 3.9, 3.10, and 3.11. Python 3.12 is not supported yet.
spark.DeltaTableDataset DeltaTableDataset loads data into DeltaTable objects.
spark.GBQQueryDataset GBQQueryDataset loads data from Google BigQuery with a SQL query using BigQuery Spark connector.
spark.SparkDataset SparkDataset loads and saves Spark dataframes.
spark.SparkDatasetV2 SparkDatasetV2 loads and saves Spark dataframes with support for Spark Connect, Databricks Connect, and automatic Pandas-to-Spark conversion.
spark.SparkHiveDataset SparkHiveDataset loads and saves Spark dataframes stored on Hive.
spark.SparkJDBCDataset SparkJDBCDataset loads data from a database table accessible via a JDBC URL (url) and connection properties, and saves the content of a PySpark DataFrame to an external database table via JDBC.
spark.SparkStreamingDataset SparkStreamingDataset loads data to Spark Streaming Dataframe objects.
svmlight.SVMLightDataset SVMLightDataset loads/saves data from/to a svmlight/libsvm file using an underlying filesystem (e.g.: local, S3, GCS). It uses sklearn functions dump_svmlight_file to save and load_svmlight_file to load a file.
tensorflow.TensorFlowModelDataset TensorFlowModelDataset loads and saves TensorFlow models. The underlying functionality is supported by, and passes input arguments through to, TensorFlow 2.X load_model and save_model methods. TensorFlow does not currently support Python 3.14.
text.TextDataset TextDataset loads/saves data from/to a text file using an underlying filesystem (e.g.: local, S3, GCS).
yaml.YAMLDataset YAMLDataset loads/saves data from/to a YAML file using an underlying filesystem (e.g.: local, S3, GCS). It uses PyYAML to handle the YAML file.
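
Each name in the table gives the module and class relative to the kedro_datasets package, so pandas.CSVDataset is found at kedro_datasets.pandas.CSVDataset. The following is a minimal sketch of resolving a table entry to its class with importlib. The helper names split_entry and load_dataset_class are hypothetical, and the import only succeeds if kedro-datasets and the relevant optional dependencies are installed.

```python
import importlib

PACKAGE = "kedro_datasets"


def split_entry(name: str) -> tuple[str, str]:
    """Split a table entry such as 'pandas.CSVDataset' into
    (full module path, class name)."""
    module, _, class_name = name.rpartition(".")
    if not module or not class_name:
        raise ValueError(f"expected '<module>.<ClassName>', got {name!r}")
    return f"{PACKAGE}.{module}", class_name


def load_dataset_class(name: str) -> type:
    """Import and return the dataset class for a table entry.

    Raises ImportError if the package or its optional dependencies
    are not installed.
    """
    module_path, class_name = split_entry(name)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

For example, split_entry("pandas.CSVDataset") yields the module path kedro_datasets.pandas and the class name CSVDataset. The import itself is deferred to load_dataset_class, so a missing optional dependency surfaces only when that dataset is actually requested.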