kedro_datasets


kedro_datasets is where you can find all of Kedro's data connectors.

KedroDeprecationWarning

Bases: DeprecationWarning

Custom class for warnings about deprecated Kedro features.
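Because `KedroDeprecationWarning` subclasses `DeprecationWarning`, it works with the standard `warnings` filters. The sketch below is illustrative only. It shows how to record these warnings, or escalate them to errors (for example, in a test suite), without silencing other deprecation warnings. `legacy_loader` is a hypothetical deprecated function, and the fallback class is a local stand-in so the snippet still runs when kedro-datasets is not installed.

```python
import warnings

try:
    from kedro_datasets import KedroDeprecationWarning
except ImportError:
    # Stand-in with the same base class, used only when kedro-datasets is not installed.
    class KedroDeprecationWarning(DeprecationWarning):
        """Custom class for warnings about deprecated Kedro features."""


def legacy_loader() -> str:
    """Hypothetical deprecated function, used here for illustration only."""
    warnings.warn(
        "legacy_loader() is deprecated and will be removed in a future release.",
        KedroDeprecationWarning,
        stacklevel=2,
    )
    return "loaded"


def run_and_record():
    """Call legacy_loader() and return its result plus the warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" records the warning even if it was already shown earlier.
        warnings.simplefilter("always", KedroDeprecationWarning)
        result = legacy_loader()
    return result, caught


def run_strict() -> bool:
    """Return True if legacy_loader() raises when Kedro deprecations are errors."""
    with warnings.catch_warnings():
        # Only KedroDeprecationWarning is escalated; other warnings keep their filters.
        warnings.simplefilter("error", KedroDeprecationWarning)
        try:
            legacy_loader()
        except KedroDeprecationWarning:
            return True
    return False


if __name__ == "__main__":
    result, caught = run_and_record()
    print(result, [str(w.message) for w in caught])
    print("raised under 'error' filter:", run_strict())
```

Filtering by the specific subclass rather than by `DeprecationWarning` keeps the escalation scoped to Kedro's own deprecations.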

Dataset Classes

Name	Description
api.APIDataset	`APIDataset` loads/saves data from/to HTTP(S) APIs. It uses the Python `requests` library: https://requests.readthedocs.io/en/latest/
biosequence.BioSequenceDataset	`BioSequenceDataset` loads and saves data to a sequence file.
dask.CSVDataset	`CSVDataset` loads and saves data to comma-separated value file(s). It uses Dask remote data services to handle the corresponding load and save operations.
dask.ParquetDataset	`ParquetDataset` loads and saves data to parquet file(s). It uses Dask remote data services to handle the corresponding load and save operations.
databricks.ManagedTableDataset	`ManagedTableDataset` loads and saves data into managed delta tables in Databricks.
email.EmailMessageDataset	`EmailMessageDataset` loads/saves an email message from/to a file using an underlying filesystem (e.g.: local, S3, GCS). It uses the `email` package in the standard library to manage email messages.
geopandas.GenericDataset	`GenericDataset` loads/saves data from/to a file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by geopandas, so it supports all allowed geopandas (pandas) options for loading and saving files.
holoviews.HoloviewsWriter	`HoloviewsWriter` saves Holoviews objects to image file(s) in an underlying filesystem (e.g. local, S3, GCS).
huggingface.HFDataset	`HFDataset` loads Hugging Face datasets using the `datasets` library.
huggingface.HFTransformerPipelineDataset	`HFTransformerPipelineDataset` loads pretrained Hugging Face transformers using the `transformers` library.
ibis.FileDataset	`FileDataset` loads/saves data from/to a specified file format.
ibis.TableDataset	`TableDataset` loads/saves data from/to Ibis table expressions.
json.JSONDataset	`JSONDataset` loads/saves data from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It uses the native `json` module to handle the JSON file.
langchain.ChatAnthropicDataset	`ChatAnthropicDataset` loads a ChatAnthropic `langchain` model.
langchain.ChatCohereDataset	`ChatCohereDataset` loads a ChatCohere `langchain` model.
langchain.ChatOpenAIDataset	`ChatOpenAIDataset` loads a ChatOpenAI `langchain` model. Credentials are accessed at runtime.
langchain.OpenAIEmbeddingsDataset	`OpenAIEmbeddingsDataset` loads an OpenAIEmbeddings `langchain` model.
matlab.MatlabDataset	`MatlabDataset` loads and saves data from/to a MATLAB file using scipy.io.
matplotlib.MatplotlibDataset	`MatplotlibDataset` saves one or more Matplotlib objects as image files to an underlying filesystem (e.g. local, S3, GCS).
networkx.GMLDataset	`GMLDataset` loads and saves graphs to a GML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GML data.
networkx.GraphMLDataset	`GraphMLDataset` loads and saves graphs to a GraphML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GraphML data.
networkx.JSONDataset	NetworkX `JSONDataset` loads and saves graphs to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create JSON data.
openxml.DocxDataset	`DocxDataset` loads/saves data from/to a .docx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-docx to handle the .docx file.
openxml.PptxDataset	`PptxDataset` loads/saves data from/to a .pptx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-pptx to handle the .pptx file.
pandas.CSVDataset	A dataset that loads and saves data to/from CSV files using pandas.
pandas.DeltaTableDataset	`DeltaTableDataset` loads/saves delta tables from/to a filesystem (e.g.: local, S3, GCS), the Databricks Unity Catalog, or the AWS Glue catalog. It handles load and save using a pandas DataFrame.
pandas.ExcelDataset	`ExcelDataset` loads/saves data from/to an Excel file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the Excel file.
pandas.FeatherDataset	A dataset that loads and saves data to/from Feather files using pandas.
pandas.GBQQueryDataset	A dataset that loads data from a provided SQL query in Google BigQuery using pandas-gbq. It is read-only.
pandas.GBQTableDataset	A dataset that loads and saves data to/from Google BigQuery tables using pandas-gbq.
pandas.GenericDataset	`GenericDataset` loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the type of read/write target.
pandas.HDFDataset	A dataset that loads and saves data to/from HDF files using pandas.
pandas.JSONDataset	A dataset that loads and saves data to/from JSON files using pandas.
pandas.ParquetDataset	A dataset that loads and saves data to/from Parquet files using pandas.
pandas.SQLQueryDataset	A dataset that loads data from a provided SQL query using pandas. It is read-only.
pandas.SQLTableDataset	A dataset that loads data from a SQL table and saves a pandas DataFrame to a table.
pandas.XMLDataset	A dataset that loads and saves data to/from XML files using pandas.
partitions.IncrementalDataset	`IncrementalDataset` inherits from `PartitionedDataset`, which loads and saves partitioned file-like data using the underlying dataset definition.
partitions.PartitionedDataset	`PartitionedDataset` loads and saves partitioned file-like data using the underlying dataset definition. It also uses `fsspec` for filesystem level operations.
pickle.PickleDataset	`PickleDataset` loads/saves data from/to a Pickle file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by the specified backend library passed in (defaults to the `pickle` library), so it supports all allowed options for loading and saving pickle files.
pillow.ImageDataset	`ImageDataset` loads/saves image data as `numpy` from an underlying filesystem (e.g.: local, S3, GCS). It uses Pillow to handle the image file.
plotly.HTMLDataset	`HTMLDataset` saves a plotly figure to an HTML file using an underlying filesystem (e.g.: local, S3, GCS).
plotly.JSONDataset	`JSONDataset` loads/saves a plotly figure from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS).
plotly.PlotlyDataset	`PlotlyDataset` generates a plot from a pandas DataFrame and saves it to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It loads the JSON into a plotly figure.
polars.CSVDataset	`CSVDataset` loads/saves data from/to a CSV file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the CSV file.
polars.EagerPolarsDataset	`EagerPolarsDataset` loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target.
polars.LazyPolarsDataset	`LazyPolarsDataset` loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target.
redis.PickleDataset	`PickleDataset` loads/saves data from/to a Redis database. The underlying functionality is supported by the redis library, so it supports all allowed options for instantiating the redis app `from_url` and setting a value.
snowflake.SnowparkTableDataset	`SnowparkTableDataset` loads and saves Snowpark DataFrames. As of October 2024, the Snowpark connector works with Python 3.9, 3.10, and 3.11. Python 3.12 is not supported yet.
spark.DeltaTableDataset	`DeltaTableDataset` loads data into DeltaTable objects.
spark.GBQQueryDataset	`GBQQueryDataset` loads data from Google BigQuery with a SQL query using BigQuery Spark connector.
spark.SparkDataset	`SparkDataset` loads and saves Spark dataframes.
spark.SparkDatasetV2	`SparkDatasetV2` loads and saves Spark dataframes with support for Spark Connect, Databricks Connect, and automatic Pandas-to-Spark conversion.
spark.SparkHiveDataset	`SparkHiveDataset` loads and saves Spark dataframes stored on Hive.
spark.SparkJDBCDataset	`SparkJDBCDataset` loads data from a database table accessible via a JDBC URL and connection properties, and saves the content of a PySpark DataFrame to an external database table via JDBC.
spark.SparkStreamingDataset	`SparkStreamingDataset` loads data into Spark Streaming DataFrame objects.
svmlight.SVMLightDataset	`SVMLightDataset` loads/saves data from/to an svmlight/libsvm file using an underlying filesystem (e.g.: local, S3, GCS). It uses the sklearn functions `dump_svmlight_file` to save and `load_svmlight_file` to load a file.
tensorflow.TensorFlowModelDataset	`TensorFlowModelDataset` loads and saves TensorFlow models. The underlying functionality is supported by, and passes input arguments through to, TensorFlow 2.X load_model and save_model methods.
text.TextDataset	`TextDataset` loads/saves data from/to a text file using an underlying filesystem (e.g.: local, S3, GCS).
yaml.YAMLDataset	`YAMLDataset` loads/saves data from/to a YAML file using an underlying filesystem (e.g.: local, S3, GCS). It uses PyYAML to handle the YAML file.
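The connectors in this table share one contract. Each one is configured with a target (usually a `filepath`) and then exposes `load()`, `save(data)`, and `exists()`. The following is a simplified sketch of that contract, modelled loosely on the `json.JSONDataset` description. `MinimalJSONDataset` is hypothetical and is not part of kedro_datasets. It handles local files only, while the real class also supports remote filesystems through fsspec, versioning, and load/save arguments.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class MinimalJSONDataset:
    """Hypothetical, simplified stand-in for json.JSONDataset (local files only)."""

    def __init__(self, filepath: str, save_args: Optional[Dict[str, Any]] = None) -> None:
        self._filepath = Path(filepath)
        # Extra keyword arguments forwarded to json.dump, e.g. {"indent": 2}.
        self._save_args = save_args or {}

    def load(self) -> Any:
        # Read the file and parse it with the standard-library json module.
        with self._filepath.open(encoding="utf-8") as f:
            return json.load(f)

    def save(self, data: Any) -> None:
        # Create missing parent directories, then serialise with json.dump.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open("w", encoding="utf-8") as f:
            json.dump(data, f, **self._save_args)

    def exists(self) -> bool:
        # True once the target file has been written.
        return self._filepath.exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dataset = MinimalJSONDataset(f"{tmp}/params/config.json", save_args={"indent": 2})
        dataset.save({"learning_rate": 0.01, "layers": [64, 32]})
        print(dataset.exists(), dataset.load())
```

Keeping I/O behind `load`/`save` is what lets a pipeline switch storage backends, for example from `pandas.CSVDataset` to `pandas.ParquetDataset`, without changing the code that consumes the data.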