kedro_datasets
kedro_datasets is where you can find all of Kedro's data connectors.
KedroDeprecationWarning
Bases: DeprecationWarning
Custom class for warnings about deprecated Kedro features.
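Because the class derives from DeprecationWarning, the standard warnings module can capture or escalate it like any other warning category. The sketch below illustrates this with a local stand-in class so that it runs without Kedro installed; the `old_feature` function and its message are hypothetical and are not part of kedro_datasets.

```python
import warnings


class KedroDeprecationWarning(DeprecationWarning):
    """Local stand-in mirroring the documented base class (assumption for this sketch)."""


def old_feature() -> str:
    """Hypothetical function that announces its own deprecation."""
    warnings.warn(
        "old_feature is deprecated (hypothetical message)",
        KedroDeprecationWarning,
        stacklevel=2,
    )
    return "result"


# Record the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", KedroDeprecationWarning)
    value = old_feature()

# Escalate the warning to an exception, e.g. to fail fast in a test suite.
with warnings.catch_warnings():
    warnings.simplefilter("error", KedroDeprecationWarning)
    try:
        old_feature()
        raised = None
    except KedroDeprecationWarning as exc:
        raised = str(exc)
```

With the real package, the same filters would presumably be applied to the class imported from kedro_datasets rather than to a stand-in.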
Dataset Classes
| Name | Description |
|---|---|
| api.APIDataset | APIDataset loads/saves data from/to HTTP(S) APIs. It uses the Python requests library: https://requests.readthedocs.io/en/latest/ |
| biosequence.BioSequenceDataset | BioSequenceDataset loads and saves data to a sequence file. |
| dask.CSVDataset | CSVDataset loads and saves data to comma-separated value file(s). It uses Dask remote data services to handle the corresponding load and save operations. |
| dask.ParquetDataset | ParquetDataset loads and saves data to parquet file(s). It uses Dask remote data services to handle the corresponding load and save operations. |
| databricks.ManagedTableDataset | ManagedTableDataset loads and saves data into managed delta tables in Databricks. |
| email.EmailMessageDataset | EmailMessageDataset loads/saves an email message from/to a file using an underlying filesystem (e.g.: local, S3, GCS). It uses the email package in the standard library to manage email messages. |
| geopandas.GenericDataset | GenericDataset loads/saves data to a file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by geopandas, so it supports all allowed geopandas (pandas) options for loading and saving files. fiona (a dependency of geopandas) does not currently support Python 3.14. |
| holoviews.HoloviewsWriter | HoloviewsWriter saves Holoviews objects to image file(s) in an underlying filesystem (e.g. local, S3, GCS). |
| huggingface.HFDataset | HFDataset loads Hugging Face datasets using the datasets library. |
| huggingface.HFTransformerPipelineDataset | HFTransformerPipelineDataset loads pretrained Hugging Face transformers using the transformers library. |
| ibis.FileDataset | FileDataset loads/saves data from/to a specified file format. |
| ibis.TableDataset | TableDataset loads/saves data from/to Ibis table expressions. |
| json.JSONDataset | JSONDataset loads/saves data from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It uses native json to handle the JSON file. |
| langchain.ChatAnthropicDataset | ChatAnthropicDataset loads a ChatAnthropic langchain model. |
| langchain.ChatCohereDataset | ChatCohereDataset loads a ChatCohere langchain model. |
| langchain.ChatOpenAIDataset | OpenAI dataset used to access credentials at runtime. |
| langchain.OpenAIEmbeddingsDataset | OpenAIEmbeddingsDataset loads an OpenAIEmbeddings langchain model. |
| matlab.MatlabDataset | MatlabDataset loads and saves data from/to a MATLAB file using scipy.io. |
| matplotlib.MatplotlibDataset | MatplotlibDataset saves one or more Matplotlib objects as image files to an underlying filesystem (e.g. local, S3, GCS). |
| networkx.GMLDataset | GMLDataset loads and saves graphs to a GML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GML data. |
| networkx.GraphMLDataset | GraphMLDataset loads and saves graphs to a GraphML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GraphML data. |
| networkx.JSONDataset | NetworkX JSONDataset loads and saves graphs to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create JSON data. |
| openxml.DocxDataset | DocxDataset loads/saves data from/to a .docx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-docx to handle the .docx file. |
| openxml.PptxDataset | PptxDataset loads/saves data from/to a .pptx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-pptx to handle the .pptx file. |
| pandas.CSVDataset | A dataset that loads and saves data to/from CSV files using pandas. |
| pandas.DeltaTableDataset | DeltaTableDataset loads/saves delta tables from/to a filesystem (e.g.: local, S3, GCS), Databricks Unity Catalog, or AWS Glue catalog. It handles load and save using a pandas dataframe. |
| pandas.ExcelDataset | ExcelDataset loads/saves data from/to an Excel file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the Excel file. |
| pandas.FeatherDataset | A dataset that loads and saves data to/from Feather files using pandas. |
| pandas.GBQQueryDataset | A dataset that loads data from a provided SQL query in Google BigQuery using pandas-gbq. It is read-only. |
| pandas.GBQTableDataset | A dataset that loads and saves data to/from Google BigQuery tables using pandas-gbq. |
| pandas.GenericDataset | GenericDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the type of read/write target. |
| pandas.HDFDataset | A dataset that loads and saves data to/from HDF files using pandas. |
| pandas.JSONDataset | A dataset that loads and saves data to/from JSON files using pandas. |
| pandas.ParquetDataset | A dataset that loads and saves data to/from Parquet files using pandas. |
| pandas.SQLQueryDataset | A dataset that loads data from a provided SQL query using pandas. It is read-only. |
| pandas.SQLTableDataset | A dataset that loads data from a SQL table and saves a pandas DataFrame to a table. |
| pandas.XMLDataset | A dataset that loads and saves data to/from XML files using pandas. |
| partitions.IncrementalDataset | IncrementalDataset inherits from PartitionedDataset, which loads and saves partitioned file-like data using the underlying dataset definition. |
| partitions.PartitionedDataset | PartitionedDataset loads and saves partitioned file-like data using the underlying dataset definition. It also uses fsspec for filesystem level operations. |
| pickle.PickleDataset | PickleDataset loads/saves data from/to a Pickle file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by the specified backend library passed in (defaults to the pickle library), so it supports all allowed options for loading and saving pickle files. |
| pillow.ImageDataset | ImageDataset loads/saves image data as numpy from an underlying filesystem (e.g.: local, S3, GCS). It uses Pillow to handle the image file. |
| plotly.HTMLDataset | HTMLDataset saves a plotly figure to an HTML file using an underlying filesystem (e.g.: local, S3, GCS). |
| plotly.JSONDataset | JSONDataset loads/saves a plotly figure from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). |
| plotly.PlotlyDataset | PlotlyDataset generates a plot from a pandas DataFrame and saves it to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It loads the JSON into a plotly figure. |
| polars.CSVDataset | CSVDataset loads/saves data from/to a CSV file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the CSV file. |
| polars.EagerPolarsDataset | EagerPolarsDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target. |
| polars.LazyPolarsDataset | LazyPolarsDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target. |
| redis.PickleDataset | PickleDataset loads/saves data from/to a Redis database. The underlying functionality is supported by the redis library, so it supports all allowed options for instantiating the redis app from_url and setting a value. |
| snowflake.SnowparkTableDataset | SnowparkTableDataset loads and saves Snowpark DataFrames. As of October 2024, the Snowpark connector works with Python 3.9, 3.10, and 3.11. Python 3.12 is not supported yet. |
| spark.DeltaTableDataset | DeltaTableDataset loads data into DeltaTable objects. |
| spark.GBQQueryDataset | GBQQueryDataset loads data from Google BigQuery with a SQL query using BigQuery Spark connector. |
| spark.SparkDataset | SparkDataset loads and saves Spark dataframes. |
| spark.SparkDatasetV2 | SparkDatasetV2 loads and saves Spark dataframes with support for Spark Connect, Databricks Connect, and automatic Pandas-to-Spark conversion. |
| spark.SparkHiveDataset | SparkHiveDataset loads and saves Spark dataframes stored on Hive. |
| spark.SparkJDBCDataset | SparkJDBCDataset loads data from a database table accessible via a JDBC URL and connection properties, and saves the content of a PySpark DataFrame to an external database table via JDBC. |
| spark.SparkStreamingDataset | SparkStreamingDataset loads data to Spark Streaming Dataframe objects. |
| svmlight.SVMLightDataset | SVMLightDataset loads/saves data from/to a svmlight/libsvm file using an underlying filesystem (e.g.: local, S3, GCS). It uses sklearn functions dump_svmlight_file to save and load_svmlight_file to load a file. |
| tensorflow.TensorFlowModelDataset | TensorFlowModelDataset loads and saves TensorFlow models. The underlying functionality is supported by, and passes input arguments through to, TensorFlow 2.X load_model and save_model methods. TensorFlow does not currently support Python 3.14. |
| text.TextDataset | TextDataset loads/saves data from/to a text file using an underlying filesystem (e.g.: local, S3, GCS). |
| yaml.YAMLDataset | YAMLDataset loads/saves data from/to a YAML file using an underlying filesystem (e.g.: local, S3, GCS). It uses PyYAML to handle the YAML file. |
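Each name in the table combines a submodule and a class, which suggests an import path under the kedro_datasets package. The sketch below, using only the standard library, shows how such a name could be turned into an import statement. The helper names `split_dataset_name` and `import_statement` are hypothetical and not part of Kedro, and the mapping is an assumption based on the naming pattern in the table.

```python
def split_dataset_name(name: str, package: str = "kedro_datasets") -> tuple[str, str]:
    """Split a table entry such as 'pandas.CSVDataset' into (module path, class name)."""
    module_part, _, class_name = name.rpartition(".")
    if not module_part or not class_name:
        raise ValueError(f"expected '<module>.<ClassName>', got {name!r}")
    return f"{package}.{module_part}", class_name


def import_statement(name: str) -> str:
    """Build the import line that the table entry is assumed to correspond to."""
    module_path, class_name = split_dataset_name(name)
    return f"from {module_path} import {class_name}"


# Print the assumed import lines for a few entries from the table.
for entry in ("pandas.CSVDataset", "networkx.JSONDataset", "yaml.YAMLDataset"):
    print(import_statement(entry))
```

Actually running such an import requires kedro_datasets and the third-party library behind the chosen dataset to be installed. Because several submodules reuse the same class name (for example CSVDataset appears under dask, pandas, and polars), the submodule prefix is what disambiguates them.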