kedro_datasets
kedro_datasets is where you can find all of Kedro's data connectors.
KedroDeprecationWarning
Bases: DeprecationWarning
Custom class for warnings about deprecated Kedro features.
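As a rough illustration of how a warning category like this is typically used, the sketch below defines a local stand-in class with the same name and base (`DeprecationWarning`), so it runs without Kedro installed. The function `old_feature` and its message are hypothetical and exist only for the example.

```python
import warnings


class KedroDeprecationWarning(DeprecationWarning):
    """Local stand-in that mirrors the documented base class (assumption)."""


def old_feature() -> str:
    # Hypothetical deprecated function; emits the custom category.
    warnings.warn(
        "old_feature is deprecated", KedroDeprecationWarning, stacklevel=2
    )
    return "result"


# DeprecationWarning subclasses are often hidden by default filters,
# so enable this category explicitly while recording what is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", KedroDeprecationWarning)
    value = old_feature()

categories = [w.category.__name__ for w in caught]
print(value, categories)
```

Because the category derives from `DeprecationWarning`, the standard `warnings` filters (for example `warnings.simplefilter("error", KedroDeprecationWarning)`) can be used to surface or escalate it in tests.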
Dataset Classes
| Name | Description |
|---|---|
| api.APIDataset | APIDataset loads/saves data from/to HTTP(S) APIs. It uses the python requests library: https://requests.readthedocs.io/en/latest/ |
| biosequence.BioSequenceDataset | BioSequenceDataset loads and saves data to a sequence file. |
| dask.CSVDataset | CSVDataset loads and saves data to comma-separated value file(s). It uses Dask remote data services to handle the corresponding load and save operations. |
| dask.ParquetDataset | ParquetDataset loads and saves data to parquet file(s). It uses Dask remote data services to handle the corresponding load and save operations. |
| databricks.ManagedTableDataset | ManagedTableDataset loads and saves data into managed delta tables in Databricks. |
| email.EmailMessageDataset | EmailMessageDataset loads/saves an email message from/to a file using an underlying filesystem (e.g.: local, S3, GCS). It uses the email package in the standard library to manage email messages. |
| geopandas.GenericDataset | GenericDataset loads/saves data to a file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by geopandas, so it supports all allowed geopandas (pandas) options for loading and saving files. |
| holoviews.HoloviewsWriter | HoloviewsWriter saves Holoviews objects to image file(s) in an underlying filesystem (e.g. local, S3, GCS). |
| huggingface.HFDataset | HFDataset loads Hugging Face datasets using the datasets library. |
| huggingface.HFTransformerPipelineDataset | HFTransformerPipelineDataset loads pretrained Hugging Face transformers using the transformers library. |
| ibis.FileDataset | FileDataset loads/saves data from/to a specified file format. |
| ibis.TableDataset | TableDataset loads/saves data from/to Ibis table expressions. |
| json.JSONDataset | JSONDataset loads/saves data from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It uses native json to handle the JSON file. |
| langchain.ChatAnthropicDataset | ChatAnthropicDataset loads a ChatAnthropic langchain model. |
| langchain.ChatCohereDataset | ChatCohereDataset loads a ChatCohere langchain model. |
| langchain.ChatOpenAIDataset | OpenAI dataset used to access credentials at runtime. |
| langchain.OpenAIEmbeddingsDataset | OpenAIEmbeddingsDataset loads an OpenAIEmbeddings langchain model. |
| matlab.MatlabDataset | MatlabDataset loads and saves data from/to a MATLAB file using scipy.io. |
| matplotlib.MatplotlibDataset | MatplotlibDataset saves one or more Matplotlib objects as image files to an underlying filesystem (e.g. local, S3, GCS). |
| networkx.GMLDataset | GMLDataset loads and saves graphs to a GML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GML data. |
| networkx.GraphMLDataset | GraphMLDataset loads and saves graphs to a GraphML file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create GraphML data. |
| networkx.JSONDataset | NetworkX JSONDataset loads and saves graphs to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). NetworkX is used to create JSON data. |
| openxml.DocxDataset | DocxDataset loads/saves data from/to a .docx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-docx to handle the .docx file. |
| openxml.PptxDataset | PptxDataset loads/saves data from/to a .pptx file using an underlying filesystem (e.g.: local, S3, GCS). It uses python-pptx to handle the .pptx file. |
| pandas.CSVDataset | A dataset that loads and saves data to/from CSV files using pandas. |
| pandas.DeltaTableDataset | DeltaTableDataset loads/saves delta tables from/to a filesystem (e.g.: local, S3, GCS), a Databricks Unity Catalog, or an AWS Glue catalog. It handles load and save using a pandas dataframe. |
| pandas.ExcelDataset | ExcelDataset loads/saves data from/to an Excel file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the Excel file. |
| pandas.FeatherDataset | A dataset that loads and saves data to/from Feather files using pandas. |
| pandas.GBQQueryDataset | A dataset that loads data from a provided SQL query in Google BigQuery using pandas-gbq. It is read-only. |
| pandas.GBQTableDataset | A dataset that loads and saves data to/from Google BigQuery tables using pandas-gbq. |
| pandas.GenericDataset | GenericDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses pandas to handle the type of read/write target. |
| pandas.HDFDataset | A dataset that loads and saves data to/from HDF files using pandas. |
| pandas.JSONDataset | A dataset that loads and saves data to/from JSON files using pandas. |
| pandas.ParquetDataset | A dataset that loads and saves data to/from Parquet files using pandas. |
| pandas.SQLQueryDataset | A dataset that loads data from a provided SQL query using pandas. It is read-only. |
| pandas.SQLTableDataset | A dataset that loads data from a SQL table and saves a pandas DataFrame to a table. |
| pandas.XMLDataset | A dataset that loads and saves data to/from XML files using pandas. |
| partitions.IncrementalDataset | IncrementalDataset inherits from PartitionedDataset, which loads and saves partitioned file-like data using the underlying dataset definition. |
| partitions.PartitionedDataset | PartitionedDataset loads and saves partitioned file-like data using the underlying dataset definition. It also uses fsspec for filesystem level operations. |
| pickle.PickleDataset | PickleDataset loads/saves data from/to a Pickle file using an underlying filesystem (e.g.: local, S3, GCS). The underlying functionality is supported by the specified backend library passed in (defaults to the pickle library), so it supports all allowed options for loading and saving pickle files. |
| pillow.ImageDataset | ImageDataset loads/saves image data as numpy from an underlying filesystem (e.g.: local, S3, GCS). It uses Pillow to handle the image file. |
| plotly.HTMLDataset | HTMLDataset saves a plotly figure to an HTML file using an underlying filesystem (e.g.: local, S3, GCS). |
| plotly.JSONDataset | JSONDataset loads/saves a plotly figure from/to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). |
| plotly.PlotlyDataset | PlotlyDataset generates a plot from a pandas DataFrame and saves it to a JSON file using an underlying filesystem (e.g.: local, S3, GCS). It loads the JSON into a plotly figure. |
| polars.CSVDataset | CSVDataset loads/saves data from/to a CSV file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the CSV file. |
| polars.EagerPolarsDataset | EagerPolarsDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target. |
| polars.LazyPolarsDataset | LazyPolarsDataset loads/saves data from/to a data file using an underlying filesystem (e.g.: local, S3, GCS). It uses polars to handle the type of read/write target. |
| redis.PickleDataset | PickleDataset loads/saves data from/to a Redis database. The underlying functionality is supported by the redis library, so it supports all allowed options for instantiating the redis app from_url and setting a value. |
| snowflake.SnowparkTableDataset | SnowparkTableDataset loads and saves Snowpark DataFrames. As of October 2024, the Snowpark connector works with Python 3.9, 3.10, and 3.11. Python 3.12 is not supported yet. |
| spark.DeltaTableDataset | DeltaTableDataset loads data into DeltaTable objects. |
| spark.GBQQueryDataset | GBQQueryDataset loads data from Google BigQuery with a SQL query using BigQuery Spark connector. |
| spark.SparkDataset | SparkDataset loads and saves Spark dataframes. |
| spark.SparkDatasetV2 | SparkDatasetV2 loads and saves Spark dataframes with support for Spark Connect, Databricks Connect, and automatic Pandas-to-Spark conversion. |
| spark.SparkHiveDataset | SparkHiveDataset loads and saves Spark dataframes stored on Hive. |
| spark.SparkJDBCDataset | SparkJDBCDataset loads data from a database table accessible via a JDBC URL (url) and connection properties, and saves the content of a PySpark DataFrame to an external database table via JDBC. |
| spark.SparkStreamingDataset | SparkStreamingDataset loads data to Spark Streaming Dataframe objects. |
| svmlight.SVMLightDataset | SVMLightDataset loads/saves data from/to a svmlight/libsvm file using an underlying filesystem (e.g.: local, S3, GCS). It uses sklearn functions dump_svmlight_file to save and load_svmlight_file to load a file. |
| tensorflow.TensorFlowModelDataset | TensorFlowModelDataset loads and saves TensorFlow models. The underlying functionality is supported by, and passes input arguments through to, TensorFlow 2.X load_model and save_model methods. |
| text.TextDataset | TextDataset loads/saves data from/to a text file using an underlying filesystem (e.g.: local, S3, GCS). |
| yaml.YAMLDataset | YAMLDataset loads/saves data from/to a YAML file using an underlying filesystem (e.g.: local, S3, GCS). It uses PyYAML to handle the YAML file. |
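
The names in the table are relative to the `kedro_datasets` package, so `pandas.CSVDataset` refers to the `CSVDataset` class in the `kedro_datasets.pandas` module. The sketch below shows one way to turn a table entry into an import path. The helper names `resolve_dataset_path` and `load_dataset_class` are hypothetical and not part of Kedro; actually importing a class also assumes `kedro-datasets` and that dataset's optional dependencies are installed.

```python
import importlib


def resolve_dataset_path(
    name: str, package: str = "kedro_datasets"
) -> tuple[str, str]:
    """Split a table entry such as 'pandas.CSVDataset' into
    (module path, class name). Hypothetical helper for illustration."""
    module, sep, class_name = name.rpartition(".")
    if not sep or not module or not class_name:
        raise ValueError(f"expected '<module>.<ClassName>', got {name!r}")
    return f"{package}.{module}", class_name


def load_dataset_class(name: str):
    """Import and return the dataset class. This only works when
    kedro-datasets and the relevant extras are installed (assumption)."""
    module_path, class_name = resolve_dataset_path(name)
    return getattr(importlib.import_module(module_path), class_name)


print(resolve_dataset_path("pandas.CSVDataset"))
# → ('kedro_datasets.pandas', 'CSVDataset')
```

Only `resolve_dataset_path` is exercised here, since it is pure string handling; `load_dataset_class` is left uncalled so the snippet runs without any Kedro packages present.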