kedro_datasets.pandas.SQLTableDataset

class kedro_datasets.pandas.SQLTableDataset(table_name, credentials, load_args=None, save_args=None, metadata=None)

SQLTableDataset loads data from a SQL table and saves a pandas DataFrame to a table. It uses pandas internally, so it supports all options accepted by the pandas read_sql_table and DataFrame.to_sql methods. Because pandas uses SQLAlchemy to talk to the database, you must pass a compatible SQLAlchemy connection string when instantiating SQLTableDataset, under the con key of credentials (see the examples below). Connection string formats supported by SQLAlchemy are listed here: https://docs.sqlalchemy.org/core/engines.html#database-urls

By default, SQLTableDataset saves data without the DataFrame index (index=False; see DEFAULT_SAVE_ARGS). This keeps load and save symmetric: a DataFrame with a default integer index is read back unchanged.
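The effect of index=False can be reproduced with plain pandas. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database through the standard-library sqlite3 module instead of a SQLAlchemy engine, and reads the table back with pandas.read_sql_query, because pandas.read_sql_table requires SQLAlchemy. The helper round_trip and the table name are hypothetical.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def round_trip(df: pd.DataFrame, index: bool) -> pd.DataFrame:
    """Hypothetical helper: write df to a throwaway SQLite table, read it back."""
    with closing(sqlite3.connect(":memory:")) as con:
        df.to_sql("table_a", con, index=index)
        return pd.read_sql_query("SELECT * FROM table_a", con)


data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})

# index=False (the SQLTableDataset default): the frame comes back unchanged.
without_index = round_trip(data, index=False)
print(data.equals(without_index))  # True

# index=True: pandas also writes the index, as an extra "index" column.
with_index = round_trip(data, index=True)
print(list(with_index.columns))  # ['index', 'col1', 'col2', 'col3']
```

With index=True, the reloaded frame has an extra column and no longer equals the original, which is the asymmetry the default avoids.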

Example usage for the YAML API:

shuttles_table_dataset:
  type: pandas.SQLTableDataset
  credentials: db_credentials
  table_name: shuttles
  load_args:
    schema: dwschema
  save_args:
    schema: dwschema
    if_exists: replace

Sample database credentials entry in credentials.yml:

db_credentials:
  con: postgresql://scott:tiger@localhost/test
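The if_exists: replace entry in the catalog example is passed through to DataFrame.to_sql. The pandas default, "fail", raises a ValueError when the table already exists. A minimal sketch of the difference with plain pandas, assuming an in-memory SQLite database for illustration (the catalog entry targets PostgreSQL and a dwschema schema, which SQLite does not have):

```python
import sqlite3

import pandas as pd

first = pd.DataFrame({"id": [1, 2]})
second = pd.DataFrame({"id": [3]})

con = sqlite3.connect(":memory:")
first.to_sql("shuttles", con, index=False)

# Default if_exists="fail": writing to an existing table raises ValueError.
try:
    second.to_sql("shuttles", con, index=False)
    failed = False
except ValueError:
    failed = True

# if_exists="replace": drop the existing table and write the new rows.
second.to_sql("shuttles", con, index=False, if_exists="replace")
rows = pd.read_sql_query("SELECT * FROM shuttles", con)
con.close()

print(failed)               # True
print(rows["id"].tolist())  # [3]
```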

Example usage for the Python API:

from kedro_datasets.pandas import SQLTableDataset
import pandas as pd

data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
table_name = "table_a"
credentials = {"con": "postgresql://scott:tiger@localhost/test"}
dataset = SQLTableDataset(table_name=table_name, credentials=credentials)

dataset.save(data)
reloaded = dataset.load()

assert data.equals(reloaded)

Attributes

`DEFAULT_LOAD_ARGS`
`DEFAULT_SAVE_ARGS`
`engines`

Methods

`create_connection`(connection_str)	Given a connection string, create a singleton connection to be used across all instances of `SQLTableDataset` that need to connect to the same source.
`exists`()	Checks whether a dataset's output already exists by calling the provided _exists() method.
`from_config`(name, config[, load_version, ...])	Create a dataset instance using the configuration provided.
`load`()	Loads data by delegation to the provided load method.
`release`()	Release any cached data.
`save`(data)	Saves data by delegation to the provided save method.

DEFAULT_LOAD_ARGS: Dict[str, Any] = {}

DEFAULT_SAVE_ARGS: Dict[str, Any] = {'index': False}

__init__(table_name, credentials, load_args=None, save_args=None, metadata=None)

Creates a new SQLTableDataset.

Parameters:

table_name (str) – The name of the table to load data from or save data to. It overrides the name parameter in save_args and the table_name parameter in load_args.
credentials (Dict[str, Any]) – A dictionary containing a SQLAlchemy connection string under the key con. If load_args or save_args also contain con, this value overrides it. To find all supported connection string formats, see here: https://docs.sqlalchemy.org/core/engines.html#database-urls
load_args (Optional[Dict[str, Any]]) – Passed to the underlying pandas read_sql_table function along with the connection string. To find all supported arguments, see here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html. To find all supported connection string formats, see here: https://docs.sqlalchemy.org/core/engines.html#database-urls
save_args (Optional[Dict[str, Any]]) – Passed to the underlying pandas DataFrame.to_sql method along with the connection string. To find all supported arguments, see here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html. To find all supported connection string formats, see here: https://docs.sqlalchemy.org/core/engines.html#database-urls. Defaults to index=False.
metadata (Optional[Dict[str, Any]]) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.

Raises:

DatasetError – When either table_name or con is empty.
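The argument precedence described above can be summarised in plain Python. The helper build_args below is hypothetical and is not the library's code: it mirrors the documented behaviour (class defaults first, then the user's load_args/save_args, then table_name and the con value from credentials taking priority). DatasetError here is a local stand-in for kedro's exception, and its messages are illustrative.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple

DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
DEFAULT_SAVE_ARGS: Dict[str, Any] = {"index": False}


class DatasetError(Exception):
    """Local stand-in for kedro's DatasetError."""


def build_args(
    table_name: str,
    credentials: Dict[str, Any],
    load_args: Optional[Dict[str, Any]] = None,
    save_args: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper mirroring the documented argument precedence."""
    if not table_name:
        raise DatasetError("'table_name' argument cannot be empty.")
    if not (credentials and credentials.get("con")):
        raise DatasetError("'con' argument cannot be empty.")

    # Start from the class defaults, then layer the user's arguments on top.
    final_load = {**deepcopy(DEFAULT_LOAD_ARGS), **(load_args or {})}
    final_save = {**deepcopy(DEFAULT_SAVE_ARGS), **(save_args or {})}

    # table_name wins over table_name / name supplied in the args.
    final_load["table_name"] = table_name
    final_save["name"] = table_name

    # The connection string from credentials overrides any con in the args.
    final_load["con"] = final_save["con"] = credentials["con"]
    return final_load, final_save


load, save = build_args(
    "shuttles",
    {"con": "postgresql://scott:tiger@localhost/test"},
    load_args={"schema": "dwschema", "table_name": "ignored"},
    save_args={"schema": "dwschema", "if_exists": "replace"},
)
print(load)
# {'schema': 'dwschema', 'table_name': 'shuttles', 'con': 'postgresql://scott:tiger@localhost/test'}
print(save)
```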

classmethod create_connection(connection_str)

Given a connection string, create a singleton connection to be used across all instances of SQLTableDataset that need to connect to the same source.

Return type: None

engines: Dict[str, Any] = {}
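create_connection and the class-level engines dictionary let instances that share a connection string share one connection. A minimal sketch of that singleton-per-connection-string pattern, assuming engines is keyed by the connection string. The class CachedSQLDataset and the make_engine factory are hypothetical; make_engine is a placeholder for SQLAlchemy's create_engine so the sketch runs without a database driver.

```python
from typing import Any, Dict


def make_engine(connection_str: str) -> object:
    """Placeholder for sqlalchemy.create_engine: returns a fresh object."""
    return object()


class CachedSQLDataset:
    """Hypothetical class illustrating a shared, class-level engine cache."""

    # Class attribute: shared by every instance, keyed by connection string.
    engines: Dict[str, Any] = {}

    def __init__(self, table_name: str, credentials: Dict[str, Any]) -> None:
        self._table_name = table_name
        self._connection_str = credentials["con"]
        self.create_connection(self._connection_str)

    @classmethod
    def create_connection(cls, connection_str: str) -> None:
        # Build an engine only the first time this connection string is seen.
        if connection_str in cls.engines:
            return
        cls.engines[connection_str] = make_engine(connection_str)

    @property
    def engine(self) -> Any:
        return self.engines[self._connection_str]


a = CachedSQLDataset("table_a", {"con": "sqlite:///a.db"})
b = CachedSQLDataset("table_b", {"con": "sqlite:///a.db"})
c = CachedSQLDataset("table_c", {"con": "sqlite:///b.db"})

print(a.engine is b.engine)           # True
print(a.engine is c.engine)           # False
print(len(CachedSQLDataset.engines))  # 2
```

Keeping the cache on the class rather than the instance is what makes the connection shared; the trade-off is that the cache lives for the whole process.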

exists()

Checks whether a dataset's output already exists by calling the provided _exists() method.

Return type: bool
Returns: Flag indicating whether the output already exists.
Raises: DatasetError – When the underlying exists method raises an error.
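For a SQL table, "the output already exists" amounts to the target table being present in the database. As an illustration only, that check can be expressed against SQLite's catalog table sqlite_master with the standard-library sqlite3 module. This is an assumed analogue, not the dataset's _exists() implementation, which goes through its SQLAlchemy connection; the helper table_exists is hypothetical.

```python
import sqlite3


def table_exists(con: sqlite3.Connection, table_name: str) -> bool:
    """Hypothetical helper: True if a table named table_name exists."""
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


con = sqlite3.connect(":memory:")
print(table_exists(con, "shuttles"))  # False
con.execute("CREATE TABLE shuttles (id INTEGER)")
print(table_exists(con, "shuttles"))  # True
con.close()
```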

classmethod from_config(name, config, load_version=None, save_version=None)

Create a dataset instance using the configuration provided.

Parameters:

name – Dataset name.
config – Dataset config dictionary.
load_version – Version string to be used for the load operation if the dataset is versioned. Has no effect on the dataset if versioning was not enabled.
save_version – Version string to be used for the save operation if the dataset is versioned. Has no effect on the dataset if versioning was not enabled.

Returns:

An instance of an AbstractDataset subclass.

Raises:

DatasetError – When the function fails to create the dataset from its config.

load()

Loads data by delegation to the provided load method.

Return type: TypeVar(_DO)
Returns: Data returned by the provided load method.
Raises: DatasetError – When the underlying load method raises an error.

release()

Release any cached data.

Raises: DatasetError – When the underlying release method raises an error.
Return type: None

save(data)

Saves data by delegation to the provided save method.

Parameters:

data (TypeVar(_DI)) – The value to be saved by the provided save method.

Raises:

DatasetError – When the underlying save method raises an error.
FileNotFoundError – When the save method receives a file instead of a directory, on Windows.
NotADirectoryError – When the save method receives a file instead of a directory, on Unix.

Return type:

None
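The load() and save() entries describe thin public wrappers that delegate to a subclass's implementation and report failures as DatasetError. A minimal sketch of that delegation pattern, assuming hypothetical _load/_save hooks (by analogy with the _exists() method mentioned for exists()) and a local DatasetError stand-in. SketchDataset and ToyDataset are illustrative names; the real base class does more than this.

```python
from typing import Any, Optional


class DatasetError(Exception):
    """Local stand-in for kedro's DatasetError."""


class SketchDataset:
    """Hypothetical base class showing load/save delegation."""

    def load(self) -> Any:
        try:
            return self._load()
        except DatasetError:
            raise
        except Exception as exc:
            # Wrap any other failure so callers only need to catch DatasetError.
            raise DatasetError(f"Failed while loading data: {exc}") from exc

    def save(self, data: Any) -> None:
        try:
            self._save(data)
        except DatasetError:
            raise
        except Exception as exc:
            raise DatasetError(f"Failed while saving data: {exc}") from exc

    def _load(self) -> Any:
        raise NotImplementedError

    def _save(self, data: Any) -> None:
        raise NotImplementedError


class ToyDataset(SketchDataset):
    """Toy subclass that keeps the saved value in memory."""

    def __init__(self) -> None:
        self._data: Optional[Any] = None

    def _load(self) -> Any:
        if self._data is None:
            raise KeyError("nothing saved yet")
        return self._data

    def _save(self, data: Any) -> None:
        self._data = data


ds = ToyDataset()
try:
    ds.load()
except DatasetError as err:
    print(type(err.__cause__).__name__)  # KeyError

ds.save({"col1": [1, 2]})
print(ds.load())  # {'col1': [1, 2]}
```

Chaining with "from exc" keeps the original exception available as __cause__ for debugging while giving callers a single exception type to handle.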