kedro_datasets.snowflake.SnowparkTableDataset
- class kedro_datasets.snowflake.SnowparkTableDataset(*, table_name, schema=None, database=None, load_args=None, save_args=None, credentials=None, metadata=None)
SnowparkTableDataset loads and saves Snowpark dataframes. As of March 2023, the Snowpark connector only works with Python 3.8.
Example usage for the YAML API:
    weather:
      type: kedro_datasets.snowflake.SnowparkTableDataset
      table_name: "weather_data"
      database: "meteorology"
      schema: "observations"
      credentials: db_credentials
      save_args:
        mode: overwrite
        column_order: name
        table_type: ''
You can skip everything but "table_name" if the database and schema are provided via credentials. That way catalog entries can be shorter if, for example, all the Snowflake tables you use live in the same database/schema. Values in the dataset definition take priority over those defined in credentials.
Example: The credentials file provides all connection attributes. The "weather" catalog entry reuses the credentials parameters, while the "polygons" catalog entry reuses all credentials parameters except the schema, for which it provides a different name. The second credentials file example uses externalbrowser authentication.
catalog.yml
    weather:
      type: kedro_datasets.snowflake.SnowparkTableDataset
      table_name: "weather_data"
      database: "meteorology"
      schema: "observations"
      credentials: snowflake_client
      save_args:
        mode: overwrite
        column_order: name
        table_type: ''

    polygons:
      type: kedro_datasets.snowflake.SnowparkTableDataset
      table_name: "geopolygons"
      credentials: snowflake_client
      schema: "geodata"
credentials.yml
    snowflake_client:
      account: 'ab12345.eu-central-1'
      port: 443
      warehouse: "datascience_wh"
      database: "detailed_data"
      schema: "observations"
      user: "service_account_abc"
      password: "supersecret"
credentials.yml (with externalbrowser authenticator)
    snowflake_client:
      account: 'ab12345.eu-central-1'
      port: 443
      warehouse: "datascience_wh"
      database: "detailed_data"
      schema: "observations"
      user: "john_doe@wdomain.com"
      authenticator: "externalbrowser"
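The precedence rule described above (values in the dataset definition win over values in credentials) can be illustrated with a small, self-contained sketch. The helper `resolve_location` is hypothetical and is not part of kedro_datasets; it only mimics the documented behaviour for the `database` and `schema` keys, using the values from the example files.

```python
from typing import Any, Dict, Optional


def resolve_location(
    credentials: Dict[str, Any],
    schema: Optional[str] = None,
    database: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Hypothetical helper: prefer explicit arguments, fall back to credentials."""
    return {
        "database": database if database is not None else credentials.get("database"),
        "schema": schema if schema is not None else credentials.get("schema"),
    }


# Subset of the credentials.yml example (authentication keys omitted).
snowflake_client = {
    "account": "ab12345.eu-central-1",
    "warehouse": "datascience_wh",
    "database": "detailed_data",
    "schema": "observations",
}

# "weather" sets both database and schema in the catalog entry.
weather = resolve_location(
    snowflake_client, schema="observations", database="meteorology"
)

# "polygons" only overrides the schema; the database comes from credentials.
polygons = resolve_location(snowflake_client, schema="geodata")

print(weather)
print(polygons)
```

Under this reading, "polygons" would resolve to the `detailed_data` database and the `geodata` schema.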
Attributes
Methods
exists() – Checks whether a data set's output already exists by calling the provided _exists() method.
from_config(name, config[, load_version, ...]) – Create a data set instance using the configuration provided.
load() – Loads data by delegation to the provided load method.
release() – Release any cached data.
save(data) – Saves data by delegation to the provided save method.
- DEFAULT_LOAD_ARGS: dict[str, Any] = {}
- DEFAULT_SAVE_ARGS: dict[str, Any] = {}
- __init__(*, table_name, schema=None, database=None, load_args=None, save_args=None, credentials=None, metadata=None)
Creates a new instance of SnowparkTableDataset.
- Parameters:
table_name (str) – The table name to load or save data to.
schema (Optional[str]) – Name of the schema where table_name is. Optional, as it can be provided as part of the credentials dictionary. The argument value takes priority over the one provided in credentials, if any.
database (Optional[str]) – Name of the database where schema is. Optional, as it can be provided as part of the credentials dictionary. The argument value takes priority over the one provided in credentials, if any.
load_args (Optional[dict[str, Any]]) – Currently not used.
save_args (Optional[dict[str, Any]]) – Provided to the underlying Snowpark save_as_table. To find all supported arguments, see: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.DataFrameWriter.saveAsTable.html
credentials (Optional[dict[str, Any]]) – A dictionary with a snowpark connection string. To find all supported arguments, see: https://docs.snowflake.com/en/user-guide/python-connector-api.html#connect
metadata (Optional[dict[str, Any]]) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
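As a rough illustration of the Python API, the sketch below assembles the same arguments as the "weather" catalog entry and passes them to the constructor. The helper `build_dataset` is hypothetical; it imports the class lazily so the snippet can be read and parsed without the Snowflake packages installed. Whether constructing the dataset already opens a Snowflake session is not stated on this page, so treat a call to the helper as potentially requiring valid credentials.

```python
from typing import Any, Dict

# Connection details copied from the credentials.yml example. In a real
# project they would normally live in the credentials file, not in code.
credentials: Dict[str, Any] = {
    "account": "ab12345.eu-central-1",
    "port": 443,
    "warehouse": "datascience_wh",
    "database": "detailed_data",
    "schema": "observations",
    "user": "service_account_abc",
    "password": "supersecret",
}

# Keyword arguments mirroring the "weather" catalog entry.
init_kwargs: Dict[str, Any] = {
    "table_name": "weather_data",
    "database": "meteorology",   # overrides credentials["database"]
    "schema": "observations",
    "credentials": credentials,
    "save_args": {"mode": "overwrite", "column_order": "name", "table_type": ""},
}


def build_dataset(kwargs: Dict[str, Any]):
    """Hypothetical helper: import lazily, then call the documented constructor."""
    from kedro_datasets.snowflake import SnowparkTableDataset

    # All constructor parameters are keyword-only (note the leading `*`).
    return SnowparkTableDataset(**kwargs)
```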
- exists()
Checks whether a data set's output already exists by calling the provided _exists() method.
- Return type:
bool
- Returns:
Flag indicating whether the output already exists.
- Raises:
DatasetError – when underlying exists method raises error.
- classmethod from_config(name, config, load_version=None, save_version=None)
Create a data set instance using the configuration provided.
- Parameters:
name (str) – Data set name.
config (dict[str, Any]) – Data set config dictionary.
load_version (str | None) – Version string to be used for the load operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
save_version (str | None) – Version string to be used for the save operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
- Return type:
AbstractDataset
- Returns:
An instance of an AbstractDataset subclass.
- Raises:
DatasetError – When the function fails to create the data set from its config.
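A config dictionary for from_config might look like the sketch below. It assumes that `config` carries a `type` key naming the dataset class, as the YAML catalog entries do, and that `credentials` is passed as an already resolved dictionary rather than as a name from credentials.yml. The helper `dataset_from_config` is hypothetical, and the `kedro.io` import location of AbstractDataset is an assumption about the installed Kedro version.

```python
from typing import Any, Dict

# Same fields as the "polygons" catalog entry, with credentials inlined.
config: Dict[str, Any] = {
    "type": "kedro_datasets.snowflake.SnowparkTableDataset",
    "table_name": "geopolygons",
    "schema": "geodata",
    "credentials": {
        "account": "ab12345.eu-central-1",
        "port": 443,
        "warehouse": "datascience_wh",
        "database": "detailed_data",
        "schema": "observations",
        "user": "service_account_abc",
        "password": "supersecret",
    },
}


def dataset_from_config(name: str, cfg: Dict[str, Any]):
    """Hypothetical helper: defer the Kedro import until it is actually needed."""
    from kedro.io import AbstractDataset  # assumed import location

    # load_version / save_version are left at their defaults (None).
    return AbstractDataset.from_config(name, cfg)
```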
- load()
Loads data by delegation to the provided load method.
- Return type:
TypeVar(_DO)
- Returns:
Data returned by the provided load method.
- Raises:
DatasetError – When underlying load method raises error.
- release()
Release any cached data.
- Raises:
DatasetError – when underlying release method raises error.
- Return type:
None
- save(data)
Saves data by delegation to the provided save method.
- Parameters:
data (TypeVar(_DI)) – the value to be saved by provided save method.
- Raises:
DatasetError – when underlying save method raises error.
FileNotFoundError – when save method got file instead of dir, on Windows.
NotADirectoryError – when save method got file instead of dir, on Unix.
- Return type:
None
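The method descriptions above share one pattern: the public exists(), load() and save() methods delegate to an underlying implementation and report failures as DatasetError. The sketch below is a simplified illustration of that pattern, not Kedro's actual code. `DatasetError` is redefined locally as a stand-in, and `SketchDataset` and `InMemoryTable` are hypothetical classes that use a Python list in place of a Snowflake table.

```python
class DatasetError(Exception):
    """Local stand-in for Kedro's DatasetError so the sketch runs on its own."""


class SketchDataset:
    """Hypothetical base class: public methods wrap the underscore methods."""

    def _load(self):
        raise NotImplementedError

    def _save(self, data):
        raise NotImplementedError

    def _exists(self):
        return False

    def load(self):
        try:
            return self._load()
        except Exception as exc:
            raise DatasetError(f"Failed while loading data: {exc}") from exc

    def save(self, data):
        try:
            self._save(data)
        except Exception as exc:
            raise DatasetError(f"Failed while saving data: {exc}") from exc

    def exists(self):
        try:
            return self._exists()
        except Exception as exc:
            raise DatasetError(f"Failed during exists check: {exc}") from exc


class InMemoryTable(SketchDataset):
    """Hypothetical dataset that keeps rows in memory instead of Snowflake."""

    def __init__(self):
        self._rows = None

    def _load(self):
        if self._rows is None:
            raise ValueError("nothing has been saved yet")
        return self._rows

    def _save(self, data):
        self._rows = list(data)

    def _exists(self):
        return self._rows is not None


table = InMemoryTable()
exists_before = table.exists()

# Loading before saving fails inside _load(); load() re-raises as DatasetError.
try:
    table.load()
    load_error = None
except DatasetError as exc:
    load_error = exc

table.save([("Berlin", 21.5)])
exists_after = table.exists()
loaded = table.load()
```

Callers therefore only need to handle DatasetError, whatever exception the underlying storage layer raised.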