kedro_datasets.snowflake.SnowparkTableDataset

class kedro_datasets.snowflake.SnowparkTableDataset(*, table_name, schema=None, database=None, load_args=None, save_args=None, credentials=None, metadata=None)

SnowparkTableDataset loads and saves Snowpark dataframes.

As of March 2023, the Snowpark connector only works with Python 3.8.

Example usage for the YAML API:

weather:
  type: kedro_datasets.snowflake.SnowparkTableDataset
  table_name: "weather_data"
  database: "meteorology"
  schema: "observations"
  credentials: db_credentials
  save_args:
    mode: overwrite
    column_order: name
    table_type: ''

You can omit everything except “table_name” if the database and schema are provided via credentials. This keeps catalog entries short when, for example, all Snowflake tables in use live in the same database and schema. Values set in the dataset definition take priority over those defined in credentials.

Example: the credentials file provides all connection attributes. The “weather” catalog entry reuses the connection parameters from credentials but sets its own database and schema. The “polygons” catalog entry reuses all credentials parameters and only overrides the schema name. The second credentials file example uses externalbrowser authentication.

catalog.yml

weather:
  type: kedro_datasets.snowflake.SnowparkTableDataset
  table_name: "weather_data"
  database: "meteorology"
  schema: "observations"
  credentials: snowflake_client
  save_args:
    mode: overwrite
    column_order: name
    table_type: ''

polygons:
  type: kedro_datasets.snowflake.SnowparkTableDataset
  table_name: "geopolygons"
  credentials: snowflake_client
  schema: "geodata"

credentials.yml

snowflake_client:
  account: 'ab12345.eu-central-1'
  port: 443
  warehouse: "datascience_wh"
  database: "detailed_data"
  schema: "observations"
  user: "service_account_abc"
  password: "supersecret"

credentials.yml (with externalbrowser authenticator)

snowflake_client:
  account: 'ab12345.eu-central-1'
  port: 443
  warehouse: "datascience_wh"
  database: "detailed_data"
  schema: "observations"
  user: "john_doe@wdomain.com"
  authenticator: "externalbrowser"
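The precedence rule described above can be sketched in plain Python. `resolve_location` is a hypothetical helper, not part of kedro-datasets; it assumes that a database or schema set on the catalog entry replaces the value from credentials, and that a value missing from both places is an error. It resolves the two catalog entries from the example.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_location(
    credentials: Dict[str, Any],
    database: Optional[str] = None,
    schema: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: choose database/schema, preferring dataset-level values."""
    # Dataset arguments win; fall back to credentials only when they are unset or empty.
    resolved_db = database or credentials.get("database")
    resolved_schema = schema or credentials.get("schema")
    # Assumption: a value missing from both sources is treated as an error.
    if not resolved_db:
        raise ValueError("'database' must be provided by credentials or dataset.")
    if not resolved_schema:
        raise ValueError("'schema' must be provided by credentials or dataset.")
    return resolved_db, resolved_schema


# Values taken from the first credentials.yml example (password omitted).
snowflake_client = {
    "account": "ab12345.eu-central-1",
    "warehouse": "datascience_wh",
    "database": "detailed_data",
    "schema": "observations",
    "user": "service_account_abc",
}

# "weather" sets both database and schema on the catalog entry.
print(resolve_location(snowflake_client, database="meteorology", schema="observations"))
# ('meteorology', 'observations')

# "polygons" sets only the schema; the database comes from credentials.
print(resolve_location(snowflake_client, schema="geodata"))
# ('detailed_data', 'geodata')
```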

Attributes

`DEFAULT_LOAD_ARGS`
`DEFAULT_SAVE_ARGS`

Methods

`exists`()	Checks whether a data set's output already exists by calling the provided _exists() method.
`from_config`(name, config[, load_version, ...])	Create a data set instance using the configuration provided.
`load`()	Loads data by delegation to the provided load method.
`release`()	Release any cached data.
`save`(data)	Saves data by delegation to the provided save method.
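The public methods listed above delegate to private hooks that each dataset class implements, and wrap unexpected failures in `DatasetError`. The sketch below imitates that pattern with an in-memory store instead of Snowflake. `SketchDataset` and the local `DatasetError` class are stand-ins written for illustration, not Kedro's actual code, and the error messages are assumptions.

```python
from typing import Any, Optional


class DatasetError(Exception):
    """Local stand-in for Kedro's DatasetError."""


class SketchDataset:
    """Hypothetical dataset: public methods delegate to private _load/_save/_exists."""

    def __init__(self) -> None:
        self._data: Optional[Any] = None

    # Private hooks: a real dataset would talk to its storage backend here.
    def _load(self) -> Any:
        if self._data is None:
            raise LookupError("nothing has been saved yet")
        return self._data

    def _save(self, data: Any) -> None:
        self._data = data

    def _exists(self) -> bool:
        return self._data is not None

    # Public API: delegate to the hooks; re-raise DatasetError, wrap anything else.
    def load(self) -> Any:
        try:
            return self._load()
        except DatasetError:
            raise
        except Exception as exc:
            raise DatasetError(f"Failed while loading data: {exc}") from exc

    def save(self, data: Any) -> None:
        try:
            self._save(data)
        except DatasetError:
            raise
        except Exception as exc:
            raise DatasetError(f"Failed while saving data: {exc}") from exc

    def exists(self) -> bool:
        try:
            return self._exists()
        except Exception as exc:
            raise DatasetError(f"Failed during exists check: {exc}") from exc


ds = SketchDataset()
print(ds.exists())  # False
ds.save([1, 2, 3])
print(ds.exists())  # True
print(ds.load())    # [1, 2, 3]
```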

DEFAULT_LOAD_ARGS: dict[str, Any] = {}

DEFAULT_SAVE_ARGS: dict[str, Any] = {}

__init__(*, table_name, schema=None, database=None, load_args=None, save_args=None, credentials=None, metadata=None)

Creates a new instance of SnowparkTableDataset.

Parameters:

table_name (str) – The table name to load or save data to.
schema (Optional[str]) – Name of the schema that contains table_name. Optional because it can be provided in the credentials dictionary instead. If both are given, the argument value takes priority over the one in credentials.
database (Optional[str]) – Name of the database that contains the schema. Optional because it can be provided in the credentials dictionary instead. If both are given, the argument value takes priority over the one in credentials.
load_args (Optional[dict[str, Any]]) – Currently not used.
save_args (Optional[dict[str, Any]]) – Passed to the underlying Snowpark save_as_table method. For all supported arguments, see https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.DataFrameWriter.saveAsTable.html
credentials (Optional[dict[str, Any]]) – A dictionary of Snowpark connection parameters. For all supported arguments, see https://docs.snowflake.com/en/user-guide/python-connector-api.html#connect
metadata (Optional[dict[str, Any]]) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
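save_args are forwarded to Snowpark's DataFrameWriter.save_as_table, which accepts the table name either as a string or as an iterable of [database, schema, table] parts. The sketch below shows one plausible way this forwarding could work, assuming the user's save_args are merged over DEFAULT_SAVE_ARGS and the fully qualified name is built from the resolved database and schema. `RecordingWriter` and `save_table` are hypothetical stand-ins for `DataFrame.write` and the dataset's save step, so the example runs without a Snowflake connection.

```python
from copy import deepcopy
from typing import Any, Dict, List, Optional, Tuple

DEFAULT_SAVE_ARGS: Dict[str, Any] = {}


class RecordingWriter:
    """Hypothetical stub for snowflake.snowpark.DataFrameWriter that records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[List[str], Dict[str, Any]]] = []

    def save_as_table(self, table_name: List[str], **kwargs: Any) -> None:
        self.calls.append((table_name, kwargs))


def save_table(
    writer: RecordingWriter,
    database: str,
    schema: str,
    table_name: str,
    save_args: Optional[Dict[str, Any]] = None,
) -> None:
    # Start from the class defaults and let user-supplied save_args override them.
    args = deepcopy(DEFAULT_SAVE_ARGS)
    if save_args:
        args.update(save_args)
    # Pass the fully qualified name as [database, schema, table].
    writer.save_as_table([database, schema, table_name], **args)


# Mirrors the save_args of the "weather" catalog entry.
writer = RecordingWriter()
save_table(
    writer,
    "meteorology",
    "observations",
    "weather_data",
    {"mode": "overwrite", "column_order": "name", "table_type": ""},
)
print(writer.calls[0])
```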

exists()

Checks whether a data set’s output already exists by calling the provided _exists() method.

Return type: bool
Returns: Flag indicating whether the output already exists.
Raises: DatasetError – When the underlying exists method raises an error.

classmethod from_config(name, config, load_version=None, save_version=None)

Create a data set instance using the configuration provided.

Parameters:

name (str) – Data set name.
config (dict[str, Any]) – Data set config dictionary.
load_version (str | None) – Version string to be used for load operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
save_version (str | None) – Version string to be used for save operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.

Return type:

AbstractDataset

Returns:

An instance of an AbstractDataset subclass.

Raises:

DatasetError – When the function fails to create the data set from its config.

load()

Loads data by delegation to the provided load method.

Return type: TypeVar(_DO)
Returns: Data returned by the provided load method.
Raises: DatasetError – When the underlying load method raises an error.

release()

Release any cached data.

Raises: DatasetError – When the underlying release method raises an error.
Return type: None

save(data)

Saves data by delegation to the provided save method.

Parameters:

data (TypeVar(_DI)) – The value to be saved by the provided save method.

Raises:

DatasetError – When the underlying save method raises an error.
FileNotFoundError – When the save method receives a file path instead of a directory (on Windows).
NotADirectoryError – When the save method receives a file path instead of a directory (on Unix).

Return type:

None