SQLTableDataset¶
SQLTableDataset loads data from a SQL table and saves a pandas DataFrame to a table.
kedro_datasets.pandas.SQLTableDataset ¶
SQLTableDataset(
*,
table_name,
credentials,
load_args=None,
save_args=None,
metadata=None
)
Bases: AbstractDataset[DataFrame, DataFrame]
SQLTableDataset loads data from a SQL table and saves a pandas
dataframe to a table. It uses pandas.DataFrame internally,
so it supports all allowed pandas options on read_sql_table and
to_sql methods. Since Pandas uses SQLAlchemy behind the scenes, when
instantiating SQLTableDataset one needs to pass a compatible connection
string either in credentials (see the example code snippet below) or in
load_args and save_args. Connection string formats supported by
SQLAlchemy can be found here:
https://docs.sqlalchemy.org/core/engines.html#database-urls
SQLTableDataset modifies the save parameters and stores
the data with no index. This is designed to make load and save methods
symmetric.
Examples:
Using the YAML API:
shuttles_table_dataset:
type: pandas.SQLTableDataset
credentials: db_credentials
table_name: shuttles
load_args:
schema: dwschema
save_args:
schema: dwschema
if_exists: replace
Sample database credentials entry in credentials.yml:
db_credentials:
con: postgresql://scott:tiger@localhost/test
pool_size: 10 # additional parameters
Using the Python API:
>>> import pandas as pd
>>> from kedro_datasets.pandas import SQLTableDataset
>>>
>>> data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
>>>
>>> table_name = "table_a"
>>> credentials = {"con": f"sqlite:///{tmp_path / 'test.db'}"}
>>> dataset = SQLTableDataset(table_name=table_name, credentials=credentials)
>>> dataset.save(data)
>>> reloaded = dataset.load()
>>> assert data.equals(reloaded)
Parameters:
-
table_name(str) –The table name to load or save data to. It overwrites name in
save_argsandtable_nameparameters inload_args. -
credentials(dict[str, Any]) –A dictionary with a
SQLAlchemyconnection string. Users are supposed to provide the connection string 'con' through credentials. To find all supported connection string formats, see here: https://docs.sqlalchemy.org/core/engines.html#database-urls Additional parameters for the sqlalchemy engine can be provided alongside the 'con' parameter. -
load_args(dict[str, Any] | None, default:None) –Provided to underlying pandas
read_sql_tablefunction along with the connection string. To find all supported arguments, see here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_table.html To find all supported connection string formats, see here: https://docs.sqlalchemy.org/core/engines.html#database-urls -
save_args(dict[str, Any] | None, default:None) –Provided to underlying pandas
to_sqlfunction along with the connection string. To find all supported arguments, see here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html To find all supported connection string formats, see here: https://docs.sqlalchemy.org/core/engines.html#database-urls It hasindex=Falsein the default parameters. -
metadata(dict[str, Any] | None, default:None) –Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
Raises:
-
DatasetError–When either
table_nameorconis empty.
Source code in kedro_datasets/pandas/sql_dataset.py
155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 | |
_connection_args
instance-attribute
¶
_connection_args = {
k: (credentials[k]) for k in (keys()) if k != "con"
}
_describe ¶
_describe()
Source code in kedro_datasets/pandas/sql_dataset.py
248 249 250 251 252 253 254 255 256 257 | |
_exists ¶
_exists()
Source code in kedro_datasets/pandas/sql_dataset.py
265 266 267 268 | |
create_connection
classmethod
¶
create_connection(connection_str, connection_args=None)
Given a connection string, create singleton connection
to be used across all instances of SQLTableDataset that
need to connect to the same source.
Source code in kedro_datasets/pandas/sql_dataset.py
220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 | |
load ¶
load()
Source code in kedro_datasets/pandas/sql_dataset.py
259 260 | |
preview ¶
preview(nrows=5)
Generate a preview of the dataset with a specified number of rows.
Parameters:
-
nrows(int, default:5) –The number of rows to include in the preview. Defaults to 5.
Returns:
-
dict–A dictionary containing the data in a split format.
Source code in kedro_datasets/pandas/sql_dataset.py
270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 | |
save ¶
save(data)
Source code in kedro_datasets/pandas/sql_dataset.py
262 263 | |