CSVDataset
CSVDataset loads and saves data to comma-separated value file(s). It uses Dask remote data services to handle the corresponding load and save operations.
kedro_datasets.dask.CSVDataset
CSVDataset(
filepath,
load_args=None,
save_args=None,
credentials=None,
fs_args=None,
metadata=None,
)
Bases: AbstractDataset[DataFrame, DataFrame]
CSVDataset loads and saves data to comma-separated value file(s). It uses Dask
remote data services to handle the corresponding load and save operations:
https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html
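Dask reaches remote storage through fsspec, which picks a filesystem implementation from the protocol prefix of the path (s3://, gcs://, and so on) and treats a path with no prefix as local. The stdlib-only sketch below illustrates that dispatch idea with a hypothetical helper, split_protocol; it is not the fsspec or Kedro implementation.

```python
def split_protocol(filepath: str) -> tuple[str, str]:
    """Hypothetical helper: split 's3://bucket/key' into ('s3', 'bucket/key').

    A path without an explicit protocol is treated as a local file, which is
    the convention fsspec follows for plain POSIX paths.
    """
    if "://" in filepath:
        protocol, path = filepath.split("://", 1)  # split on the first separator only
        return protocol, path
    return "file", filepath


for fp in ["s3://bucket_name/path/to/folder", "path/to/folder/*.csv"]:
    print(fp, "->", split_protocol(fp))
```

The protocol decides which backend (for example s3fs for s3://) handles the I/O, and the remainder is the path within that filesystem.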
Examples:
Using the YAML API:
cars:
  type: dask.CSVDataset
  filepath: s3://bucket_name/path/to/folder
  save_args:
    compression: gzip
  credentials:
    client_kwargs:
      aws_access_key_id: YOUR_KEY
      aws_secret_access_key: YOUR_SECRET
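Kedro reads a catalog entry like the one above by using type to choose the dataset class and passing the remaining keys to that class's constructor. The sketch below is a simplified illustration of that split, with a plain dict standing in for the parsed YAML; it does not import Kedro, and split_catalog_entry is a hypothetical name.

```python
from typing import Any


def split_catalog_entry(entry: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: separate the dataset type from constructor kwargs."""
    config = dict(entry)  # shallow copy so the caller's dict is not mutated
    dataset_type = config.pop("type")
    return dataset_type, config


# The catalog entry written as the dict a YAML parser would produce.
cars = {
    "type": "dask.CSVDataset",
    "filepath": "s3://bucket_name/path/to/folder",
    "save_args": {"compression": "gzip"},  # fsspec codec names are lowercase
    "credentials": {
        "client_kwargs": {
            "aws_access_key_id": "YOUR_KEY",
            "aws_secret_access_key": "YOUR_SECRET",
        }
    },
}

dataset_type, kwargs = split_catalog_entry(cars)
print(dataset_type)    # dask.CSVDataset
print(sorted(kwargs))  # ['credentials', 'filepath', 'save_args']
```

The resulting kwargs line up with the constructor signature, so the entry is roughly equivalent to CSVDataset(**kwargs) in Python.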
Using the Python API:
>>> import dask.dataframe as dd
>>> import numpy as np
>>> import pandas as pd
>>> from kedro_datasets.dask import CSVDataset
>>>
>>> # CSV stores scalars only; list-valued cells would reload as strings
>>> data = pd.DataFrame({"col1": [1, 2], "col2": [4, 5], "col3": [5, 6]})
>>> ddf = dd.from_pandas(data, npartitions=1)
>>>
>>> dataset = CSVDataset(filepath="path/to/folder/*.csv")
>>> dataset.save(ddf)
>>> reloaded = dataset.load()
>>> assert np.array_equal(ddf.compute(), reloaded.compute())
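In path/to/folder/*.csv the * is a placeholder: on save, Dask writes one file per partition and substitutes the partition index for *, and on load the same string is used as a glob. The stdlib-only sketch below roughly imitates that naming scheme with hypothetical helpers (save_partitions, load_partitions) and plain lists of rows instead of DataFrames; it is not how Dask is implemented, and the zero-padding rule is an approximation of Dask's default naming.

```python
import csv
import glob
import os
import tempfile


def save_partitions(pattern: str, header: list[str], partitions: list[list[list]]) -> list[str]:
    """Hypothetical helper: write one CSV file per partition.

    The '*' in pattern is replaced by the partition index, zero-padded so that
    the file names sort in partition order.
    """
    width = len(str(max(len(partitions) - 1, 0)))
    paths = []
    for i, rows in enumerate(partitions):
        path = pattern.replace("*", str(i).zfill(width))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # every partition file carries its own header row
            writer.writerows(rows)
        paths.append(path)
    return paths


def load_partitions(pattern: str) -> list[dict[str, str]]:
    """Hypothetical helper: glob the pattern and concatenate rows in file-name order."""
    rows: list[dict[str, str]] = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    pattern = os.path.join(tmp, "folder", "*.csv")
    written = save_partitions(pattern, ["col1", "col2"], [[[1, 4]], [[2, 5]]])
    print([os.path.basename(p) for p in written])  # ['0.csv', '1.csv']
    print(load_partitions(pattern))
    # [{'col1': '1', 'col2': '4'}, {'col1': '2', 'col2': '5'}]
```

Every value comes back as a string, which is why real CSV readers infer column types on load and why non-scalar cells such as Python lists do not survive a CSV round trip.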
Parameters:
- filepath (str) – Filepath in POSIX format to a CSV file, a CSV collection, or the directory of a multipart CSV.
- load_args (dict[str, Any] | None, default: None) – Additional loading options for dask.dataframe.read_csv: https://docs.dask.org/en/stable/generated/dask.dataframe.read_csv.html
- save_args (dict[str, Any] | None, default: None) – Additional saving options for dask.dataframe.to_csv: https://docs.dask.org/en/stable/generated/dask.dataframe.to_csv.html
- credentials (dict[str, Any] | None, default: None) – Credentials required to access the underlying filesystem. For example, for GCSFileSystem it should look like {"token": None}.
- fs_args (dict[str, Any] | None, default: None) – Optional parameters to the backend file system driver: https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html#optional-parameters
- metadata (dict[str, Any] | None, default: None) – Any arbitrary metadata. This is ignored by Kedro, but may be consumed by users or external plugins.
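Kedro datasets commonly combine user-supplied load_args and save_args with class-level defaults, with user values taking precedence. The sketch below shows that merge pattern; the default values, including {"index": False} for saving, are assumptions for illustration and should be checked against the dataset source.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any

# Assumed class-level defaults (illustrative only; not confirmed by this page).
DEFAULT_LOAD_ARGS: dict[str, Any] = {}
DEFAULT_SAVE_ARGS: dict[str, Any] = {"index": False}


def merge_args(defaults: dict[str, Any], user: dict[str, Any] | None) -> dict[str, Any]:
    """Shallow-merge user options over defaults without mutating either dict."""
    merged = deepcopy(defaults)
    merged.update(user or {})  # user-supplied keys override the defaults
    return merged


print(merge_args(DEFAULT_SAVE_ARGS, {"compression": "gzip"}))  # {'index': False, 'compression': 'gzip'}
print(merge_args(DEFAULT_SAVE_ARGS, {"index": True}))  # {'index': True}
print(merge_args(DEFAULT_LOAD_ARGS, None))  # {}
```

Copying the defaults keeps one dataset instance's options from leaking into the shared class attribute.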
fs_args (property)
fs_args
Property of optional file system parameters.
Returns:
- dict[str, Any] – A dictionary of backend file system parameters, including credentials.
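The fs_args property returns a single dictionary that, per its description, already includes the credentials, which Dask can then presumably forward to fsspec as storage options. A minimal sketch of such a merge follows; the helper name combine_fs_args and the rule that credentials override fs_args on conflicting keys are assumptions, not taken from the source.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def combine_fs_args(
    fs_args: dict[str, Any] | None, credentials: dict[str, Any] | None
) -> dict[str, Any]:
    """Hypothetical helper: merge fs_args and credentials into one options dict.

    Credentials are applied last (an assumption), so they win on key clashes.
    The merge is shallow: a nested dict such as client_kwargs is replaced, not merged.
    """
    combined = deepcopy(fs_args or {})
    combined.update(credentials or {})
    return combined


# Example values only: a GCS project name plus the credentials form shown above.
print(combine_fs_args({"project": "my-project"}, {"token": None}))
# {'project': 'my-project', 'token': None}
```

Because the merge is shallow, splitting client_kwargs between fs_args and credentials would keep only the credentials copy in this sketch.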
_describe
_describe()
_exists
_exists()
load
load()
save
save(data)
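The load, save, _exists and _describe entries carry no descriptions here. As a rough, hypothetical illustration of that four-method contract, the class below implements the same methods for a single local CSV file using only the standard library. It is not the Kedro or Dask implementation: LocalCSVSketch is a made-up name, data is a list of dicts rather than a Dask DataFrame, and load_args/save_args are forwarded to csv.DictReader/csv.DictWriter instead of dask.dataframe.read_csv/to_csv.

```python
from __future__ import annotations

import csv
import os
import tempfile
from typing import Any


class LocalCSVSketch:
    """Hypothetical stdlib-only stand-in mirroring the dataset's method names."""

    def __init__(
        self,
        filepath: str,
        load_args: dict[str, Any] | None = None,
        save_args: dict[str, Any] | None = None,
    ) -> None:
        self._filepath = filepath
        self._load_args = dict(load_args or {})
        self._save_args = dict(save_args or {})

    def load(self) -> list[dict[str, str]]:
        # load_args are passed through to the reader (e.g. delimiter=";").
        with open(self._filepath, newline="") as f:
            return list(csv.DictReader(f, **self._load_args))

    def save(self, data: list[dict[str, Any]]) -> None:
        # Assumes non-empty data; column names come from the first row.
        with open(self._filepath, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(data[0]), **self._save_args)
            writer.writeheader()
            writer.writerows(data)

    def _exists(self) -> bool:
        return os.path.exists(self._filepath)

    def _describe(self) -> dict[str, Any]:
        # A plain summary of the configuration, e.g. for logging.
        return {
            "filepath": self._filepath,
            "load_args": self._load_args,
            "save_args": self._save_args,
        }


with tempfile.TemporaryDirectory() as tmp:
    ds = LocalCSVSketch(
        os.path.join(tmp, "cars.csv"),
        load_args={"delimiter": ";"},
        save_args={"delimiter": ";"},
    )
    print(ds._exists())  # False
    ds.save([{"col1": 1, "col2": 4}, {"col1": 2, "col2": 5}])
    print(ds._exists())  # True
    print(ds.load())  # [{'col1': '1', 'col2': '4'}, {'col1': '2', 'col2': '5'}]
```

In Kedro, the underscore-prefixed methods are internal hooks: user code normally calls the public exists() inherited from AbstractDataset, which delegates to _exists().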