This package provides functionality to read from and write to a number of data sets. At the core of the library is the AbstractDataset class.


AbstractDataset is the base class for all data set implementations. All data set implementations should extend this abstract class and implement the methods marked as abstract. If a specific dataset implementation cannot be used in conjunction with the ParallelRunner, that user-defined dataset should set the attribute _SINGLE_PROCESS = True.
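The subclassing contract described above can be sketched without the library itself. The following is a minimal stand-alone illustration, assuming a hypothetical base class `SketchAbstractDataset` with abstract `_load`, `_save`, and `_describe` methods. These names are stand-ins chosen for the example, not the library's own definitions; only the `_SINGLE_PROCESS` attribute name is taken from the text.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchAbstractDataset(ABC):
    """Hypothetical stand-in for an abstract dataset base class."""

    # Subclasses set this to True when they cannot be used across processes.
    _SINGLE_PROCESS = False

    @abstractmethod
    def _load(self) -> Any:
        """Return the stored data."""

    @abstractmethod
    def _save(self, data: Any) -> None:
        """Persist the given data."""

    @abstractmethod
    def _describe(self) -> dict:
        """Return a dictionary describing the dataset."""

    def load(self) -> Any:
        # Public entry point that delegates to the subclass implementation.
        return self._load()

    def save(self, data: Any) -> None:
        # Public entry point that delegates to the subclass implementation.
        self._save(data)


class InProcessListDataset(SketchAbstractDataset):
    """Hypothetical user-defined dataset that keeps records in a plain list."""

    # Mark the dataset as unsuitable for a multi-process runner.
    _SINGLE_PROCESS = True

    def __init__(self) -> None:
        self._records: list = []

    def _load(self) -> list:
        # Return a copy so callers cannot mutate the stored records.
        return list(self._records)

    def _save(self, data: list) -> None:
        self._records = list(data)

    def _describe(self) -> dict:
        return {"type": "InProcessListDataset", "records": len(self._records)}


dataset = InProcessListDataset()
dataset.save([1, 2, 3])
loaded = dataset.load()
```

Because the three methods are declared abstract, instantiating the hypothetical base class directly raises `TypeError`. This is the usual way an abstract base class forces every implementation to provide them.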

AbstractVersionedDataset is the base class for all versioned data set implementations.

CachedDataset is a dataset wrapper which caches the saved data in memory, so that the user avoids I/O operations with slow storage media.
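The caching idea can be illustrated with a small stand-alone sketch. The classes `SlowDataset` and `CachingWrapper` below are hypothetical and written only for this example; they are not the library's implementation. They show how a wrapper can serve repeated loads from memory instead of going back to the wrapped storage.

```python
from typing import Any


class SlowDataset:
    """Hypothetical dataset standing in for slow storage; counts its reads."""

    def __init__(self) -> None:
        self._data: Any = None
        self.read_count = 0

    def load(self) -> Any:
        self.read_count += 1
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class CachingWrapper:
    """Hypothetical wrapper that keeps the last saved or loaded data in memory."""

    # Sentinel that distinguishes "nothing cached yet" from a cached None.
    _EMPTY = object()

    def __init__(self, wrapped: SlowDataset) -> None:
        self._wrapped = wrapped
        self._cache: Any = self._EMPTY

    def save(self, data: Any) -> None:
        # Write through to the wrapped dataset and remember the data.
        self._wrapped.save(data)
        self._cache = data

    def load(self) -> Any:
        # Touch the wrapped dataset only when nothing has been cached yet.
        if self._cache is self._EMPTY:
            self._cache = self._wrapped.load()
        return self._cache


slow = SlowDataset()
cached = CachingWrapper(slow)
cached.save({"rows": 10})
first = cached.load()
second = cached.load()
```

In this sketch both loads are answered from memory, so `slow.read_count` stays at zero after the save.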

DataCatalog stores instances of AbstractDataset implementations to provide load and save capabilities from anywhere in the program.

LambdaDataset loads and saves data to a data set.

MemoryDataset loads and saves data from/to an in-memory Python object.

This namedtuple is used to provide load and save versions for versioned data sets.


DatasetAlreadyExistsError is raised by the DataCatalog class when trying to add a data set which already exists in the DataCatalog.

DatasetError is raised by AbstractDataset implementations when an input/output method fails.

DatasetNotFoundError is raised by the DataCatalog class when trying to use a non-existing data set.
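To show how a catalog and these three error conditions fit together, here is a minimal stand-alone sketch. Every class in it (`SketchCatalog`, `SketchMemoryDataset`, and the `Sketch...Error` exceptions) is hypothetical and defined only for this example. In particular, making the two catalog errors subclasses of the general dataset error is an assumption of the sketch, not a statement about the library's class hierarchy.

```python
from typing import Any, Dict


class SketchDatasetError(Exception):
    """Hypothetical error for failed input/output on a dataset."""


class SketchDatasetNotFoundError(SketchDatasetError):
    """Hypothetical error for using a name the catalog does not know."""


class SketchDatasetAlreadyExistsError(SketchDatasetError):
    """Hypothetical error for registering a name twice."""


class SketchMemoryDataset:
    """Hypothetical dataset holding a single in-memory Python object."""

    _EMPTY = object()

    def __init__(self) -> None:
        self._data: Any = self._EMPTY

    def load(self) -> Any:
        # Loading before anything was saved counts as an I/O failure here.
        if self._data is self._EMPTY:
            raise SketchDatasetError("Data has not been saved yet.")
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class SketchCatalog:
    """Hypothetical registry mapping names to dataset instances."""

    def __init__(self) -> None:
        self._datasets: Dict[str, SketchMemoryDataset] = {}

    def add(self, name: str, dataset: SketchMemoryDataset) -> None:
        if name in self._datasets:
            raise SketchDatasetAlreadyExistsError(f"'{name}' is already registered.")
        self._datasets[name] = dataset

    def _get(self, name: str) -> SketchMemoryDataset:
        if name not in self._datasets:
            raise SketchDatasetNotFoundError(f"'{name}' is not in the catalog.")
        return self._datasets[name]

    def load(self, name: str) -> Any:
        return self._get(name).load()

    def save(self, name: str, data: Any) -> None:
        self._get(name).save(data)


catalog = SketchCatalog()
catalog.add("numbers", SketchMemoryDataset())
catalog.save("numbers", [1, 2, 3])
result = catalog.load("numbers")

errors = []
for action in (
    lambda: catalog.add("numbers", SketchMemoryDataset()),  # duplicate name
    lambda: catalog.load("missing"),                        # unknown name
):
    try:
        action()
    except SketchDatasetError as err:
        errors.append(type(err).__name__)
```

Catching the shared base exception, as the loop does, lets calling code handle all three failure modes in one place while still being able to distinguish them by type.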