squirrel.store

Package Contents

Classes

AbstractStore

Abstract class that specifies the map-style storage api of squirrel.

FilesystemStore

Store that uses fsspec to read from / write to files.

SquirrelStore

FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).

class squirrel.store.AbstractStore

Bases: abc.ABC

Abstract class that specifies the map-style storage api of squirrel.

A store is responsible for persisting and retrieving items. Each item is persisted under a corresponding key using the set() method and can then be retrieved using the get() method. The keys of all persisted items can be retrieved using the keys() method.

The meaning or content of a single item depends on the use case and the implementation of the store. For most machine-learning-related tasks, an item would be a single sample, or a batch or shard of samples.

abstract get(key: Any, **kwargs) → Iterable

Returns an iterable over the item(s) corresponding to the given key.

Note that it is possible to implement this method as a generator, or to simply return a list of items. There is no restriction on the type or number of items. For example, a key might correspond to a single item that holds a single sample, or to a single item that contains one shard of multiple samples.

abstract keys(**kwargs) → Iterable

Returns an iterable over all keys in the store.

Note that it is possible to implement this method as a generator, or to simply return a list of keys. The iterable should contain the keys for all items retrievable using get().

abstract set(key: Any, value: Any, **kwargs) → None

Persists an item with the given key.

There is no restriction on the type of the item. However, implementations of AbstractStore can impose their own restrictions.
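The contract above can be illustrated with a minimal in-memory sketch. To keep the example self-contained, it does not import squirrel: MapStyleStore is a hypothetical stand-in that mirrors the three abstract methods as documented here, and InMemoryStore is an illustrative name, not a class shipped with the package.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator


class MapStyleStore(ABC):
    """Hypothetical stand-in mirroring the abstract methods described above."""

    @abstractmethod
    def get(self, key: Any, **kwargs) -> Iterable: ...

    @abstractmethod
    def set(self, key: Any, value: Any, **kwargs) -> None: ...

    @abstractmethod
    def keys(self, **kwargs) -> Iterable: ...


class InMemoryStore(MapStyleStore):
    """Keeps items in a dict; for illustration only."""

    def __init__(self) -> None:
        self._items: Dict[Any, Any] = {}

    def get(self, key: Any, **kwargs) -> Iterator[Any]:
        # Written as a generator: yields the single item stored under `key`.
        yield self._items[key]

    def set(self, key: Any, value: Any, **kwargs) -> None:
        # No restriction on the item type: a sample and a shard are both fine.
        self._items[key] = value

    def keys(self, **kwargs) -> Iterable[Any]:
        # Returning a plain list is equally valid under the contract.
        return list(self._items)


store = InMemoryStore()
store.set("sample_0", {"x": 1, "label": 0})   # one item holding one sample
store.set("shard_0", [{"x": 2}, {"x": 3}])    # one item holding a shard
items = {key: list(store.get(key)) for key in store.keys()}
```

Here get() is a generator while keys() returns a list, which shows that either style satisfies the interface.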

class squirrel.store.FilesystemStore(url: str, serializer: Optional[squirrel.serialization.SquirrelSerializer] = None, clean: bool = False, **storage_options)

Bases: squirrel.store.store.AbstractStore

Store that uses fsspec to read from / write to files.

Initializes FilesystemStore.

Parameters
  • url (str) – Path to the root directory. If this path does not exist, it will be created.

  • serializer (SquirrelSerializer, optional) – Serializer that is used to serialize data before persisting (see set()) and to deserialize data after reading (see get()). If not specified, data will not be (de)serialized. Defaults to None.

  • clean (bool) – If True, all files in the store will be removed recursively. Defaults to False.

  • **storage_options – Keyword arguments passed to filesystem initializer.

get(key: str, mode: str = 'rb', **open_kwargs) → Any

Yields the item with the given key.

If the store has a serializer, data read from the file will be deserialized.

Parameters
  • key (str) – Key corresponding to the item to retrieve.

  • mode (str) – IO mode to use when opening the file. Defaults to “rb”.

  • **open_kwargs – Keyword arguments that will be forwarded to the filesystem object when opening the file.

Yields

(Any) Item with the given key.

keys(nested: bool = True, **kwargs) → Iterator[str]

Yields all paths in the store, relative to the root directory.

Paths are generated using squirrel.iterstream.source.FilePathGenerator.

Parameters
  • nested (bool) – Whether to return paths that are not direct children of the root directory. If True, all paths in the store will be yielded. Otherwise, only the top-level paths (i.e. direct children of the root path) will be yielded. This option is passed to FilePathGenerator initializer. Defaults to True.

  • **kwargs – Other keyword arguments passed to the FilePathGenerator initializer. If a key is present in both kwargs and self.storage_options, the value from kwargs will be used.

Yields

(str) Paths to files and directories in the store relative to the root directory.
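The effect of nested can be sketched with the standard library alone. This is not how FilePathGenerator is implemented; iter_keys is a hypothetical helper that only reproduces the documented behaviour (paths relative to the root, with or without descending into subdirectories) on a local temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_keys(root: Path, nested: bool = True) -> Iterator[str]:
    """Yield paths under `root`, relative to `root` (hypothetical helper)."""
    # rglob("*") walks the whole tree; iterdir() lists direct children only.
    paths = root.rglob("*") if nested else root.iterdir()
    for path in paths:
        yield path.relative_to(root).as_posix()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"1")
    (root / "sub").mkdir()
    (root / "sub" / "b.bin").write_bytes(b"2")

    # Sorted because directory iteration order is not guaranteed.
    top_level = sorted(iter_keys(root, nested=False))
    everything = sorted(iter_keys(root, nested=True))
```

With nested=False only a.bin and sub are produced; with nested=True the file sub/b.bin is included as well.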

set(value: Any, key: Optional[str] = None, mode: str = 'wb', **open_kwargs) → None

Persists an item with the given key.

If the store has a serializer, the item will be serialized before it is written to a file.

Parameters
  • value (Any) – Item to be persisted.

  • key (Optional[str]) – Optional key corresponding to the item to persist.

  • mode (str) – IO mode to use when opening the file. Defaults to “wb”.

  • **open_kwargs – Keyword arguments that will be forwarded to the filesystem object when opening the file.
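The interplay between the optional serializer and get()/set() can be sketched as follows. TinyFileStore is a hypothetical local-disk analogue written with the standard library only; it does not use fsspec or SquirrelSerializer, and the dumps/loads callables stand in for a serializer. What the real store does when key is omitted is not stated above, so generating a random key here is purely an assumption.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Any, Callable, Iterator, Optional


class TinyFileStore:
    """Hypothetical local-disk analogue; not squirrel's implementation."""

    def __init__(
        self,
        url: str,
        dumps: Optional[Callable[[Any], bytes]] = None,
        loads: Optional[Callable[[bytes], Any]] = None,
    ) -> None:
        self.root = Path(url)
        self.root.mkdir(parents=True, exist_ok=True)  # create the root if missing
        self.dumps, self.loads = dumps, loads

    def set(self, value: Any, key: Optional[str] = None, mode: str = "wb") -> None:
        # Assumption: a random key is generated when none is given.
        key = key if key is not None else uuid.uuid4().hex
        # Without a serializer the value is written as-is (it must be bytes).
        data = self.dumps(value) if self.dumps else value
        with open(self.root / key, mode) as f:
            f.write(data)

    def get(self, key: str, mode: str = "rb") -> Iterator[Any]:
        with open(self.root / key, mode) as f:
            data = f.read()
        # Deserialize only if a serializer stand-in was configured.
        yield self.loads(data) if self.loads else data


with tempfile.TemporaryDirectory() as tmp:
    raw = TinyFileStore(tmp + "/raw")              # no serializer: raw bytes
    raw.set(b"\x00\x01", key="blob")
    raw_item = next(raw.get("blob"))

    js = TinyFileStore(
        tmp + "/json",
        dumps=lambda obj: json.dumps(obj).encode("utf-8"),
        loads=lambda data: json.loads(data.decode("utf-8")),
    )
    js.set({"x": 1}, key="sample")
    json_item = next(js.get("sample"))
    js.set({"x": 2})                               # key omitted
    n_files = len(list(js.root.iterdir()))
```

Note that get() is a generator, matching the "Yields" wording above, so the caller pulls the item with next() or a loop.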

class squirrel.store.SquirrelStore(url: str, serializer: squirrel.serialization.SquirrelSerializer, clean: bool = False, **storage_options)

Bases: squirrel.store.filesystem.FilesystemStore

FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).

Initializes SquirrelStore.

Parameters
  • url (str) – Path to the root directory. If this path does not exist, it will be created.

  • serializer (SquirrelSerializer) – Serializer that is used to serialize data before persisting (see set()) and to deserialize data after reading (see get()). Unlike in FilesystemStore, this argument is required.

  • clean (bool) – If True, all files in the store will be removed recursively. Defaults to False.

  • **storage_options – Keyword arguments passed to filesystem initializer.

get(key: str, **kwargs) → Iterator[squirrel.constants.SampleType]

Yields the item with the given key.

Data read from the file will be deserialized using the store's serializer.

Parameters
  • key (str) – Key corresponding to the item to retrieve.

  • **kwargs – Keyword arguments forwarded to self.serializer.deserialize_shard_from_file().

Yields

(Any) Item with the given key.

keys(nested: bool = False, **kwargs) → Iterator[str]

Yields all shard keys in the store.

set(value: Union[squirrel.constants.SampleType, squirrel.constants.ShardType], key: Optional[str] = None, **kwargs) → None

Persists a shard or sample with the given key.

The item will be serialized before it is written to a file.

Parameters
  • value (Any) – Shard or sample to be persisted. If value is a sample (i.e. not a list), it will be wrapped in a list before persisting.

  • key (Optional[str]) – Optional key corresponding to the item to persist.

  • **kwargs – Keyword arguments forwarded to self.serializer.serialize_shard_to_file().
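The sample-versus-shard handling described for set() can be sketched as below. The helpers write_shard and read_shard are hypothetical, the JSON-lines file layout is an assumption made for the example and not squirrel's on-disk format, and Sample/Shard are local stand-ins for SampleType and ShardType. The return type Iterator[SampleType] of get() suggests that samples are yielded one at a time, which is what read_shard imitates.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List, Union

Sample = Dict[str, Any]   # stand-in for squirrel.constants.SampleType
Shard = List[Sample]      # stand-in for squirrel.constants.ShardType


def write_shard(path: Path, value: Union[Sample, Shard]) -> None:
    # A lone sample (anything that is not a list) becomes a one-element shard.
    shard = value if isinstance(value, list) else [value]
    with path.open("w", encoding="utf-8") as f:
        for sample in shard:
            f.write(json.dumps(sample) + "\n")


def read_shard(path: Path) -> Iterator[Sample]:
    # Yield the samples of the shard one at a time.
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_shard(root / "single", {"x": 1})             # sample, wrapped in a list
    write_shard(root / "many", [{"x": 2}, {"x": 3}])   # already a shard
    single = list(read_shard(root / "single"))
    many = list(read_shard(root / "many"))
```

Because a single sample is wrapped before writing, readers can treat every stored item uniformly as a shard.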