squirrel.store
Package Contents
Classes
AbstractStore | Abstract class that specifies the map-style storage API of squirrel.
FilesystemStore | Store that uses fsspec to read from / write to files.
SquirrelStore | FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).
- class squirrel.store.AbstractStore
Bases: abc.ABC
Abstract class that specifies the map-style storage API of squirrel.
A store is responsible for persisting items and retrieving them again. Each item is persisted under a corresponding key using the set() method. The item can then be retrieved using the get() method. Keys to all persisted items can be retrieved using the keys() method.
The meaning or content of a single item depends on the use case and the implementation of the store. For most machine-learning tasks, an item would be a single sample, or a batch or shard of samples.
- abstract get(key: Any, **kwargs) → Iterable
Returns an iterable over the item(s) corresponding to the given key.
Note that this method can be implemented as a generator or can simply return a list of items. There is no restriction on the type or number of items. For example, a key might correspond to a single item that holds one sample, or to a single item that contains one shard of multiple samples.
- abstract keys(**kwargs) → Iterable
Returns an iterable over all keys in the store.
Note that this method can be implemented as a generator or can simply return a list of keys. The iterable should contain the keys for all items retrievable using get().
- abstract set(key: Any, value: Any, **kwargs) → None
Persists an item with the given key.
There is no restriction on the type of the item. However, implementations of AbstractStore can impose their own restrictions.
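The contract described above can be illustrated with a minimal in-memory sketch. So that it runs without squirrel installed, it defines a local stand-in base class, `MapStyleStore`, that only mirrors the three documented abstract methods. Both `MapStyleStore` and `InMemoryStore` are hypothetical names and not part of squirrel. In practice one would presumably subclass `squirrel.store.AbstractStore` instead.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator


class MapStyleStore(ABC):
    """Local stand-in that mirrors the three abstract methods documented above."""

    @abstractmethod
    def set(self, key: Any, value: Any, **kwargs) -> None: ...

    @abstractmethod
    def get(self, key: Any, **kwargs) -> Iterable: ...

    @abstractmethod
    def keys(self, **kwargs) -> Iterable: ...


class InMemoryStore(MapStyleStore):
    """Hypothetical store that keeps one shard (a list of samples) per key."""

    def __init__(self) -> None:
        self._shards: Dict[Any, list] = {}

    def set(self, key: Any, value: Any, **kwargs) -> None:
        # Restriction chosen by this implementation: values are stored as lists,
        # so a single sample is wrapped into a one-element shard.
        self._shards[key] = list(value) if isinstance(value, list) else [value]

    def get(self, key: Any, **kwargs) -> Iterator[Any]:
        # Implemented as a generator: yields the samples of the shard one by one.
        yield from self._shards[key]

    def keys(self, **kwargs) -> Iterator[Any]:
        # Also a generator; covers every key retrievable through get().
        yield from self._shards


store = InMemoryStore()
store.set("shard_0", [{"x": 1}, {"x": 2}])
store.set("sample_0", {"x": 3})

# Iterate over all keys, then over the item(s) behind each key.
all_samples = [sample for key in store.keys() for sample in store.get(key)]
```

Here `get()` and `keys()` are written as generators, but returning plain lists would satisfy the same interface.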
- class squirrel.store.FilesystemStore(url: str, serializer: Optional[squirrel.serialization.SquirrelSerializer] = None, clean: bool = False, **storage_options)
Bases: squirrel.store.store.AbstractStore
Store that uses fsspec to read from / write to files.
Initializes FilesystemStore.
- Parameters
url (str) – Path to the root directory. If this path does not exist, it will be created.
serializer (SquirrelSerializer, optional) – Serializer that is used to serialize data before persisting (see set()) and to deserialize data after reading (see get()). If not specified, data will not be (de)serialized. Defaults to None.
clean (bool) – If true, all files in the store will be removed recursively. Defaults to False.
**storage_options – Keyword arguments passed to the filesystem initializer.
- get(key: str, mode: str = 'rb', **open_kwargs) → Any
Yields the item with the given key.
If the store has a serializer, data read from the file will be deserialized.
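The constructor and serializer behaviour described above can be sketched with the standard library alone. The class below is a hypothetical local-disk analogue, not squirrel's implementation. It uses `pathlib` rather than fsspec, takes two plain callables (`serialize`, `deserialize`) in place of a `SquirrelSerializer`, assumes the root directory is recreated after cleaning, and its `set()` signature is an assumption made for illustration. The `mode` and `**open_kwargs` arguments of the documented `get()` are omitted.

```python
import pickle
import shutil
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


class LocalFileStore:
    """Hypothetical local-disk analogue of the store described above (no fsspec)."""

    def __init__(
        self,
        url: str,
        serialize: Optional[Callable[[Any], bytes]] = None,
        deserialize: Optional[Callable[[bytes], Any]] = None,
        clean: bool = False,
    ) -> None:
        self.root = Path(url)
        if clean and self.root.exists():
            # clean=True: remove everything under the root recursively.
            shutil.rmtree(self.root)
        # A missing root directory is created (assumed also after cleaning).
        self.root.mkdir(parents=True, exist_ok=True)
        self.serialize = serialize
        self.deserialize = deserialize

    def set(self, key: str, value: Any) -> None:
        # Without a serializer the value must already be bytes.
        data = self.serialize(value) if self.serialize else value
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> Any:
        data = (self.root / key).read_bytes()
        # Deserialize only when a deserializer was configured.
        return self.deserialize(data) if self.deserialize else data


root = Path(tempfile.mkdtemp()) / "store"  # does not exist yet

raw = LocalFileStore(str(root))
raw.set("a.bin", b"raw bytes")

pickled = LocalFileStore(str(root), serialize=pickle.dumps, deserialize=pickle.loads)
pickled.set("b.pkl", {"label": 1})

raw_value = raw.get("a.bin")
pickled_value = pickled.get("b.pkl")

# Re-opening with clean=True empties the store.
cleaned = LocalFileStore(str(root), clean=True)
remaining = list(root.iterdir())
```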
- keys(nested: bool = True, **kwargs) → Iterator[str]
Yields all paths in the store, relative to the root directory.
Paths are generated using squirrel.iterstream.source.FilePathGenerator.
- Parameters
nested (bool) – Whether to return paths that are not direct children of the root directory. If True, all paths in the store will be yielded. Otherwise, only the top-level paths (i.e. direct children of the root path) will be yielded. This option is passed to the FilePathGenerator initializer. Defaults to True.
**kwargs – Other keyword arguments passed to the FilePathGenerator initializer. If a key is present in both kwargs and self.storage_options, the value from kwargs will be used.
- Yields
(str) Paths to files and directories in the store relative to the root directory.
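The difference between nested and top-level listing, and the precedence of kwargs over storage options, can be sketched as follows. This does not use `FilePathGenerator`. `iter_keys` is a hypothetical helper built on `pathlib`, and the option names `max_workers` and `protocol` are placeholders chosen for illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_keys(root: Path, nested: bool = True) -> Iterator[str]:
    """Yield paths below root, relative to root, as POSIX-style strings."""
    # rglob("*") walks the whole tree; iterdir() only lists direct children.
    paths = root.rglob("*") if nested else root.iterdir()
    for path in paths:
        yield path.relative_to(root).as_posix()


# Build a small directory tree to list.
root = Path(tempfile.mkdtemp())
(root / "train").mkdir()
(root / "train" / "shard_0").write_bytes(b"")
(root / "meta.json").write_text("{}")

nested_keys = sorted(iter_keys(root, nested=True))
top_level_keys = sorted(iter_keys(root, nested=False))

# Precedence as described for **kwargs: on a key clash, the kwargs value wins.
storage_options = {"max_workers": 4, "protocol": "file"}
kwargs = {"max_workers": 8}
merged = {**storage_options, **kwargs}
```

Both files and directories appear in the listing, matching the "Yields" description above.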
- class squirrel.store.SquirrelStore(url: str, serializer: squirrel.serialization.SquirrelSerializer, clean: bool = False, **storage_options)
Bases: squirrel.store.filesystem.FilesystemStore
FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).
Initializes SquirrelStore.
- Parameters
url (str) – Path to the root directory. If this path does not exist, it will be created.
serializer (SquirrelSerializer) – Serializer that is used to serialize data before persisting (see set()) and to deserialize data after reading (see get()).
clean (bool) – If true, all files in the store will be removed recursively. Defaults to False.
**storage_options – Keyword arguments passed to the filesystem initializer.
- get(key: str, **kwargs) → Iterator[squirrel.constants.SampleType]
Yields the item with the given key.
If the store has a serializer, data read from the file will be deserialized.
- Parameters
key (str) – Key corresponding to the item to retrieve.
**kwargs – Keyword arguments forwarded to self.serializer.deserialize_shard_from_file().
- Yields
(Any) Item with the given key.
- set(value: Union[squirrel.constants.SampleType, squirrel.constants.ShardType], key: Optional[str] = None, **kwargs) → None
Persists a shard or sample with the given key.
The data item will be serialized before it is written to a file.
- Parameters
value (Any) – Shard or sample to be persisted. If value is a sample (i.e. not a list), it will be wrapped in a list before persisting.
key (Optional[str]) – Optional key corresponding to the item to persist.
**kwargs – Keyword arguments forwarded to self.serializer.serialize_shard_to_file().
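The sample-versus-shard handling of set() and the sample-wise iteration of get() can be sketched with a stand-in store. `JsonLinesShardStore` is a hypothetical class that writes one JSON-lines file per shard. It is not squirrel's serializer or file layout. How a key is chosen when none is passed is not stated above, so the random hex key used here is an assumption.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Any, Dict, Iterator, List, Optional, Union

Sample = Dict[str, Any]
Shard = List[Sample]


class JsonLinesShardStore:
    """Hypothetical stand-in: one JSON-lines file per shard."""

    def __init__(self, url: str) -> None:
        self.root = Path(url)
        self.root.mkdir(parents=True, exist_ok=True)

    def set(self, value: Union[Sample, Shard], key: Optional[str] = None) -> None:
        # A single sample (anything that is not a list) becomes a one-sample shard.
        shard = value if isinstance(value, list) else [value]
        # Assumption: a random hex key is generated when none is given.
        key = key if key is not None else uuid.uuid4().hex
        lines = (json.dumps(sample) for sample in shard)
        (self.root / key).write_text("\n".join(lines))

    def get(self, key: str) -> Iterator[Sample]:
        # Yield the samples of the shard one at a time.
        with (self.root / key).open() as f:
            for line in f:
                yield json.loads(line)


store = JsonLinesShardStore(tempfile.mkdtemp())
store.set([{"x": 1}, {"x": 2}], key="shard_0")  # a shard with an explicit key
store.set({"x": 3})                             # a single sample, key omitted

# Recover the generated key by listing the store directory.
auto_key = next(p.name for p in store.root.iterdir() if p.name != "shard_0")
shard_samples = list(store.get("shard_0"))
single_sample_shard = list(store.get(auto_key))
```

Because the single sample is wrapped in a list on write, both keys are read back the same way: as an iterator over samples.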