squirrel.store.squirrel_store

Module Contents

Classes

CacheStore

FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).

SquirrelStore

FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).

class squirrel.store.squirrel_store.CacheStore(url: str, serializer: squirrel.serialization.SquirrelSerializer, cache_url: str, clean: bool = False, cash_storage_options: dict[str, t.Any] | None = None, **storage_options)

Bases: SquirrelStore

FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).

Maintains a cache of the original data, which is populated on the fly as data is retrieved from the main store. The get() method fetches the entire shard, stores it in the cache directory under the same key and with the same serialization protocol, and then yields the samples within that shard. If the shard is already cached, get() streams the data from the cache instead.

Note: the entire shard should fit in memory.

Note: caching adds overhead during the first iteration, which should be amortized over subsequent iterations.
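The read-through behavior described above can be illustrated with a small standard-library sketch. This is not the CacheStore implementation: the class name ReadThroughShardCache, the JSON file layout, and the source_reads counter are all hypothetical and exist only to make the cache-miss and cache-hit paths visible.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List, Tuple

Sample = Dict[str, Any]


class ReadThroughShardCache:
    """Hypothetical stand-in that mimics the caching behavior described above."""

    def __init__(self, source_dir: str, cache_dir: str) -> None:
        self.source_dir = Path(source_dir)
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.source_reads = 0  # counts how often the main store is read

    def get(self, key: str) -> Iterator[Sample]:
        cached = self.cache_dir / f"{key}.json"
        if not cached.exists():
            # Cache miss: load the whole shard into memory, then write it to
            # the cache directory under the same key and in the same format.
            source = self.source_dir / f"{key}.json"
            shard: List[Sample] = json.loads(source.read_text())
            self.source_reads += 1
            cached.write_text(json.dumps(shard))
        # Stream the samples from the cached copy.
        yield from json.loads(cached.read_text())


def demo() -> Tuple[List[Sample], List[Sample], int]:
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as cache:
        shard = [{"id": 0}, {"id": 1}]
        (Path(src) / "shard_0.json").write_text(json.dumps(shard))
        store = ReadThroughShardCache(src, cache)
        first = list(store.get("shard_0"))   # miss: reads the source, fills the cache
        second = list(store.get("shard_0"))  # hit: served from the cache directory
        return first, second, store.source_reads
```

In this sketch the main store is read only once, even though the shard is iterated twice, which is the amortization the second note refers to.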

get(key: str, **kwargs) → Iterator[squirrel.constants.SampleType]

If the shard is cached, reads it from the cache. Otherwise, reads it from the original source, caches it, and streams the samples from the shard.

class squirrel.store.squirrel_store.SquirrelStore(url: str, serializer: squirrel.serialization.SquirrelSerializer, clean: bool = False, **storage_options)

Bases: squirrel.store.filesystem.FilesystemStore

FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).

Initializes SquirrelStore.

Parameters
  • url (str) – Path to the root directory. If this path does not exist, it will be created.

  • serializer (SquirrelSerializer) – Serializer that is used to serialize data before persisting (see set()) and to deserialize data after reading (see get()). If not specified, data will not be (de)serialized. Defaults to None.

  • clean (bool) – If True, all files in the store will be removed recursively.

  • **storage_options – Keyword arguments passed to filesystem initializer.

get(key: str, **kwargs) → Iterator[squirrel.constants.SampleType]

Yields the item with the given key.

If the store has a serializer, data read from the file will be deserialized.

Parameters
  • key (str) – Key corresponding to the item to retrieve.

  • **kwargs – Keyword arguments forwarded to self.serializer.deserialize_shard_from_file().

Yields

(Any) Item with the given key.

keys(nested: bool = False, **kwargs) → Iterator[str]

Yields all shard keys in the store.

set(value: Union[squirrel.constants.SampleType, squirrel.constants.ShardType], key: Optional[str] = None, **kwargs) → None

Persists a shard or sample with the given key.

The data item will be serialized before it is written to a file.

Parameters
  • value (Any) – Shard or sample to be persisted. If value is a sample (i.e. not a list), it will be wrapped in a list before persisting.

  • key (Optional[str]) – Optional key corresponding to the item to persist.

  • **kwargs – Keyword arguments forwarded to self.serializer.serialize_shard_to_file().
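The set(), get(), and keys() contract documented above can be modeled with a minimal standard-library sketch. It is not SquirrelStore itself: the class name TinyShardStore, the JSON-on-disk format, and the random key generated when key is None are assumptions made for illustration.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Any, Dict, Iterator, List, Optional, Tuple, Union

Sample = Dict[str, Any]
Shard = List[Sample]


class TinyShardStore:
    """Hypothetical, simplified model of the set()/get()/keys() contract."""

    def __init__(self, url: str) -> None:
        self.root = Path(url)
        self.root.mkdir(parents=True, exist_ok=True)  # create the root if it is missing

    def set(self, value: Union[Sample, Shard], key: Optional[str] = None) -> None:
        # A single sample (anything that is not a list) is wrapped in a list,
        # so every file on disk holds a shard.
        shard = value if isinstance(value, list) else [value]
        # Assumption: a random key is generated when none is supplied.
        key = key if key is not None else uuid.uuid4().hex
        (self.root / f"{key}.json").write_text(json.dumps(shard))

    def get(self, key: str) -> Iterator[Sample]:
        # Deserialize the shard and yield its samples one by one.
        yield from json.loads((self.root / f"{key}.json").read_text())

    def keys(self) -> Iterator[str]:
        # Yield one key per shard file, in sorted order.
        for path in sorted(self.root.glob("*.json")):
            yield path.stem


def demo() -> Tuple[List[str], List[Sample], List[Sample]]:
    with tempfile.TemporaryDirectory() as root:
        store = TinyShardStore(root)
        store.set({"x": 1}, key="single")            # sample: stored as a one-item shard
        store.set([{"x": 2}, {"x": 3}], key="batch")  # shard: stored as-is
        return list(store.keys()), list(store.get("single")), list(store.get("batch"))
```

Because a sample is stored as a one-item shard, get() returns an iterator in both cases, and callers do not need to distinguish between the two.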