squirrel.store.squirrel_store
Module Contents
Classes
CacheStore | FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples). |
SquirrelStore | FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples). |
class squirrel.store.squirrel_store.CacheStore(url: str, serializer: squirrel.serialization.SquirrelSerializer, cache_url: str, clean: bool = False, cash_storage_options: dict[str, t.Any] | None = None, **storage_options)
Bases: SquirrelStore
FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).
Maintains a cache of the original data, populated on the fly as data is retrieved from the main store. The get() method fetches the entire shard, stores it in the cache directory with the same key and serialization protocol, and then yields the samples within that shard. If the shard is already cached, get() streams the data from the cache instead.
Note: the entire shard must fit in memory.
Note: caching adds overhead in the first iteration, which should be amortized over subsequent iterations.
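The read-through behaviour described above can be illustrated with a minimal, standard-library sketch. It is not the CacheStore implementation: JSON files stand in for the serializer, local directories stand in for the main and cache stores, and the helper names write_shard, read_shard and cached_get are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List

Sample = Dict[str, Any]


def write_shard(root: Path, key: str, shard: List[Sample]) -> None:
    """Serialize a whole shard to one JSON file named after its key."""
    root.mkdir(parents=True, exist_ok=True)
    (root / f"{key}.json").write_text(json.dumps(shard))


def read_shard(root: Path, key: str) -> List[Sample]:
    """Load a whole shard into memory (so the shard has to fit in memory)."""
    return json.loads((root / f"{key}.json").read_text())


def cached_get(main_root: Path, cache_root: Path, key: str) -> Iterator[Sample]:
    """Yield the samples of a shard, populating the cache on first access."""
    if not (cache_root / f"{key}.json").exists():
        # Cache miss: fetch the entire shard from the main store and
        # persist it in the cache under the same key and format.
        write_shard(cache_root, key, read_shard(main_root, key))
    # Cached (or freshly populated): stream the samples from the cache.
    yield from read_shard(cache_root, key)


# Assumed layout for the demo: two temporary directories.
base = Path(tempfile.mkdtemp())
main_root, cache_root = base / "main", base / "cache"
write_shard(main_root, "shard_0", [{"x": 1}, {"x": 2}])

first_pass = list(cached_get(main_root, cache_root, "shard_0"))   # fills the cache
(main_root / "shard_0.json").unlink()                             # remove the original
second_pass = list(cached_get(main_root, cache_root, "shard_0"))  # served from the cache
```

The first pass pays for one extra write per shard; later passes never touch the main store, which is where the amortization mentioned in the note comes from.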
get(key: str, **kwargs) → Iterator[squirrel.constants.SampleType]
If the item is cached, reads it from the cache; otherwise reads it from the original source, caches it, and streams the items from the shard.
class squirrel.store.squirrel_store.SquirrelStore(url: str, serializer: squirrel.serialization.SquirrelSerializer, clean: bool = False, **storage_options)
Bases: squirrel.store.filesystem.FilesystemStore
FilesystemStore that persists samples (Dict objects) or shards (i.e. lists of samples).
Initializes SquirrelStore.
- Parameters
url (str) – Path to the root directory. If this path does not exist, it will be created.
serializer (SquirrelSerializer) – Serializer that is used to serialize data before persisting (see set()) and to deserialize data after reading (see get()). If not specified, data will not be (de)serialized. Defaults to None.
clean (bool) – If true, all files in the store will be removed recursively.
**storage_options – Keyword arguments passed to filesystem initializer.
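As a rough illustration of the url and clean parameters, the sketch below mimics the documented behaviour on a local directory: the root is created if missing, and clean=True removes existing files recursively. The function prepare_root is hypothetical and uses only pathlib and shutil; the actual store goes through a filesystem layer configured by **storage_options, which is not modelled here.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_root(url: str, clean: bool = False) -> Path:
    """Hypothetical helper: create the root directory, optionally wiping it first."""
    root = Path(url)
    if clean and root.exists():
        # clean=True: remove everything under the root recursively.
        shutil.rmtree(root)
    # Create the root if it does not exist (no error if it already does).
    root.mkdir(parents=True, exist_ok=True)
    return root


base = Path(tempfile.mkdtemp())
root = prepare_root(str(base / "store"))        # directory is created
(root / "old_shard.bin").write_bytes(b"stale")  # simulate an earlier run

kept = sorted(p.name for p in prepare_root(str(root)).iterdir())
wiped = sorted(p.name for p in prepare_root(str(root), clean=True).iterdir())
```

Leaving clean at its default preserves whatever an earlier run wrote, so it is the safer choice when reopening an existing store.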
get(key: str, **kwargs) → Iterator[squirrel.constants.SampleType]
Yields the item with the given key.
If the store has a serializer, data read from the file will be deserialized.
- Parameters
key (str) – Key corresponding to the item to retrieve.
**kwargs – Keyword arguments forwarded to self.serializer.deserialize_shard_from_file().
- Yields
(Any) Item with the given key.
set(value: Union[squirrel.constants.SampleType, squirrel.constants.ShardType], key: Optional[str] = None, **kwargs) → None
Persists a shard or sample with the given key.
The data item is serialized before it is written to a file.
- Parameters
value (Any) – Shard or sample to be persisted. If value is a sample (i.e. not a list), it will be wrapped in a list before persisting.
key (Optional[str]) – Optional key corresponding to the item to persist.
**kwargs – Keyword arguments forwarded to self.serializer.serialize_shard_to_file().
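The set()/get() contract described above (a lone sample is wrapped in a list on write, and reading yields samples one by one) can be sketched with a small stand-in class. TinyStore is hypothetical and not part of squirrel: it uses JSON in place of the serializer, always requires an explicit key, and ignores **kwargs.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List, Union

Sample = Dict[str, Any]
Shard = List[Sample]


class TinyStore:
    """Hypothetical stand-in for a shard store; JSON replaces the serializer."""

    def __init__(self, url: str) -> None:
        self.root = Path(url)
        self.root.mkdir(parents=True, exist_ok=True)

    def set(self, value: Union[Sample, Shard], key: str) -> None:
        # A single sample (anything that is not a list) is wrapped in a
        # list, so every file on disk holds a shard.
        shard = value if isinstance(value, list) else [value]
        (self.root / key).write_text(json.dumps(shard))

    def get(self, key: str) -> Iterator[Sample]:
        # Deserialize the whole shard, then yield its samples one by one.
        yield from json.loads((self.root / key).read_text())


store = TinyStore(tempfile.mkdtemp())
store.set({"label": 0}, key="single")                 # one sample
store.set([{"label": 1}, {"label": 2}], key="batch")  # a shard of two samples

single = list(store.get("single"))
batch = list(store.get("batch"))
```

Because get() is a generator, a single stored sample still comes back through iteration, as a one-element sequence rather than a bare dict.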