squirrel.zarr.group
It is advised to use SquirrelGroup over zarr.hierarchy.Group, especially when working with large datasets, since the former provides performance boosts over the latter.
Module Contents
Classes
SquirrelGroup | A modified zarr.hierarchy.Group object.
Functions
get_group | Constructs a zarr store with currently suggested parameters and opens it in a zarr group.
class squirrel.zarr.group.SquirrelGroup(store: collections.MutableMapping, path: squirrel.constants.URL = None, read_only: bool = False, chunk_store: collections.MutableMapping = None, cache_attrs: bool = True, synchronizer: zarr.sync.ThreadSynchronizer = None)
Bases: zarr.hierarchy.Group
A modified zarr.hierarchy.Group object with the following changes compared to the parent:
- The keys() method uses the ls method of the store's file system instance (i.e. self.store.fs.ls()) to request the list of keys. This results in a ~100X speedup compared to calling the keys() method of the Group directly.
- SquirrelGroup provides the get_item() method, which takes a kind as input in addition to the key and uses the kind information to bypass expensive contains() calls.
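As a rough illustration of why knowing the kind helps, the sketch below assumes the zarr v2 layout, where an array is marked by a .zarray key and a group by a .zgroup key, and counts membership checks on an in-memory store standing in for a remote one. CountingStore, lookup_without_kind and lookup_with_kind are hypothetical names, and the "array"/"group" values for kind are an assumption; this is not squirrel's get_item() implementation.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator


class CountingStore(MutableMapping):
    """In-memory store that counts membership checks.

    Stands in for a remote store, where every `key in store` can cost a round trip.
    """

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self.contains_calls = 0

    def __contains__(self, key: object) -> bool:
        self.contains_calls += 1
        return key in self._data

    def __getitem__(self, key: str) -> bytes:
        return self._data[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


def lookup_without_kind(store: MutableMapping, path: str) -> str:
    """Kind unknown: probe for array metadata first, then for group metadata."""
    if f"{path}/.zarray" in store:
        return "array"
    if f"{path}/.zgroup" in store:
        return "group"
    raise KeyError(path)


# Hypothetical mapping from kind to the zarr v2 metadata key that marks it.
METADATA_KEY = {"array": ".zarray", "group": ".zgroup"}


def lookup_with_kind(store: MutableMapping, path: str, kind: str) -> str:
    """Kind known (hypothetically 'array' or 'group'): probe a single metadata key."""
    if f"{path}/{METADATA_KEY[kind]}" in store:
        return kind
    raise KeyError(path)


store = CountingStore()
store["images/.zgroup"] = b"{}"

lookup_without_kind(store, "images")
print(store.contains_calls)  # 2: the array probe misses before the group probe hits

store.contains_calls = 0
lookup_with_kind(store, "images", "group")
print(store.contains_calls)  # 1
```

On a remote store each membership check may be a separate request, so skipping the failed array probe for groups is where a saving would come from; this sketch does not measure squirrel's actual gain.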
Initialize SquirrelGroup.
- Parameters
store (MutableMapping) – Store of the group.
path (URL, optional) – Path to the group. Defaults to None.
read_only (bool, optional) – True if the group should be opened in read-only mode. Defaults to False.
chunk_store (MutableMapping, optional) – Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata. Defaults to None.
cache_attrs (bool, optional) – If True, user attributes will be cached for attribute read operations. If False, user attributes are reloaded from the store prior to all attribute read operations. Defaults to True.
synchronizer (ThreadSynchronizer, optional) – Array synchronizer to use. Defaults to None.
- Raises
Exception – If a group does not exist at the given path.
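To make the cache_attrs trade-off concrete, here is a minimal sketch assuming attributes are stored as JSON under a <path>/.zattrs key (the zarr v2 layout) in a plain dict acting as the store. AttrsView is a hypothetical class, not zarr's or squirrel's attribute implementation; it only mirrors the "cache for reads" versus "reload before every read" behavior described for cache_attrs.

```python
import json
from typing import Any, Dict, MutableMapping, Optional


class AttrsView:
    """Hypothetical attribute reader illustrating cache_attrs=True vs. False.

    Attributes live as JSON under '<path>/.zattrs' (zarr v2 layout).
    """

    def __init__(self, store: MutableMapping[str, bytes], path: str, cache: bool = True) -> None:
        self.store = store
        self.key = f"{path}/.zattrs"
        self.cache = cache
        self.reads = 0  # how many times the store was actually read
        self._cached: Optional[Dict[str, Any]] = None

    def _load(self) -> Dict[str, Any]:
        self.reads += 1
        raw = self.store.get(self.key)
        return json.loads(raw) if raw is not None else {}

    def asdict(self) -> Dict[str, Any]:
        if not self.cache:
            return self._load()  # reload from the store before every read
        if self._cached is None:
            self._cached = self._load()  # first read fills the cache
        return self._cached


store = {"g/.zattrs": json.dumps({"split": "train"}).encode()}
cached = AttrsView(store, "g", cache=True)
fresh = AttrsView(store, "g", cache=False)
for _ in range(2):
    cached.asdict()
    fresh.asdict()
print(cached.reads, fresh.reads)  # 1 2

# Another writer updates the attributes behind both views.
store["g/.zattrs"] = json.dumps({"split": "val"}).encode()
print(cached.asdict()["split"], fresh.asdict()["split"])  # train val
```

The usage shows the design choice: caching saves store reads, but a cached view can return stale values if another writer changes the attributes.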
keys(self, prefix: str = '') → Generator[str, None, None]
Returns a generator over keys one level below the provided prefix.
Note that although this method returns a generator, it still fetches all keys at once and stores them in a list, because ls() returns the complete listing in a single call. As a result, calling it where there are very many keys (e.g. at the root of a zarr group as large as ImageNet) may run out of memory.
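A minimal, stdlib-only sketch of the listing behavior described above, assuming a flat list of store keys in place of self.store.fs. The ls() and keys() functions here are hypothetical stand-ins, not squirrel's code, and skipping dot-prefixed metadata names is an assumption. It shows both the "one level below prefix" semantics and why a generator wrapped around ls() still holds the whole listing in memory.

```python
from typing import Generator, Iterable, List


def ls(store_keys: Iterable[str], prefix: str) -> List[str]:
    """Stand-in for a file-system ls(): returns all direct children of prefix in one list."""
    prefix = prefix.strip("/")
    base = f"{prefix}/" if prefix else ""
    children = set()
    for key in store_keys:
        if key.startswith(base):
            rest = key[len(base):]
            if rest:
                children.add(rest.split("/", 1)[0])
    return sorted(children)


def keys(store_keys: Iterable[str], prefix: str = "") -> Generator[str, None, None]:
    """Yield names one level below prefix.

    The full listing is materialized by ls() before the first name is yielded,
    so the generator saves no memory over returning the list directly.
    """
    listing = ls(store_keys, prefix)  # every child name held in memory at once
    for name in listing:
        # Assumption: skip zarr metadata entries such as .zgroup and .zattrs.
        if not name.startswith("."):
            yield name


store_keys = [
    ".zgroup",
    "train/.zgroup",
    "train/images/.zarray",
    "train/images/0.0",
    "val/.zgroup",
]
print(list(keys(store_keys)))           # ['train', 'val']
print(list(keys(store_keys, "train")))  # ['images']
```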
squirrel.zarr.group.get_group(path: squirrel.constants.URL, mode: str = 'a', overwrite: bool = False, **storage_options) → SquirrelGroup
Constructs a zarr store with currently suggested parameters and opens it in a zarr group.
The default zarr.group method, when passed overwrite=False, always returns a Group with read_only=False. Here we choose not to use this zarr function and instead construct the group with our own settings, using SquirrelGroup.
- Parameters
path (URL) – fsspec path to store.
mode (str, optional) – IO mode (e.g. “r”, “w”, “a”). Defaults to “a”. mode affects the store of the returned group.
overwrite (bool, optional) – If True, the store is cleaned before opening. Defaults to False.
**storage_options – Keyword arguments passed to fsspec when obtaining a filesystem object corresponding to the given path.
- Raises
ReadOnlyError – If mode == “r” and overwrite == True.
- Returns
Root zarr group constructed with the given parameters, which has improved performance over a zarr.hierarchy.Group. If mode != “r” and either overwrite == True or the group does not exist yet, the group is first initialized using zarr.storage.init_group(). Then, the group is created on the store suggested by suggested_store(). The store of the group is accessible as an attribute of the group, i.e. group.store.
- Return type