squirrel.zarr.group

Prefer SquirrelGroup over zarr.hierarchy.Group, especially when working with large datasets, because SquirrelGroup lists keys and retrieves items faster.

Module Contents

Classes

SquirrelGroup

A modified zarr.hierarchy.Group object with faster key listing and item retrieval than the parent.

Functions

get_group(→ SquirrelGroup)

Constructs a zarr store with currently suggested parameters and opens it in a zarr group.

class squirrel.zarr.group.SquirrelGroup(store: collections.abc.MutableMapping, path: squirrel.constants.URL = None, read_only: bool = False, chunk_store: collections.abc.MutableMapping = None, cache_attrs: bool = True, synchronizer: zarr.sync.ThreadSynchronizer = None)

Bases: zarr.hierarchy.Group

A modified zarr.hierarchy.Group object with the following changes compared to the parent:

  • keys() method uses the ls method of the file system instance of the store (i.e. self.store.fs.ls()) to request the list of keys. This results in ~100X speedup compared to calling the keys() method of the Group directly.

  • SquirrelGroup provides the get_item() method, which takes as input a kind in addition to key and uses the kind information to bypass expensive contains() calls.
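The second point can be illustrated with a small, self-contained sketch. It is not the library's implementation: CountingStore, probe_kind and get_item_sketch are hypothetical names, and the sketch assumes that the kind of an item is normally determined by checking the store for the zarr v2 metadata keys .zarray and .zgroup. It only shows why knowing the kind up front removes the membership checks.

```python
class CountingStore(dict):
    """Dict-backed stand-in for a store that counts membership checks."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.contains_calls = 0

    def __contains__(self, key):
        self.contains_calls += 1
        return super().__contains__(key)


def probe_kind(store, key):
    """Determine whether `key` is an array or a group by probing the store."""
    if f"{key}/.zarray" in store:
        return "array"
    if f"{key}/.zgroup" in store:
        return "group"
    raise KeyError(key)


def get_item_sketch(store, key, kind=None):
    """Return (kind, metadata); probe the store only when kind is unknown."""
    if kind is None:
        kind = probe_kind(store, key)
    meta_key = f"{key}/.zarray" if kind == "array" else f"{key}/.zgroup"
    return kind, store[meta_key]


# Hypothetical store contents: one group containing one array.
store = CountingStore({"train/.zgroup": b"{}", "train/0/.zarray": b"{}"})

get_item_sketch(store, "train")  # kind unknown: two membership checks
calls_with_probing = store.contains_calls

store.contains_calls = 0
get_item_sketch(store, "train", kind="group")  # kind known: no checks
calls_with_kind = store.contains_calls
```

On a remote store each membership check can be a network round trip, which is why skipping them matters more there than in this in-memory stand-in.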

Initialize SquirrelGroup.

Parameters
  • store (MutableMapping) – Store of the group.

  • path (URL, optional) – Path to the group. Defaults to None.

  • read_only (bool, optional) – True if group should be opened in read-only mode. Defaults to False.

  • chunk_store (MutableMapping, optional) – Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata. Defaults to None.

  • cache_attrs (bool, optional) – If True, user attributes will be cached for attribute read operations. If False, user attributes are reloaded from the store prior to all attribute read operations. Defaults to True.

  • synchronizer (ThreadSynchronizer, optional) – Array synchronizer to use. Defaults to None.

Raises

Exception – If a group does not exist at the given path.

keys(prefix: str = '') → Generator[str, None, None]

Returns a generator over keys one level below the provided prefix.

Note that although this function returns a generator, it still fetches all keys and stores them in a list first, because ls() returns the complete listing at once. As a result, calling this function on a prefix with a very large number of keys (e.g. at the root of a zarr group as big as ImageNet) may exhaust memory.

Parameters

prefix (str, optional) – If provided, the keys under the given prefix are returned. Defaults to “”.

Returns

A generator over keys.

Return type

Generator[str, None, None]
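The memory note above can be made concrete with a minimal sketch. It uses only the standard library, with os.listdir standing in for the file system's ls() method; keys_sketch and the directory layout are hypothetical and do not reproduce the actual implementation. The point is that the whole listing exists in memory before the first key is yielded.

```python
import os
import tempfile
from typing import Generator


def keys_sketch(root: str, prefix: str = "") -> Generator[str, None, None]:
    """Yield the entry names one level below root/prefix."""
    # os.listdir stands in for ls(): it returns the complete listing at
    # once, so every name is held in memory before the first yield.
    entries = os.listdir(os.path.join(root, prefix))
    for name in sorted(entries):
        yield name


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: two top-level groups, two samples under "train".
    os.makedirs(os.path.join(root, "train", "sample_0"))
    os.makedirs(os.path.join(root, "train", "sample_1"))
    os.makedirs(os.path.join(root, "val"))

    top_level = list(keys_sketch(root))
    under_train = list(keys_sketch(root, prefix="train"))
```

Passing a narrower prefix keeps the materialized list small, which is the practical way to avoid the memory problem described above.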

squirrel.zarr.group.get_group(path: squirrel.constants.URL, mode: str = 'a', overwrite: bool = False, **storage_options) → SquirrelGroup

Constructs a zarr store with currently suggested parameters and opens it in a zarr group.

The default zarr.group function, when called with overwrite=False, always returns a Group with read_only=False. This function therefore does not use zarr.group; it constructs the group itself as a SquirrelGroup with its own settings.

Parameters
  • path (URL) – fsspec path to store.

  • mode (str, optional) – IO mode (e.g. “r”, “w”, “a”). Defaults to “a”. mode affects the store of the returned group.

  • overwrite (bool, optional) – If True, the store is cleaned before opening. Defaults to False.

  • **storage_options – Keyword arguments passed to fsspec when obtaining a filesystem object corresponding to the given path.

Raises

ReadOnlyError – If mode == “r” and overwrite == True.

Returns

Root zarr group constructed with the given parameters, which has improved performance over a zarr.hierarchy.Group. If mode != “r” and either overwrite == True or the group does not exist yet, the group is first initialized using zarr.storage.init_group(). The group is then created on the store suggested by suggested_store(). The store of the group is accessible as an attribute of the group, i.e. group.store.

Return type

SquirrelGroup
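The conditions stated under Raises and Returns can be summarized in a short sketch. It encodes only the decision logic as described here, not the function itself: needs_init is a hypothetical helper, and the ReadOnlyError class below is a local stand-in rather than the library's exception.

```python
class ReadOnlyError(Exception):
    """Local stand-in for the error raised when writing to a read-only store."""


def needs_init(mode: str, overwrite: bool, group_exists: bool) -> bool:
    """Return True if the group would be initialized before it is opened."""
    # Overwriting requires write access, so it conflicts with mode "r".
    if mode == "r" and overwrite:
        raise ReadOnlyError("cannot overwrite a store opened with mode='r'")
    # Initialize only when writing is allowed and the group is either
    # being overwritten or does not exist yet.
    return mode != "r" and (overwrite or not group_exists)


new_group = needs_init("a", overwrite=False, group_exists=False)
existing_group = needs_init("a", overwrite=False, group_exists=True)
overwritten_group = needs_init("w", overwrite=True, group_exists=True)
read_only_group = needs_init("r", overwrite=False, group_exists=True)
```

Under these assumptions, opening an existing group with the default mode "a" and overwrite=False leaves its contents untouched.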