squirrel.driver.driver

This module defines the Driver API of squirrel.

Module Contents

Classes

Driver

Drives the access to a data source.

IterDriver

A Driver that allows iteration over the items in the data source.

MapDriver

A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.

class squirrel.driver.driver.Driver(catalog: Catalog | None = None, **kwargs)

Bases: abc.ABC

Drives the access to a data source.

Initializes driver with a catalog and arbitrary kwargs.

name: str

class squirrel.driver.driver.IterDriver(catalog: Catalog | None = None, **kwargs)

Bases: Driver

A Driver that allows iteration over the items in the data source.

Items can be iterated over using the get_iter() method.

Initializes driver with a catalog and arbitrary kwargs.

abstract get_iter(**kwargs) → squirrel.iterstream.Composable

Returns an iterable of items in the form of a Composable, which allows various stream manipulation functionalities.

The order of the items in the iterable may or may not be randomized, depending on the implementation and kwargs.
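As a rough illustration of this contract, the sketch below defines a stdlib-only stand-in for an iterable driver. `SketchIterDriver`, `ListDriver`, and the `shuffle` and `seed` keyword arguments are hypothetical names chosen for this example and are not part of squirrel. A real implementation would subclass `IterDriver` and return a `Composable`, whereas the stand-in returns a plain iterator.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator, Optional


class SketchIterDriver(ABC):
    """Hypothetical stand-in for an IterDriver-like base class (not squirrel's class)."""

    name: str

    @abstractmethod
    def get_iter(self, **kwargs) -> Iterator[Any]:
        """Return an iterator over the items in the data source."""


class ListDriver(SketchIterDriver):
    """Hypothetical driver whose data source is an in-memory list."""

    name = "list_sketch"

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = list(items)

    def get_iter(self, shuffle: bool = False, seed: Optional[int] = None, **kwargs) -> Iterator[Any]:
        # Copy so that shuffling never mutates the underlying source.
        items = list(self._items)
        if shuffle:
            # Whether the order is randomized is decided by the kwargs.
            random.Random(seed).shuffle(items)
        return iter(items)


ordered = list(ListDriver(["a", "b", "c"]).get_iter())
shuffled = list(ListDriver(range(5)).get_iter(shuffle=True, seed=0))
```

Here the ordering depends entirely on the keyword arguments, which mirrors the note above that randomization is up to the implementation.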

class squirrel.driver.driver.MapDriver(catalog: Catalog | None = None, **kwargs)

Bases: IterDriver

A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.

Initializes driver with a catalog and arbitrary kwargs.

abstract get(key: Any, **kwargs) → Any

Returns an iterable over the items corresponding to key.

Note that you can implement this method according to your needs. There is no restriction on the type or number of items. For example, a key might correspond to a single item that holds a single sample, or to a single item that contains one shard of multiple samples.

If the method returns a single sample, then the get_iter() method should be called with flatten=False, since the stream does not need to be flattened. Otherwise, e.g. if the method returns an iterable of samples, the get_iter() method should be called with flatten=True if it is desirable to have individual samples in the iterstream.
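The effect of flatten can be sketched with the standard library alone. In the example below, `shards`, `get`, and `iterate` are hypothetical stand-ins for a store whose keys each map to a shard of several samples. They are not squirrel functions, and the real get_iter() returns a Composable rather than a plain iterator.

```python
from itertools import chain
from typing import Any, Iterable, Iterator, List

# Hypothetical store: each key maps to one shard holding several samples.
shards = {"shard_0": [1, 2], "shard_1": [3, 4, 5]}


def get(key: str) -> List[int]:
    """Stand-in for a get() that returns an iterable of samples per key."""
    return shards[key]


def iterate(keys: Iterable[str], flatten: bool = False) -> Iterator[Any]:
    """Fetch one item per key and optionally flatten shards into samples."""
    items = (get(key) for key in keys)
    return chain.from_iterable(items) if flatten else items


per_shard = list(iterate(shards))                 # one list per key
per_sample = list(iterate(shards, flatten=True))  # individual samples
```

With flatten=False each element of the stream is a whole shard, while flatten=True yields the individual samples.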

get_iter(keys_iterable: Iterable | None = None, shuffle_key_buffer: int = 1, key_hooks: Iterable[Callable | type[Composable] | functools.partial] | None = None, max_workers: int | None = None, prefetch_buffer: int = 10, shuffle_item_buffer: int = 1, flatten: bool = False, keys_kwargs: dict | None = None, get_kwargs: dict | None = None, key_shuffle_kwargs: dict | None = None, item_shuffle_kwargs: dict | None = None) → squirrel.iterstream.Composable

Returns an iterable of items in the form of a squirrel.iterstream.Composable, which allows various stream manipulation functionalities.

Items are fetched using the get() method. The returned Composable iterates over the items in the order of the keys returned by the keys() method.
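The fetch step can be pictured roughly as follows. This is a minimal sketch and not squirrel's implementation: `fetch_items` is a hypothetical helper, and `ThreadPoolExecutor.map` stands in for async_map() under the assumption that results are yielded in key order. The branch on max_workers follows the rule given in the parameter description for max_workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator, Optional


def fetch_items(
    keys: Iterable[Any],
    get: Callable[[Any], Any],
    max_workers: Optional[int] = None,
) -> Iterator[Any]:
    """Yield get(key) for each key, keeping the order of the keys."""
    if max_workers in (0, 1):
        # Sequential fetching, one key at a time.
        yield from map(get, keys)
    else:
        # Concurrent fetching; None lets the executor choose the pool size.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Executor.map returns results in the order of the inputs.
            yield from pool.map(get, keys)


store = {"a": 1, "b": 2, "c": 3}  # hypothetical in-memory store
sequential = list(fetch_items(store.keys(), store.__getitem__, max_workers=1))
threaded = list(fetch_items(store.keys(), store.__getitem__, max_workers=4))
```

Both calls produce the items in key order. Only the way the individual get() calls are scheduled differs.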

Parameters
  • keys_iterable (Iterable, optional) – If provided, only the keys in keys_iterable will be used to fetch items. If not provided, all keys in the store are used.

  • shuffle_key_buffer (int) – Size of the buffer used to shuffle keys.

  • key_hooks (Iterable[Union[Callable, Type[Composable], functools.partial]], optional) –

    Hooks to apply to keys before fetching the items. It is an Iterable of any of these objects:

    1. A subclass of Composable: in this case, .compose(hook, **kw) will be applied to the stream

    2. A Callable: .to(hook, **kw) will be applied to the stream

    3. A partial function: its three attributes args, keywords and func will be retrieved, and depending on whether func is a subclass of Composable or a Callable, one of the above cases will apply, with args and keywords passed along. This is useful for passing arguments to a hook.

  • max_workers (int, optional) – If max_workers is equal to 0 or 1, map() is called to fetch the items one at a time. If max_workers is greater than 1 or is None, async_map() is called to fetch multiple items simultaneously. In this case, max_workers is the maximum number of workers in the ThreadPoolExecutor pool used by async_map(). None has a special meaning in this context: the number of workers is chosen by an internal heuristic that depends on the Python version. See ThreadPoolExecutor for details. Defaults to None.

  • prefetch_buffer (int) – Size of the buffer used for prefetching items if async_map is used. See max_workers for more details. Please be aware of the memory footprint when setting this parameter.

  • shuffle_item_buffer (int) – Size of the buffer used to shuffle items after being fetched. Please be aware of the memory footprint when setting this parameter.

  • flatten (bool) – Whether to flatten the returned iterable. Defaults to False.

  • keys_kwargs (Dict, optional) – Keyword arguments passed to keys() when getting the keys in the store. Not used if keys_iterable is provided. Defaults to None.

  • get_kwargs (Dict, optional) – Keyword arguments passed to get() when fetching items. Defaults to None.

  • key_shuffle_kwargs (Dict, optional) – Keyword arguments passed to shuffle() when shuffling keys. Useful, for example, for setting the seed. Defaults to None.

  • item_shuffle_kwargs (Dict, optional) – Keyword arguments passed to shuffle() when shuffling items. Useful, for example, for setting the seed. Defaults to None.

Returns

(squirrel.iterstream.Composable) Iterable over the items in the store.
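The three kinds of key_hooks entries can be told apart roughly as in the sketch below. It is a stdlib-only illustration and not squirrel's code: `StandInComposable`, `describe_hook`, and `add_offset` are hypothetical names, and the returned strings "compose" and "to" only label which stream method the documentation says would be applied.

```python
import functools
from typing import Any, Callable, Dict, Tuple


class StandInComposable:
    """Hypothetical placeholder for a Composable subclass (not squirrel's class)."""


def describe_hook(hook: Any) -> Tuple[str, Callable, tuple, Dict[str, Any]]:
    """Return (method label, underlying hook, positional args, keyword args)."""
    args: tuple = ()
    kwargs: Dict[str, Any] = {}
    if isinstance(hook, functools.partial):
        # A partial carries the wrapped callable and its bound arguments.
        hook, args, kwargs = hook.func, hook.args, hook.keywords
    if isinstance(hook, type) and issubclass(hook, StandInComposable):
        return "compose", hook, args, kwargs
    return "to", hook, args, kwargs


def add_offset(key: int, offset: int = 0) -> int:
    """Hypothetical key hook that shifts integer keys."""
    return key + offset


plain = describe_hook(add_offset)
composed = describe_hook(StandInComposable)
bound = describe_hook(functools.partial(add_offset, offset=2))
```

Wrapping a hook in functools.partial is what lets arguments such as `offset=2` travel with the hook through the key_hooks iterable.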

abstract keys(**kwargs) → Iterable

Returns an iterable of the keys for the objects that are obtainable through the driver.