squirrel.driver.driver
This module defines the Driver API of squirrel.
Module Contents
Classes
DataFrameDriver | Drives the access to a data source.
Driver | Drives the access to a data source.
IterDriver | A Driver that allows iteration over the items in the data source.
MapDriver | A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.
-
class squirrel.driver.driver.DataFrameDriver(catalog: Optional[squirrel.catalog.Catalog] = None, **kwargs)
Bases: Driver
Drives the access to a data source.
Initializes driver with a catalog and arbitrary kwargs.
-
abstract get_df(self, **kwargs) → pandas.DataFrame
Returns a dataframe of the data.
-
class squirrel.driver.driver.Driver(catalog: Optional[squirrel.catalog.Catalog] = None, **kwargs)
Bases: abc.ABC
Drives the access to a data source.
Initializes driver with a catalog and arbitrary kwargs.
-
name: str
-
class squirrel.driver.driver.IterDriver(catalog: Optional[squirrel.catalog.Catalog] = None, **kwargs)
Bases: Driver
A Driver that allows iteration over the items in the data source.
Items can be iterated over using the get_iter() method.
Initializes driver with a catalog and arbitrary kwargs.
-
abstract get_iter(self, **kwargs) → squirrel.iterstream.Composable
Returns an iterable of items in the form of a Composable, which allows various stream manipulation functionalities.
The order of the items in the iterable may or may not be randomized, depending on the implementation and kwargs.
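The contract above can be illustrated with a small stand-in. This is a sketch only: `SketchIterDriver` and `ListIterDriver` are hypothetical classes that do not import squirrel, a plain Python iterator takes the place of `Composable`, and the `shuffle` and `seed` keyword arguments are assumptions chosen to show how ordering can depend on kwargs.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Iterable, Iterator, Optional


class SketchIterDriver(ABC):
    """Hypothetical stand-in for the IterDriver contract (not squirrel's class)."""

    def __init__(self, catalog: Optional[Any] = None, **kwargs) -> None:
        self._catalog = catalog

    @abstractmethod
    def get_iter(self, **kwargs) -> Iterable:
        """Return an iterable over the items in the data source."""


class ListIterDriver(SketchIterDriver):
    """Iterates over an in-memory list; `shuffle` and `seed` are illustrative kwargs."""

    def __init__(self, items: Iterable, **kwargs) -> None:
        super().__init__(**kwargs)
        self._items = list(items)

    def get_iter(self, shuffle: bool = False, seed: Optional[int] = None, **kwargs) -> Iterator:
        items = list(self._items)  # copy so the source order is never mutated
        if shuffle:
            random.Random(seed).shuffle(items)
        return iter(items)


driver = ListIterDriver(["a", "b", "c"])
ordered = list(driver.get_iter())
shuffled = list(driver.get_iter(shuffle=True, seed=0))
```

Because `get_iter` is abstract, the base class cannot be instantiated directly; only subclasses that implement it can.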
-
class squirrel.driver.driver.MapDriver(catalog: Optional[squirrel.catalog.Catalog] = None, **kwargs)
Bases: IterDriver
A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.
Initializes driver with a catalog and arbitrary kwargs.
-
abstract get(self, key: Any, **kwargs) → Any
Returns an iterable over the items corresponding to key.
Note that it is possible to implement this method according to your needs. There is no restriction on the type or number of items. For example, a key might correspond to a single item that holds a single sample, or to a single item that contains one shard of multiple samples.
If the method returns a single sample, then the get_iter() method should be called with flatten=False, since the stream does not need to be flattened. Otherwise, e.g. if the method returns an iterable of samples, the get_iter() method should be called with flatten=True if it is desirable to have individual samples in the iterstream.
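The difference between the two cases can be sketched without squirrel. The helper `iterate` below is hypothetical and only mimics the effect of the `flatten` flag with `itertools.chain`; the two dictionaries stand in for a store whose `get()` returns one sample per key and a store whose `get()` returns one shard per key.

```python
from itertools import chain
from typing import Any, Callable, Iterable, Iterator


def iterate(keys: Iterable, get: Callable[[Any], Any], flatten: bool = False) -> Iterator:
    """Fetch one item per key; optionally flatten items that are iterables of samples."""
    fetched = map(get, keys)
    return chain.from_iterable(fetched) if flatten else fetched


# One sample per key: no flattening needed.
samples = {"k0": {"x": 0}, "k1": {"x": 1}}
# One shard (a list of samples) per key.
shards = {"s0": [{"x": 0}, {"x": 1}], "s1": [{"x": 2}]}

per_sample = list(iterate(samples, samples.__getitem__, flatten=False))
per_shard = list(iterate(shards, shards.__getitem__, flatten=False))
flattened = list(iterate(shards, shards.__getitem__, flatten=True))
```

Here `per_shard` contains two lists, while `flattened` contains the three individual samples.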
-
get_iter(self, keys_iterable: Optional[Iterable] = None, shuffle_key_buffer: int = 1, key_hooks: Optional[Iterable[Union[Callable, Type[squirrel.iterstream.Composable], functools.partial]]] = None, max_workers: Optional[int] = None, prefetch_buffer: int = 10, shuffle_item_buffer: int = 1, flatten: bool = False, keys_kwargs: Optional[Dict] = None, get_kwargs: Optional[Dict] = None, key_shuffle_kwargs: Optional[Dict] = None, item_shuffle_kwargs: Optional[Dict] = None) → squirrel.iterstream.Composable
Returns an iterable of items in the form of a squirrel.iterstream.Composable, which allows various stream manipulation functionalities.
Items are fetched using the get() method. The returned Composable iterates over the items in the order of the keys returned by the keys() method.
- Parameters
keys_iterable (Iterable, optional) – If provided, only the keys in keys_iterable will be used to fetch items. If not provided, all keys in the store are used.
shuffle_key_buffer (int) – Size of the buffer used to shuffle keys.
key_hooks (Iterable[Union[Callable, Type[Composable], functools.partial]], optional) – Hooks to apply to keys before fetching the items. It is an Iterable of any of these objects:
1) A subclass of Composable: in this case, .compose(hook, **kw) will be applied to the stream.
2) A Callable: .to(hook, **kw) will be applied to the stream.
3) A partial function: the three attributes args, keywords and func will be retrieved, and depending on whether func is a subclass of Composable or a Callable, one of the above cases will happen, with the only difference that the arguments are passed too. This is useful for passing arguments.
max_workers (int, optional) – If larger than 1 or None, async_map() is called to fetch multiple items simultaneously, and max_workers refers to the maximum number of workers in the ThreadPoolExecutor used by async_map. Otherwise, map() is called and max_workers is not used. Defaults to None.
prefetch_buffer (int) – Size of the buffer used for prefetching items if async_map is used. See max_workers for more details. Please be aware of the memory footprint when setting this parameter.
shuffle_item_buffer (int) – Size of the buffer used to shuffle items after being fetched. Please be aware of the memory footprint when setting this parameter.
flatten (bool) – Whether to flatten the returned iterable. Defaults to False.
keys_kwargs (Dict, optional) – Keyword arguments passed to keys() when getting the keys in the store. Not used if keys_iterable is provided. Defaults to None.
get_kwargs (Dict, optional) – Keyword arguments passed to get() when fetching items. Defaults to None.
key_shuffle_kwargs (Dict, optional) – Keyword arguments passed to shuffle() when shuffling keys. Defaults to None. Can be useful, e.g., to set the seed.
item_shuffle_kwargs (Dict, optional) – Keyword arguments passed to shuffle() when shuffling items. Defaults to None. Can be useful, e.g., to set the seed.
- Returns
(squirrel.iterstream.Composable) Iterable over the items in the store.
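The stages described by these parameters can be approximated in plain Python. The sketch below is not squirrel's implementation: `shuffle_buffered`, `apply_hooks`, `threaded_map` and `sketch_get_iter` are hypothetical helpers, hooks are assumed to be plain callables that take the key stream and return a new one (there is no `Composable` here), and the ordering of the flatten and item-shuffle stages is an assumption.

```python
import random
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import chain, islice
from typing import Any, Callable, Iterable, Iterator, Optional


def shuffle_buffered(stream: Iterable, size: int, seed: Optional[int] = None) -> Iterator:
    """Buffer-based shuffle; in this sketch a buffer of size 1 keeps the order."""
    rng = random.Random(seed)
    buffer = []
    for element in stream:
        buffer.append(element)
        if len(buffer) >= size:
            yield buffer.pop(rng.randrange(len(buffer)))
    rng.shuffle(buffer)
    yield from buffer


def apply_hooks(stream: Iterable, hooks: Optional[Iterable[Callable]]) -> Iterable:
    """Apply each hook to the stream; partials are unpacked so the stream goes first."""
    for hook in hooks or []:
        if isinstance(hook, partial):
            stream = hook.func(stream, *hook.args, **hook.keywords)
        else:
            stream = hook(stream)
    return stream


def threaded_map(fn: Callable, stream: Iterable, max_workers: Optional[int], buffer: int) -> Iterator:
    """Order-preserving map on a thread pool with at most `buffer` pending results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = deque()
        for element in stream:
            pending.append(pool.submit(fn, element))
            if len(pending) >= buffer:
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()


def sketch_get_iter(keys: Iterable, get: Callable[[Any], Any], shuffle_key_buffer: int = 1,
                    key_hooks: Optional[Iterable[Callable]] = None,
                    max_workers: Optional[int] = None, prefetch_buffer: int = 10,
                    shuffle_item_buffer: int = 1, flatten: bool = False,
                    seed: Optional[int] = None) -> Iterator:
    key_stream = shuffle_buffered(keys, shuffle_key_buffer, seed)
    key_stream = apply_hooks(key_stream, key_hooks)
    if max_workers is None or max_workers > 1:
        items = threaded_map(get, key_stream, max_workers, prefetch_buffer)
    else:
        items = map(get, key_stream)  # sequential fetch, max_workers unused
    if flatten:
        items = chain.from_iterable(items)
    return shuffle_buffered(items, shuffle_item_buffer, seed)


def take(stream: Iterable, n: int) -> Iterator:
    """Illustrative key hook: keep only the first n keys."""
    return islice(stream, n)


store = {f"shard_{i}": [i * 10, i * 10 + 1] for i in range(3)}
threaded = list(sketch_get_iter(store.keys(), store.__getitem__,
                                key_hooks=[partial(take, n=2)], max_workers=2, flatten=True))
sequential = list(sketch_get_iter(store.keys(), store.__getitem__, max_workers=1))
```

With both shuffle buffers left at 1, the items come out in key order, which matches the default behaviour described above. Larger buffers trade memory for a more thorough shuffle.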
-
abstract keys(self, **kwargs) → Iterable
Returns an iterable of the keys for the objects that are obtainable through the driver.
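How keys(), get() and the keys_iterable / keys_kwargs parameters relate can be sketched with a dictionary-backed stand-in. `DictMapDriver` is a hypothetical class that does not derive from squirrel's MapDriver, and the `prefix` keyword is an assumption used only to show keyword arguments reaching keys().

```python
from typing import Any, Dict, Iterable, Iterator, Optional


class DictMapDriver:
    """Hypothetical dictionary-backed stand-in for the MapDriver interface."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self._store = dict(store)

    def keys(self, prefix: str = "", **kwargs) -> Iterator[str]:
        """Yield the keys obtainable through the driver, optionally filtered by prefix."""
        return (key for key in self._store if key.startswith(prefix))

    def get(self, key: str, **kwargs) -> Any:
        """Return the item stored under key."""
        return self._store[key]

    def get_iter(self, keys_iterable: Optional[Iterable[str]] = None,
                 keys_kwargs: Optional[Dict] = None,
                 get_kwargs: Optional[Dict] = None) -> Iterator:
        # keys_kwargs is only consulted when no explicit keys are given.
        if keys_iterable is not None:
            keys = keys_iterable
        else:
            keys = self.keys(**(keys_kwargs or {}))
        return (self.get(key, **(get_kwargs or {})) for key in keys)


driver = DictMapDriver({"train/0": "a", "train/1": "b", "val/0": "c"})
all_keys = list(driver.keys())
train_items = list(driver.get_iter(keys_kwargs={"prefix": "train"}))
explicit = list(driver.get_iter(keys_iterable=["val/0"], keys_kwargs={"prefix": "train"}))
```

In the last call the explicit key list wins and the `prefix` filter is ignored, mirroring the documented rule that keys_kwargs is not used if keys_iterable is provided.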