squirrel.driver.source_combiner

Module Contents

Classes

SourceCombiner

A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.

class squirrel.driver.source_combiner.SourceCombiner(subsets: dict[str, squirrel.catalog.CatalogKey], catalog: squirrel.catalog.Catalog, **kwargs)

Bases: squirrel.driver.driver.MapDriver

A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.

Initializes SourceCombiner.

Parameters
  • subsets (Dict[str, CatalogKey]) – Keys define the names of the subsets, values are tuples of the corresponding (catalog entry, version) combinations.

  • catalog (Catalog) – The parent catalog which the subset sources are part of.

  • **kwargs – Keyword arguments to be passed to the super class.
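As a rough illustration of the shape the subsets argument takes, the sketch below builds such a mapping. It does not import squirrel: the CatalogKey class here is a hypothetical stand-in (its field names are an assumption), and the entry names are made up.

```python
from typing import NamedTuple


class CatalogKey(NamedTuple):
    """Hypothetical stand-in for squirrel.catalog.CatalogKey (field names assumed)."""

    identifier: str
    version: int


# Subset name -> (catalog entry, version). The entry names are invented.
subsets = {
    "train": CatalogKey("my_dataset_train", 1),
    "test": CatalogKey("my_dataset_test", 1),
}

# The dict keys are what the rest of this page calls subset ids.
subset_ids = list(subsets)
```

A mapping of this shape, together with the catalog that contains the referenced entries, is what the constructor signature above expects.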

name = 'source_combiner'
get(subset: str, key: Any, **kwargs) → Iterable

Routes to the get() method of the appropriate subset driver.

Parameters
  • subset (str) – Id of the subset in this source definition.

  • key (Any) – Key of the item to get.

  • **kwargs – Keyword arguments passed to the subset driver.

Returns

(Iterable) Iterable over the items corresponding to key for subset driver subset.
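The routing described here can be pictured with a small stand-alone sketch. ToyDriver and ToyCombiner are hypothetical names used only for illustration; this is not the squirrel implementation, just the dispatch pattern under the assumption that each subset id maps to one driver object.

```python
from __future__ import annotations

from typing import Any, Iterable


class ToyDriver:
    """Hypothetical subset driver backed by an in-memory dict."""

    def __init__(self, data: dict[str, list[Any]]) -> None:
        self._data = data

    def get(self, key: Any, **kwargs) -> Iterable:
        # Return an iterable over the items stored under `key`.
        return iter(self._data[key])


class ToyCombiner:
    """Illustrates the routing pattern only."""

    def __init__(self, drivers: dict[str, ToyDriver]) -> None:
        self._drivers = drivers

    def get(self, subset: str, key: Any, **kwargs) -> Iterable:
        # Look up the driver for `subset` and forward the call unchanged.
        return self._drivers[subset].get(key, **kwargs)


combiner = ToyCombiner(
    {
        "train": ToyDriver({"shard_0": [1, 2]}),
        "test": ToyDriver({"shard_0": [3]}),
    }
)
train_items = list(combiner.get("train", "shard_0"))
```

The same key can exist in several subsets, which is why the subset id is a required argument alongside the key.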

get_df(subset: str, **kwargs) → dask.dataframe.DataFrame

Routes to the get_df() method of the appropriate subset driver.

Parameters
  • subset (str) – Id of the subset in this source definition.

  • **kwargs – Keyword arguments passed to the subset driver.

Returns

(DataFrame) Data of the subset driver subset as a Dask or Pandas DataFrame.

get_iter(subset: str | None = None, **kwargs) → squirrel.iterstream.Composable

Routes to the get_iter() method of the appropriate subset driver.

Parameters
  • subset (str) – Id of the subset in this source definition. If None, interleaves iterables obtained from all subset drivers.

  • **kwargs – Keyword arguments passed to the subset driver.

Returns

(Composable) Iterable over the items of subset driver(s) in the form of a Composable.
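To make the two modes concrete, here is a minimal sketch of "one subset" versus "interleave all subsets". The function toy_get_iter is hypothetical, and round-robin order is an assumption: the description above does not specify how the iterables are interleaved.

```python
from __future__ import annotations

from itertools import zip_longest
from typing import Iterable, Iterator, Optional

_MISSING = object()  # sentinel marking an exhausted iterable


def toy_get_iter(sources: dict[str, Iterable], subset: Optional[str] = None) -> Iterator:
    """Iterate one subset, or interleave all subsets when `subset` is None."""
    if subset is not None:
        return iter(sources[subset])

    def _round_robin() -> Iterator:
        # Take one item from each source in turn, skipping exhausted ones.
        for group in zip_longest(*sources.values(), fillvalue=_MISSING):
            for item in group:
                if item is not _MISSING:
                    yield item

    return _round_robin()


sources = {"a": [1, 2, 3], "b": [10, 20]}
only_b = list(toy_get_iter(sources, subset="b"))
interleaved = list(toy_get_iter(sources))
```

With these inputs, only_b holds the items of subset "b" alone, while interleaved alternates between the two subsets until both are exhausted.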

get_iter_sampler(probs: list[float] | None = None, rng: random.Random | None = None, seed: int | None = None, **kwargs) → squirrel.iterstream.Composable

Returns an iterstream that samples from the subsets of this source.

Parameters
  • rng (random.Random) – A random number generator.

  • probs (List[float]) – List of probabilities with which to sample from the subsets. If None, subsets are sampled uniformly.

  • seed (int) – Seed for the random number generator.

  • **kwargs – Keyword arguments passed to the get_iter() method of each subset driver.

Returns

(Composable) Iterable over samples randomly sampled from subsets.
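The sampling idea can be sketched without squirrel as follows. The function toy_sampler is hypothetical, and two behaviours are assumptions rather than documented facts: an exhausted subset is dropped so sampling continues over the remaining ones, and seed is only used when no rng is supplied.

```python
from __future__ import annotations

import random
from typing import Iterable, Iterator, Optional


def toy_sampler(
    iterables: list[Iterable],
    probs: Optional[list[float]] = None,
    rng: Optional[random.Random] = None,
    seed: Optional[int] = None,
) -> Iterator:
    """Yield items by repeatedly picking a source at random, weighted by probs."""
    if rng is None:
        rng = random.Random(seed)
    iterators = [iter(it) for it in iterables]
    # Uniform weights when no probabilities are given.
    weights = list(probs) if probs is not None else [1.0] * len(iterators)
    while iterators:
        idx = rng.choices(range(len(iterators)), weights=weights)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # Assumption: drop the exhausted source and keep sampling the rest.
            del iterators[idx]
            del weights[idx]


samples = list(toy_sampler([[1, 2, 3], ["a", "b"]], probs=[0.8, 0.2], seed=0))
repeat = list(toy_sampler([[1, 2, 3], ["a", "b"]], probs=[0.8, 0.2], seed=0))
```

Passing the same seed reproduces the same order, which is the usual reason to expose a seed or rng argument on a sampler.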

get_source(subset: str) → squirrel.catalog.source.Source

Returns the source of a subset, looked up by its subset id.

Parameters

subset (str) – Id of the subset in this source definition.

Returns

(Source) Subset source.

get_store(subset: str) → squirrel.store.AbstractStore

Returns the store of the appropriate subset driver.

Parameters

subset (str) – Id of the subset in this source definition.

Returns

(AbstractStore) Store of the subset driver subset.

keys(subset: str, **kwargs) → Iterable

Routes to the keys() method of the appropriate subset driver.

Parameters
  • subset (str) – Id of the subset in this source definition.

  • **kwargs – Keyword arguments passed to the subset driver.

Returns

(Iterable) Iterable over the keys for subset driver subset.

property subsets: list[str]

Ids of all subsets defined by this source.