squirrel.driver.source_combiner
Module Contents
Classes
SourceCombiner – A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.
-
class squirrel.driver.source_combiner.SourceCombiner(subsets: dict[str, squirrel.catalog.CatalogKey], catalog: squirrel.catalog.Catalog, **kwargs)
Bases: squirrel.driver.driver.MapDriver
A Driver that allows retrieval of items using keys, in addition to allowing iteration over the items.
Initializes SourceCombiner.
- Parameters
subsets (Dict[str, CatalogKey]) – Keys are the names of the subsets; each value is the (catalog entry, version) tuple that identifies the source of that subset.
catalog (Catalog) – The parent catalog that the subset sources belong to.
**kwargs – Keyword arguments passed to the superclass.
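To make the expected shape of subsets concrete, here is a minimal sketch in plain Python. The CatalogKey below is a local stand-in NamedTuple, not an import from squirrel; its field names (identifier, version), the default version, and the entry names are assumptions made for illustration only.

```python
from typing import Dict, NamedTuple


class CatalogKey(NamedTuple):
    """Local stand-in for squirrel.catalog.CatalogKey (field names are assumed)."""

    identifier: str  # name of the catalog entry
    version: int = -1  # version of that entry (hypothetical default)


# Keys name the subsets; values point at (catalog entry, version) pairs.
# The entry names and version numbers are made up for this example.
subsets: Dict[str, CatalogKey] = {
    "train": CatalogKey("my_dataset_train", 1),
    "test": CatalogKey("my_dataset_test", 2),
}

for subset_name, (entry, version) in subsets.items():
    print(f"{subset_name}: entry={entry!r}, version={version}")
```

A mapping of this shape, together with the catalog that contains the referenced entries, is what the constructor signature above asks for.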
-
name = source_combiner
-
get(subset: str, key: Any, **kwargs) → Iterable
Routes to the get() method of the appropriate subset driver.
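The routing described here can be pictured with a small stand-alone sketch. ToyDriver and ToyCombiner are hypothetical classes written for this example; they do not use squirrel and only mimic the documented delegation of get() to the driver of the named subset.

```python
from typing import Any, Dict, Iterable, List


class ToyDriver:
    """Hypothetical subset driver backed by an in-memory dict."""

    def __init__(self, data: Dict[Any, List[Any]]) -> None:
        self._data = data

    def get(self, key: Any, **kwargs: Any) -> Iterable:
        # Return an iterable over the items stored under `key`.
        return iter(self._data[key])


class ToyCombiner:
    """Hypothetical combiner that forwards calls to per-subset drivers."""

    def __init__(self, drivers: Dict[str, ToyDriver]) -> None:
        self._drivers = drivers

    def get(self, subset: str, key: Any, **kwargs: Any) -> Iterable:
        # Look up the driver registered for this subset and delegate to it.
        return self._drivers[subset].get(key, **kwargs)


combiner = ToyCombiner(
    {
        "train": ToyDriver({"shard_0": [1, 2, 3]}),
        "test": ToyDriver({"shard_0": [10, 20]}),
    }
)
train_items = list(combiner.get("train", "shard_0"))
test_items = list(combiner.get("test", "shard_0"))
```

The same key can exist in several subsets; the subset argument decides which driver answers the request.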
-
get_df(subset: str, **kwargs) → dask.dataframe.DataFrame
Routes to the get_df() method of the appropriate subset driver.
- Parameters
subset (str) – Id of the subset in this source definition.
**kwargs – Keyword arguments passed to the subset driver.
- Returns
(DataFrame) Data of the selected subset as a Dask or Pandas DataFrame.
-
get_iter(subset: str | None = None, **kwargs) → squirrel.iterstream.Composable
Routes to the get_iter() method of the appropriate subset driver.
- Parameters
subset (str) – Id of the subset in this source definition. If None, interleaves the iterables obtained from all subset drivers.
**kwargs – Keyword arguments passed to the subset driver.
- Returns
(Composable) Iterable over the items of the subset driver(s), in the form of a Composable.
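The two modes of get_iter() (one subset, or all subsets interleaved) can be sketched without squirrel as follows. The function name get_iter_sketch is hypothetical, and the round-robin order used here is an assumption: the documentation above only says the iterables are interleaved, not in which order.

```python
from itertools import chain, zip_longest
from typing import Dict, Iterable, Iterator, Optional

_MISSING = object()  # sentinel marking slots of already exhausted iterables


def get_iter_sketch(
    iterables: Dict[str, Iterable], subset: Optional[str] = None
) -> Iterator:
    """Iterate over one subset, or interleave all subsets round-robin."""
    if subset is not None:
        # A subset id was given: route to that subset only.
        return iter(iterables[subset])
    # No subset given: take one item from each subset in turn (assumed order).
    rounds = zip_longest(*iterables.values(), fillvalue=_MISSING)
    return (item for item in chain.from_iterable(rounds) if item is not _MISSING)


data = {"a": [1, 2, 3], "b": ["x", "y"]}
only_b = list(get_iter_sketch(data, subset="b"))
interleaved = list(get_iter_sketch(data))
```

Under this round-robin assumption every item of every subset appears exactly once, and shorter subsets simply drop out once they are exhausted.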
-
get_iter_sampler(probs: list[float] | None = None, rng: random.Random | None = None, seed: int | None = None, **kwargs) → squirrel.iterstream.Composable
Returns an iterstream that samples from the subsets of this source.
- Parameters
probs (List[float]) – List of probabilities to sample from the subsets. If None, subsets are sampled uniformly.
rng (random.Random) – A random number generator.
**kwargs – Keyword arguments passed to the get_iter() method of each subset driver.
- Returns
(Composable) Iterable over items randomly sampled from the subsets.
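A probability-weighted sampler over several subsets might look roughly like the sketch below, written against plain Python iterables rather than squirrel drivers. The function name sample_from_subsets is hypothetical, and several behaviours are assumptions not stated above: seed is only used to create a generator when rng is None, and an exhausted subset is dropped while sampling continues over the remaining ones.

```python
import random
from typing import Dict, Iterable, Iterator, List, Optional


def sample_from_subsets(
    iterables: Dict[str, Iterable],
    probs: Optional[List[float]] = None,
    rng: Optional[random.Random] = None,
    seed: Optional[int] = None,
) -> Iterator:
    """Yield items by repeatedly picking a subset at random."""
    # Assumption: `seed` only matters when no generator is passed in.
    rng = rng if rng is not None else random.Random(seed)
    iterators = [iter(it) for it in iterables.values()]
    # Uniform weights when no probabilities are given.
    weights = list(probs) if probs is not None else [1.0] * len(iterators)
    if len(weights) != len(iterators):
        raise ValueError("probs must contain one entry per subset")

    # Stop once nothing is left that could still be sampled.
    while iterators and sum(weights) > 0:
        idx = rng.choices(range(len(iterators)), weights=weights, k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # Assumption: drop the exhausted subset and keep sampling the rest.
            del iterators[idx]
            del weights[idx]


data = {"a": [1, 2, 3], "b": [10, 20]}
first_run = list(sample_from_subsets(data, seed=0))
second_run = list(sample_from_subsets(data, seed=0))
only_a = list(sample_from_subsets(data, probs=[1.0, 0.0], seed=0))
```

Fixing the seed makes the sampling order reproducible in this sketch, and a probability of zero excludes a subset entirely.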
-
get_source(subset: str) → squirrel.catalog.source.Source
Returns the source of a subset, looked up by subset id.
-
get_store(subset: str) → squirrel.store.AbstractStore
Returns the store of the appropriate subset driver.
- Parameters
subset (str) – Id of the subset in this source definition.
- Returns
(AbstractStore) Store of the selected subset driver.