squirrel.iterstream.source

Module Contents

Classes

FilePathGenerator

A specialized version of Composable that accepts a url without instantiating a filesystem instance in the init.

IterableSamplerSource

A class that samples from iterables into an iterstream.

IterableSource

A class that turns an iterable into the source of a stream and provides stream manipulation functionality on top.

class squirrel.iterstream.source.FilePathGenerator(url: str, nested: bool = False, max_workers: Optional[int] = None, max_keys: int = 1000000, max_dirs: int = 10, **storage_options)

Bases: squirrel.iterstream.base.Composable

A specialized version of Composable that accepts a url without instantiating a filesystem instance in the init. It generates the paths under the given url by instantiating an fsspec filesystem and yielding the result of fs.ls(url).

Parameters
  • url – the url on which ls is performed.

  • nested – if True, it attempts to run ls on each directory that it encounters. Otherwise, it only yields the top-level paths and does not expand a path that is a directory.

  • max_workers (int) – passed to the ThreadPoolExecutor. Only applicable if nested is True.

  • max_keys (int) – maximum number of keys to keep in memory at the same time. Once this number is reached, no further discovered directories are expanded until enough keys have been yielded to make room for new ones.

  • max_dirs (int) – maximum number of parallel ls operations.

  • **storage_options (dict) – kwargs to pass onto the fsspec filesystem initialization.

__iter__() → Iterator[str]

Iterator that performs ls and yields the file paths under the given url.
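The difference between nested=False and nested=True can be illustrated with a minimal standard-library sketch. It is not the squirrel implementation: it lists a local directory with os.listdir instead of an fsspec filesystem, runs sequentially (so max_workers, max_keys and max_dirs have no counterpart), and assumes that an expanded directory is not itself yielded. The names iter_paths and build_demo_tree are hypothetical.

```python
import os
import tempfile
from typing import Iterator


def iter_paths(url: str, nested: bool = False) -> Iterator[str]:
    """Yield paths under `url`; expand directories only when `nested` is True."""
    pending = [url]  # directories still waiting to be listed
    while pending:
        current = pending.pop()
        for name in sorted(os.listdir(current)):
            path = os.path.join(current, name)
            if nested and os.path.isdir(path):
                # Assumption: an expanded directory is listed later, not yielded.
                pending.append(path)
            else:
                yield path


def build_demo_tree() -> str:
    """Create a small temporary tree containing a.txt and sub/b.txt."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("x")
    return root
```

With nested=False the sketch yields only the top-level entries (including the directory sub); with nested=True it descends into sub and yields the files inside it.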

class squirrel.iterstream.source.IterableSamplerSource(iterables: List[Iterable], probs: Optional[List[float]] = None, rng: Optional[random.Random] = None, seed: Optional[int] = None)

Bases: squirrel.iterstream.base.Composable

A class that samples from iterables into an iterstream.

Initialize IterableSamplerSource.

Parameters
  • iterables (List[Iterable]) – List of iterables to sample from.

  • probs (Optional[List[float]], optional) – Probabilities with which to sample from the corresponding iterables. Defaults to None.

  • rng (random.Random, optional) – Random number generator to use.

  • seed (Optional[int]) – An int or any other type accepted by random.seed(). Used to seed rng. If None, a unique identifier is used as the seed.

__iter__() → Iterator

Samples items from the iterables and yields them until all iterables are exhausted.
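The sampling behaviour described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the squirrel implementation: the function name sample_from_iterables is hypothetical, uniform weights are assumed when probs is None, and an exhausted iterable is simply dropped together with its weight.

```python
import random
from typing import Iterable, Iterator, List, Optional


def sample_from_iterables(
    iterables: List[Iterable],
    probs: Optional[List[float]] = None,
    seed: Optional[int] = None,
) -> Iterator:
    """Yield items by repeatedly picking a random iterable until all are exhausted."""
    rng = random.Random(seed)
    iterators = [iter(it) for it in iterables]
    # Assumption: equal weights when probs is not given.
    weights = list(probs) if probs is not None else [1.0] * len(iterators)
    while iterators:
        idx = rng.choices(range(len(iterators)), weights=weights, k=1)[0]
        try:
            yield next(iterators[idx])
        except StopIteration:
            # Drop the exhausted iterable and its weight, then keep sampling.
            del iterators[idx]
            del weights[idx]
```

Every item of every iterable is yielded exactly once, the relative order within each iterable is preserved, and a fixed seed makes the interleaving reproducible.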

class squirrel.iterstream.source.IterableSource(source: Optional[Union[Iterable, Callable]] = ())

Bases: squirrel.iterstream.base.Composable

A class that turns an iterable into the source of a stream and provides stream manipulation functionality on top, for instance:

  • map

  • map_async

  • filter

  • batched

  • shuffle

  • and more

For the detailed description of each, please refer to the corresponding docstring in Composable.

Initialize IterableSource.

Parameters

source (Union[Iterable, Callable], Optional) – An iterable that the IterableSource is built on, or a callable that generates items when called.

__iter__() → Iterator

Iterates over the items in the iterable
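The idea of wrapping an iterable and chaining stream operations on top of it can be sketched as follows. MiniSource is a hypothetical stand-in written for illustration only; it does not reproduce Composable, its method signatures, or its laziness guarantees. It also assumes that a callable source returns an iterable when called, and that the last, possibly shorter, batch is kept.

```python
from typing import Callable, Iterable, Iterator, Union


class MiniSource:
    """Hypothetical stand-in: wraps an iterable or a callable returning one."""

    def __init__(self, source: Union[Iterable, Callable[[], Iterable]] = ()) -> None:
        self.source = source

    def __iter__(self) -> Iterator:
        # Assumption: a callable source is invoked to produce the items.
        items = self.source() if callable(self.source) else self.source
        yield from items

    def map(self, fn: Callable) -> "MiniSource":
        """Apply fn to every item, lazily."""
        return MiniSource(lambda: (fn(x) for x in self))

    def filter(self, pred: Callable) -> "MiniSource":
        """Keep only the items for which pred returns a truthy value."""
        return MiniSource(lambda: (x for x in self if pred(x)))

    def batched(self, size: int) -> "MiniSource":
        """Group items into lists of length size."""

        def gen() -> Iterator[list]:
            batch = []
            for x in self:
                batch.append(x)
                if len(batch) == size:
                    yield batch
                    batch = []
            if batch:
                yield batch  # keep the final partial batch in this sketch

        return MiniSource(gen)


stream = (
    MiniSource(range(10))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .batched(2)
)
```

Nothing is computed until stream is iterated; list(stream) then evaluates the whole chain item by item.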