squirrel.iterstream.source
Module Contents
Classes
FilePathGenerator | A specialized version of IterableSource that accepts a url without instantiating it eagerly.
IterableSamplerSource | A class that samples from iterables into an iterstream.
IterableSource | A class that turns an iterable into a source of a stream and provides stream manipulation functionalities on top.
class squirrel.iterstream.source.FilePathGenerator(url: str, nested: bool = False, max_workers: Optional[int] = None, max_keys: int = 1000000, max_dirs: int = 10)
Bases: IterableSource
A specialized version of IterableSource that accepts a url without instantiating it eagerly. It lazily yields the paths found under the given url by instantiating an fsspec filesystem and yielding the result of fs.ls(url).
- Parameters
url (str) – the URL on which ls is performed
nested (bool) – if True, ls is also run on each directory that is encountered. Otherwise, only the top-level paths are yielded, and a path that is a directory is not expanded
max_workers (int) – passed to the ThreadPoolExecutor. Only applicable if nested==True
max_keys (int) – maximum number of keys to keep in memory at the same time. If this number is reached, no new expansion on the currently discovered directories is done, until enough keys are yielded to make room for the new ones.
max_dirs (int) – maximum number of parallel ls operations.
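The difference between nested=False and nested=True can be illustrated with a minimal, standard-library-only sketch. It is not squirrel's implementation: list_paths is a hypothetical helper, os.scandir stands in for the fsspec filesystem's ls, and the thread pool, max_keys and max_dirs limits are left out. The sketch also assumes that an expanded directory is not itself yielded, which the description above does not state.

```python
import os
import tempfile
from typing import Iterator


def list_paths(root: str, nested: bool = False) -> Iterator[str]:
    """Lazily yield paths under root (hypothetical stand-in, not squirrel's code)."""
    pending = [root]
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if nested and entry.is_dir():
                    # Assumption: a directory is expanded later rather than yielded.
                    pending.append(entry.path)
                else:
                    yield entry.path


# Build a tiny directory tree and list it both ways.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "shard_0"))
    open(os.path.join(root, "shard_0", "a.bin"), "w").close()
    open(os.path.join(root, "index.txt"), "w").close()

    top_level = sorted(os.path.relpath(p, root) for p in list_paths(root))
    expanded = sorted(os.path.relpath(p, root) for p in list_paths(root, nested=True))
```

Here top_level contains the directory shard_0 as a plain entry, while expanded contains the file inside it instead.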
class squirrel.iterstream.source.IterableSamplerSource(iterables: List[Iterable], probs: Optional[List[float]] = None, rng: Optional[random.Random] = None, seed: Optional[int] = None)
Bases: squirrel.iterstream.base.Composable
A class that samples from iterables into an iterstream.
Initialize IterableSamplerSource.
- Parameters
iterables (List[Iterable]) – List of iterables to sample from.
probs (Optional[List[float]], optional) – Sampling probabilities, presumably one per entry in iterables. Defaults to None.
rng (random.Random, optional) – Random number generator to use.
seed (Optional[int]) – An int, or any other type accepted by random.seed(), used to seed rng. If None, a unique identifier is used as the seed.
__iter__(self) → Iterator
Samples items from the iterables and yields them until all iterables are exhausted.
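The sampling behaviour described above can be approximated with a short standard-library sketch. It is an illustration under assumptions, not the class's actual code: sample_from_iterables is a hypothetical function, a missing probs is assumed to mean equal weights, and an exhausted iterable is assumed to be dropped together with its weight so that sampling continues over the remaining ones.

```python
import random
from typing import Iterable, Iterator, List, Optional


def sample_from_iterables(
    iterables: List[Iterable],
    probs: Optional[List[float]] = None,
    seed: Optional[int] = None,
) -> Iterator:
    """Interleave items from several iterables by weighted random choice (sketch)."""
    rng = random.Random(seed)
    iterators = [iter(it) for it in iterables]
    # Assumption: equal weights when probs is not given.
    weights = list(probs) if probs is not None else [1.0] * len(iterators)
    while iterators:
        # Pick the index of the iterator to draw from next.
        i = rng.choices(range(len(iterators)), weights=weights, k=1)[0]
        try:
            yield next(iterators[i])
        except StopIteration:
            # Assumption: drop the exhausted iterator and its weight, keep going.
            del iterators[i]
            del weights[i]


items = list(sample_from_iterables([[1, 2, 3], "ab"], probs=[0.7, 0.3], seed=0))
```

Every item from every input appears exactly once in items, and the relative order within each input is preserved; only the interleaving is random.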
class squirrel.iterstream.source.IterableSource(source: Iterable = ())
Bases: squirrel.iterstream.base.Composable
A class that turns an iterable into a source of a stream and provides stream manipulation functionalities on top, for instance:
- map
- map_async
- filter
- batched
- shuffle
- and more
For a detailed description of each, refer to the corresponding docstring in Composable.
Initialize IterableSource.
- Parameters
source (Iterable) – The iterable on which the IterableSource is built.
__iter__(self) → Iterator
Iterates over the items in the iterable.
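The idea of wrapping an iterable and chaining stream operations on top of it can be shown with a small standard-library sketch. MiniSource is a hypothetical, heavily simplified stand-in and not the library's Composable API: only map, filter and batched are mimicked, their signatures are assumptions, and keeping the final partial batch is a choice made for this sketch.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


class MiniSource:
    """Hypothetical, simplified stand-in for a chainable stream source."""

    def __init__(self, source: Iterable = ()):
        self.source = source

    def __iter__(self) -> Iterator:
        # Iterate over the items of the wrapped iterable.
        yield from self.source

    def map(self, fn: Callable) -> "MiniSource":
        # Lazily apply fn to every item.
        return MiniSource(fn(x) for x in self)

    def filter(self, pred: Callable) -> "MiniSource":
        # Lazily keep only the items for which pred is true.
        return MiniSource(x for x in self if pred(x))

    def batched(self, size: int) -> "MiniSource":
        def batches() -> Iterator[list]:
            it = iter(self)
            # Assumption of this sketch: the last, shorter batch is kept.
            while batch := list(islice(it, size)):
                yield batch

        return MiniSource(batches())


result = list(
    MiniSource(range(10))
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
    .batched(2)
)
# result == [[0, 4], [16, 36], [64]]
```

Each step returns a new wrapper around a lazy generator, so nothing is computed until the final list() call consumes the chain.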