quantify_randomness

Module Contents

Classes

DummyShardedDriver

Return integer elements in shards

Functions

kendalltau_metric(→ float)

Compute the Kendall tau randomness metric

quantify_randomness(→ float)

Quantify the randomness of sampling from a driver with the given shuffle parameters.

class quantify_randomness.DummyShardedDriver(num_shard: int, shard_size: int)

Bases: squirrel.driver.MapDriver

Return integer elements in shards

Initialize the dummy sharded driver

name = dummy_sharded_driver
get(key: str) → int

Get item with key

get_iter(flatten: bool = True, **kwargs) → squirrel.iterstream.base.Composable

Get iterator

keys() → Iterable

Get key iterator

quantify_randomness.kendalltau_metric(result1: numpy.array, result2: numpy.array) → float

Compute the Kendall tau randomness metric
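The body of kendalltau_metric is not shown on this page. As a rough illustration of the idea, the sketch below computes the Kendall tau coefficient of two tie-free sequences in plain Python and takes its absolute value, so that the result lies in [0, 1]. The names kendall_tau and randomness_from_tau are hypothetical, and mapping tau to a score through the absolute value is an assumption, not the library's confirmed behaviour.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall tau-a of two equally long sequences without ties."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    # Compare every pair of positions: a pair is concordant when both
    # sequences order it the same way, discordant when they disagree.
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def randomness_from_tau(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: the score is |tau|, so identical and exactly reversed
    # orderings both count as fully deterministic (score 1).
    return abs(kendall_tau(a, b))
```

Under this assumption, two identical trajectories score 1, while two independently shuffled trajectories tend to score close to 0.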

quantify_randomness.quantify_randomness(num_shard: int, shard_size: int, buffer_size: int, initial: int, n_samples: int = 250, metric: Callable = kendalltau_metric, seed1: squirrel.constants.SeedType = None, seed2: squirrel.constants.SeedType = None) → float

Quantify the randomness of sampling from a driver with the given shuffle parameters. This function assumes that all keys are always fully shuffled, so the parameters of the item shuffle buffer are the quantities of interest.

Parameters
  • num_shard (int) – number of shards

  • shard_size (int) – size of each shard, assuming that all shards are of equal size

  • buffer_size (int) – buffer size for item shuffle buffer

  • initial (int) – initial size of item shuffle buffer

  • n_samples (int) – influences the accuracy of the estimate by controlling the number of sampled trajectories

  • metric (Callable) – callable that measures the distance between two sampled trajectories (defaults to kendalltau_metric)

  • seed1 (SeedType) – seed for the first trajectory

  • seed2 (SeedType) – seed for the second trajectory

Returns

Randomness measure computed from the Kendall tau coefficient. Values lie between 0 and 1, where 1 means completely deterministic and 0 means random.

Return type

float
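To make the role of the parameters concrete, the following self-contained sketch simulates the setup described above without using squirrel. Shards of consecutive integers are fully shuffled at the shard level, items then pass through a shuffle buffer, and two trajectories sampled with different seeds are compared with the absolute Kendall tau. All names here (shuffle_buffer, sample_trajectory, estimate_randomness) are hypothetical, and the buffer semantics, the seeding scheme, and the averaging over n_samples are assumptions made for illustration. This is not the library's implementation.

```python
import random
from itertools import combinations

_DONE = object()  # marks an exhausted input stream


def shuffle_buffer(items, buffer_size, initial, rng):
    """Yield items in buffer-shuffled order (assumed semantics)."""
    initial = min(initial, buffer_size)
    it, buf = iter(items), []
    for item in it:
        buf.append(item)
        if len(buf) < buffer_size:
            # Grow toward capacity by pulling one extra item.
            extra = next(it, _DONE)
            if extra is not _DONE:
                buf.append(extra)
        if len(buf) >= initial:
            yield buf.pop(rng.randrange(len(buf)))
    while buf:  # drain what is left once the input is exhausted
        yield buf.pop(rng.randrange(len(buf)))


def sample_trajectory(num_shard, shard_size, buffer_size, initial, seed):
    rng = random.Random(seed)
    shards = list(range(num_shard))
    rng.shuffle(shards)  # keys (shards) are always fully shuffled
    stream = (s * shard_size + i for s in shards for i in range(shard_size))
    return list(shuffle_buffer(stream, buffer_size, initial, rng))


def abs_kendall_tau(a, b):
    """|tau-a| of two tie-free sequences of equal length."""
    score = 0
    for i, j in combinations(range(len(a)), 2):
        score += 1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1
    return abs(score) / (len(a) * (len(a) - 1) // 2)


def estimate_randomness(num_shard, shard_size, buffer_size, initial, n_samples=20):
    total = 0.0
    for k in range(n_samples):
        # Assumption: each sample pair uses two distinct, fixed seeds.
        t1 = sample_trajectory(num_shard, shard_size, buffer_size, initial, 2 * k)
        t2 = sample_trajectory(num_shard, shard_size, buffer_size, initial, 2 * k + 1)
        total += abs_kendall_tau(t1, t2)
    return total / n_samples


# One shard with a buffer of size 1 cannot reorder anything.
deterministic = estimate_randomness(1, 50, buffer_size=1, initial=1)
# A buffer that holds the whole shard yields a uniform shuffle.
well_mixed = estimate_randomness(1, 50, buffer_size=50, initial=50)
print(deterministic, well_mixed)
```

In this simulation the first configuration returns exactly 1 (completely deterministic), while the second returns a value near 0. Intermediate buffer sizes fall in between, which is the trade-off the function is meant to expose.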