squirrel.fsspec.fs

Module Contents

Functions

get_fs_from_url(url, **storage_options) → squirrel.constants.FILESYSTEM

Get filesystem suitable for a url.

get_protocol(url) → str

Get the protocol from a url; return an empty string if the url is local.

squirrel.fsspec.fs.get_fs_from_url(url: squirrel.constants.URL, **storage_options) → squirrel.constants.FILESYSTEM

Get filesystem suitable for a url.

Only the protocol part of the url is used, so there is no need to call this function more than once for different stores accessed with the same protocol. Call it outside of store initialization so that buffering from the same container can be shared among different stores or shards.

If the protocol “gs” is detected, a custom squirrel GCS filesystem instance, which is more robust, is returned. For details, see squirrel.fsspec.custom_gcsfs.CustomGCSFileSystem.

To access a requester-pays bucket, users must have one of the following IAM rights in the respective project (copied from https://issuetracker.google.com/issues/156960628):

Requesters who include a billing project in their request. The project used in the request must be in good standing, and the user must have a role in the project that contains the serviceusage.services.use permission. The Editor and Owner roles contain the required permission.

Requesters who don’t include a billing project but have resourcemanager.projects.createBillingAssignment permission for the project that contains the bucket. The Billing Project Manager role contains the required permission. Access charges associated with these requests are billed to the project that contains the bucket.

If you encounter `ValueError: Bucket is requester pays. Set requester_pays=True when creating the GCSFileSystem.`, then instead of passing requester_pays=True in storage_options to fsspec, switch to a project in which you have the required role(s) by running gcloud config set project PROJECT_ID. This should resolve the problem.

The protocol can be overridden via the storage_options. This matters for users who want the fsspec caching functionality, which requires the protocol to be, e.g., simplecache.
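The override described above could work roughly as sketched below. This is an illustration only, not squirrel's implementation: the helper name `resolve_protocol` is hypothetical, and the assumption that the override is passed under a `"protocol"` key in storage_options is not confirmed by this page.

```python
from typing import Any, Dict, Tuple


def resolve_protocol(
    url_protocol: str, storage_options: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: pick the protocol and return the remaining options."""
    # Copy so the caller's dictionary is not mutated.
    options = dict(storage_options)
    # ASSUMPTION: the override is stored under the key "protocol".
    # If that key is absent, fall back to the protocol taken from the url.
    protocol = options.pop("protocol", url_protocol)
    return protocol, options


# The url says "gs", but the caller asks for fsspec's simplecache layer.
protocol, remaining = resolve_protocol(
    "gs",
    {"protocol": "simplecache", "target_protocol": "gs", "cache_storage": "/tmp/cache"},
)
print(protocol)
print(remaining)
```

In this sketch the remaining options (here `target_protocol` and `cache_storage`, both regular fsspec simplecache options) would be forwarded to the filesystem constructor unchanged.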

squirrel.fsspec.fs.get_protocol(url: str) → str

Get the protocol from a url; return an empty string if the url is local.
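To make the documented behaviour concrete, here is a minimal sketch of a function with the same contract. The name `sketch_get_protocol` is hypothetical and the splitting rule is an assumption; squirrel's actual implementation may differ, for example in how it treats chained urls.

```python
def sketch_get_protocol(url: str) -> str:
    """Hypothetical stand-in for get_protocol, not squirrel's implementation."""
    # A url such as "gs://bucket/key" carries its protocol before "://".
    if "://" in url:
        return url.split("://", 1)[0]
    # No "://" separator: treat the url as a local path.
    return ""


print(sketch_get_protocol("gs://my-bucket/data"))  # gs
print(repr(sketch_get_protocol("/tmp/data")))  # ''
```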