PiCloud offers two ways to store your data: buckets and volumes. If neither suits your needs, or you don’t want to change the way you’re currently doing things, you can always use your own data storage solution.
There are two main reasons to store data on the cloud:
- Both cloud.call and picloud exec can send at most 16 MB of payload per job. If your job needs more data, you must store it somewhere and design the job to access it from there.
- Sending the same dataset from your machine to the cloud with every cloud.call or picloud exec wastes upload time and bandwidth. Put the data on the cloud once, and your jobs can access it while they run there. As Local vs. Cloud Performance of buckets shows, there can be a 10x+ speed difference.
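To make the first point concrete, here is a minimal sketch of a local pre-flight check that estimates whether a job's arguments would fit in the payload limit. It does not call PiCloud at all. The helper names (`payload_size`, `fits_in_payload`), the use of pickle to approximate the payload, and the reading of 16 MB as 16 * 1024 * 1024 bytes are assumptions for illustration. How PiCloud actually measures the payload is not stated here.

```python
import pickle

# Assumption: "16 MB" is taken as 16 * 1024 * 1024 bytes, and the payload is
# approximated by the pickled size of the job's arguments.
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024


def payload_size(*args, **kwargs):
    """Return the pickled size, in bytes, of the arguments a job would receive."""
    return len(pickle.dumps((args, kwargs), protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_payload(*args, **kwargs):
    """Return True if the pickled arguments are within the assumed limit."""
    return payload_size(*args, **kwargs) <= MAX_PAYLOAD_BYTES


if __name__ == "__main__":
    small = list(range(1000))
    large = bytes(20 * 1024 * 1024)  # 20 MiB of zero bytes

    for name, value in (("small", small), ("large", large)):
        if fits_in_payload(value):
            print(name, "could be passed directly as a job argument")
        else:
            print(name, "should be stored on the cloud first")
```

If a check like this fails, the data is a candidate for a bucket or a volume rather than a job argument.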
A bucket gives you a key->value storage interface, and is intended for data objects up to 5 GB in size.
A volume gives you a way to synchronize directories on your machine to the cloud, and is exposed to your job as a directory on the filesystem.
Key differences to consider:
- Buckets use a simple key->value interface, making them easy to get started with and intuitive to use.
- Bucket objects can be made public, and shared via a public URL.
- Buckets store and retrieve single files much faster than volumes.
- Volumes can efficiently synchronize a full directory tree of data.
- Volumes appear to a job as part of the filesystem; great for programs that expect files.
- If a job reads only 64 KB of a file in a volume, only that portion is sent over the network.
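The last point matters for jobs that touch only part of a large file. The sketch below shows that access pattern using ordinary Python file I/O: seek to an offset and read a 64 KB slice instead of loading the whole file. The helper `read_slice` is a hypothetical name, and the temporary file stands in for a file inside a volume directory. Nothing here is PiCloud-specific.

```python
import os
import tempfile

CHUNK = 64 * 1024  # 64 KB


def read_slice(path, offset, length=CHUNK):
    """Read up to `length` bytes starting at `offset` without loading the whole file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    # Stand-in for a file inside a volume directory: a 1 MB temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1024 * 1024))
        path = tmp.name
    try:
        data = read_slice(path, offset=512 * 1024)
        print("bytes read:", len(data))
    finally:
        os.remove(path)
```

A job written this way reads only the bytes it needs, which is the case where a volume avoids transferring the rest of the file.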