Data Storage

PiCloud offers two ways to store your data: buckets and volumes. If neither suits your needs, or you don’t want to change how you currently do things, you can always use your own data storage solution.

Why Store Data on the Cloud?

There are two main reasons to store data on the cloud:

  1. cloud.call and picloud exec can each send at most 16 MB of payload per job. If your job needs more data, you need to store it somewhere, and your job should be designed to access it from there.
  2. Sending the same dataset from your machine to the cloud with every cloud.call or picloud exec wastes upload time and bandwidth. Put the data on the cloud once, and your jobs can access it while they run there. As Local vs. Cloud Performance of buckets shows, there can be a 10x+ speed difference.
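
As a rough illustration of the first point, a script can estimate how large a job's arguments are before submitting it, and decide whether the data belongs in cloud storage instead. The sketch below assumes the payload is approximately the pickled size of the arguments and takes 16 MB to mean 16 * 1024 * 1024 bytes; PiCloud's actual serialization and accounting may differ. payload_size and fits_in_job_payload are hypothetical helpers, not part of the cloud library.

```python
import pickle

# Assumption: "16 MB" is interpreted as 16 * 1024 * 1024 bytes.
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024


def payload_size(*args, **kwargs):
    """Approximate the size of a job's arguments by pickling them."""
    return len(pickle.dumps((args, kwargs), protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_job_payload(*args, **kwargs):
    """Return True if the pickled arguments fit under the assumed limit."""
    return payload_size(*args, **kwargs) <= MAX_PAYLOAD_BYTES


small = list(range(1000))
large = bytes(20 * 1024 * 1024)  # 20 MB of zero bytes

print(fits_in_job_payload(small))  # small enough to pass directly to the job
print(fits_in_job_payload(large))  # too big: store it on the cloud first
```

When the check fails, the data is a candidate for a bucket or a volume, with the job given only the name or path it needs to find the data.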

Volume

A volume gives you a way to synchronize directories on your machine to the cloud, and is exposed to your job as a directory on the filesystem.

Bucket vs. Volume Comparison

Key differences to consider:

  • Buckets use a simple key->value interface, which makes them easy to get started with and intuitive to use.
  • Bucket objects can be made public, and shared via a public URL.
  • Buckets store and retrieve single files much faster than volumes.
  • Volumes can efficiently synchronize a full directory tree of data.
  • Volumes appear to a job as part of the filesystem; great for programs that expect files.
  • If a job reads only 64 KB of a file in a volume, only that portion is sent over the network.
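
The last point matters for jobs that need only a slice of a large file. Below is a minimal sketch of such a read using ordinary file operations. A temporary file stands in for a file on a volume, since the real directory a volume appears under depends on your setup, and read_slice is a hypothetical helper rather than a PiCloud function.

```python
import os
import tempfile

CHUNK = 64 * 1024  # 64 KB


def read_slice(path, offset, length=CHUNK):
    """Read `length` bytes starting at `offset`, leaving the rest of the file unread."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Stand-in for a file inside a volume directory (the real path is an assumption).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1024 * 1024))  # 1 MB of sample data
    sample_path = tmp.name

data = read_slice(sample_path, offset=256 * 1024)
print(len(data))  # 65536

os.remove(sample_path)  # clean up the stand-in file
```

Because a volume looks like part of the filesystem, code written this way should need no PiCloud-specific calls; only the path changes.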

Use Your Own

If you have your own datastore of choice, read here how to get it to work with PiCloud.