Some quick terms:
Python 2.6 or 2.7 with a correctly installed cloud module.
Only the standard implementation of Python, CPython, is officially supported by PiCloud.
cloud.call() allows the client to run functions on PiCloud. For instance, a function named add can be run on the PiCloud servers merely by invoking cloud.call(add). Arguments can be passed positionally, e.g. add(1,2) becomes cloud.call(add, 1, 2), or by keyword, e.g. add(x=1, y=2) becomes cloud.call(add, x=1, y=2). After add completes, its return value is saved and can be accessed via cloud.result().
Upon invoking cloud.call(), the PiCloud server will create a job which runs the desired function. cloud.call() is non-blocking; its return value is an integer jid (Job IDentification). This jid can be used to access information about the job through the cloud module, as well as the PiCloud web interface. The below diagram represents this example:
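These semantics can be illustrated with a minimal local stand-in. The CloudSim class below is hypothetical and purely for illustration; it runs each job synchronously in-process, whereas the real cloud.call() returns immediately while the job runs remotely on PiCloud:

```python
# A toy, synchronous stand-in for the cloud module, for illustration only.
# The real cloud.call() ships the function to PiCloud and returns at once.
class CloudSim(object):
    def __init__(self):
        self._results = {}
        self._next_jid = 0

    def call(self, func, *args, **kwargs):
        jid = self._next_jid           # jobs are identified by an integer jid
        self._next_jid += 1
        self._results[jid] = func(*args, **kwargs)
        return jid                     # the jid, not the function's result

    def result(self, jid):
        return self._results[jid]

def add(x, y):
    return x + y

cloud = CloudSim()
jid = cloud.call(add, 1, 2)   # returns a jid, not the sum
print(cloud.result(jid))      # prints 3
```

The key point the sketch captures is that call() hands back an identifier immediately, and the return value is fetched later through result(jid).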
Because cloud.call() is non-blocking, all jobs run on PiCloud’s servers in parallel with each other and with the client. As jobs run, the client can continue to create more jobs with cloud.call(). Consequently, cloud.call() makes coarse-grained parallelism easy to realize.
Functions that run on the PiCloud servers can do virtually anything: they can open arbitrary connections to download documents, access databases, post data to websites, and more.
All relevant information needed to execute your function (code, global variables, class variables, etc.) is transmitted to the PiCloud server. Most users will find that any function they’ve written can be passed through cloud.call(). See the Limitations section for more information.
Consider using cloud.map if you are generating many jobs in parallel with the same function, but different arguments.
There is some overhead on cloud calls. For PiCloud to be worthwhile, your function should take at least a tenth of a second to execute. See the PiCloud examples section for proper design patterns.
cloud.call() may be invoked within a function running on PiCloud, allowing jobs to generate new jobs.
To access the status of the created job, you will need the jid returned by cloud.call(). Job meta-data is purged very infrequently by PiCloud, so the jid can be placed safely into a database and checked days later.
Use cloud.status(jid), where jid was returned by an earlier invoked cloud.call() to get a job’s status. Like every other job-related function, cloud.status() also accepts a sequence of jids; if given a sequence, a list of statuses is returned corresponding to the requested jids.
The possible statuses are:
|queued||Job is in a queue on the server waiting to be run.|
|processing||Job is running.|
|waiting||Job is waiting until its dependencies are satisfied.|
|done||Job completed successfully.|
|error||Job errored (typically due to an uncaught exception).|
|killed||Job was aborted by the user.|
|stalled||Job will not run due to a dependency erroring.|
Use cloud.join(jid) to block until the job with jid jid has a “complete” status (done, error, killed, or stalled). If the job errors, a cloud.CloudException will be thrown with a traceback indicating what went wrong on the server.
cloud.join() will also accept a sequence of jids and will block until all corresponding jobs complete. If a job is detected to have errored, an exception may be thrown before all requested jobs complete. If multiple jobs error, it is undefined which errored job the exception will describe. If the ignore_errors argument is set to True, no exception is thrown when jobs error, and join will block until every job has a “complete” status.
An optional timeout may be set. If the join takes longer than timeout seconds, it will abort by throwing a cloud.CloudTimeoutError.
Use the cloud.result() function to access the return value of the function that ran on PiCloud. cloud.result() blocks (via cloud.join()) until the job has completed. As with cloud.join(), an errored job will result in an exception being thrown. The below diagram represents the earlier described case of add:
cloud.result() also accepts a sequence of jids; if given a sequence, a list of return values is returned corresponding to the requested jids.
As with join, a timeout may be set.
If ignore_errors is set, no exceptions are thrown when encountering errored jobs. The result of an errored job is the cloud.CloudException describing the error.
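The ignore_errors behavior can be mimicked locally: instead of raising, the exception object itself stands in for the errored result. The helper below is a hypothetical sketch of that pattern, not the cloud module itself (which would return a cloud.CloudException for an errored job):

```python
def result_ignore_errors(func, args_list):
    # Apply func to each argument tuple; on failure, store the exception
    # object in place of the result, mirroring ignore_errors=True.
    results = []
    for args in args_list:
        try:
            results.append(func(*args))
        except Exception as e:
            results.append(e)
    return results

out = result_ignore_errors(lambda x, y: x / y, [(6, 3), (1, 0)])
print(out[0])                                  # 2.0
print(isinstance(out[1], ZeroDivisionError))   # True
```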
Since functions can have side effects such as opening connections, it is acceptable for your function to have no return value (i.e., to return None). For instance, your function might read from your database, perform some heavy computation, and then write the result back to your database.
cloud.map() mimics Python's built-in map function. The basic built-in map usage is:
added2 = map(lambda x: x+2, an_iterator)
Which is equivalent to:
added2 = [x+2 for x in an_iterator]
In other words, newlist = map(func,sequence) will return a list where newlist[i] = func(sequence[i]).
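In Python 2 (which PiCloud targets), map returns a list directly; on Python 3 a list() wrapper is needed. The equivalence can be checked in plain Python:

```python
# map applies the function to each element and collects results in order
an_iterator = [1, 2, 3, 4]
added2 = list(map(lambda x: x + 2, an_iterator))   # list() needed on Python 3
comprehension = [x + 2 for x in an_iterator]

print(added2)                   # [3, 4, 5, 6]
print(added2 == comprehension)  # True
```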
cloud.map() is designed for both ease of use and speed when applying the same function to a list of data. One job is created per element of an_iterator. Client-to-PiCloud overhead is minimized by using cloud.map() in lieu of a for loop that issues multiple cloud.call() invocations.
The return value of cloud.map() is an ordered sequence of jids, where the i-th jid corresponds to the job computing func(sequence[i]). As mentioned earlier, this sequence can be passed to cloud.status(), cloud.join(), and cloud.result().
The below diagram shows what happens when cloud.map(square,[2,3]) is called:
One can also make more complex map calls, such as:
products = cloud.map(lambda x,y: x*y, xlist, ylist)
If xlist and ylist are different lengths, cloud.map() will extend the shorter argument list to the length of the longest, inserting None where needed.
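This padding behavior matches Python 2's built-in map with multiple sequences. It can be reproduced locally with itertools.zip_longest (izip_longest on Python 2); the pad_args helper below is a local illustration of the described behavior, not a cloud.map() call:

```python
from itertools import zip_longest  # izip_longest on Python 2

def pad_args(*seqs):
    # Extend the shorter argument lists with None, pairing up arguments
    # the same way the padding described above would.
    return list(zip_longest(*seqs))

xlist = [2, 3, 4]
ylist = [10, 20]
print(pad_args(xlist, ylist))   # [(2, 10), (3, 20), (4, None)]
```

Note that a function like lambda x,y: x*y would then error on the padded (4, None) pair, so in practice the argument lists should usually be the same length.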
PiCloud offers four types of compute resources for you to utilize, each offering different amounts of CPU power and memory. They are:
|Type||Use Case||Compute Resources|
|c1||Simple tasks||1 Compute unit, 300 MB of memory, low I/O performance|
|c2||CPU bound tasks||2.5 Compute units, 800 MB of memory, medium I/O performance|
|m1||Large datasets||3.25 Compute units, 8 GB of memory, high I/O performance|
|s1||Scraping||variable power (max 2 c.u.), 300 MB memory, low I/O performance, unique ip address/core|
By default, PiCloud will assign jobs to type c1, which provides one Amazon EC2 compute unit, the CPU capacity of a 1.0 to 1.2 GHz 2007 Xeon processor. If your job is CPU bound (that is, it does not spend most of its time waiting on I/O or sleeping), you can speed it up by using c2. Set the _type keyword argument to 'c2' within the arguments of cloud.call() or cloud.map(), which will place the job on a CPU that is 2.5x faster. Note that while higher computation rates apply, you may pay about the same per CPU-bound job, as the job will finish significantly faster.
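The cost claim follows from simple arithmetic: if c2 bills at 2.5x the c1 rate but a CPU-bound job runs 2.5x faster, the two effects cancel. The rates below are illustrative placeholders, not PiCloud's actual prices:

```python
# Illustrative cost comparison for a CPU-bound job; the rates are made up.
c1_rate = 5        # hypothetical price units per core-hour on c1
c1_runtime = 10    # hours the job takes on a c1 core

c2_rate = c1_rate * 2.5         # c2 bills at 2.5x the rate...
c2_runtime = c1_runtime / 2.5   # ...but finishes 2.5x faster

print(c1_rate * c1_runtime)   # 50
print(c2_rate * c2_runtime)   # 50.0 -- same total cost, much lower latency
```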
PiCloud also offers a high memory resource type, m1. Jobs running on this type will be able to utilize 8 GB of RAM, on a very fast (3.25x c1) processor.
cloud.call(foo, _type='c2')   #foo will be assigned the c2 type and receive 2.5x standard compute power
cloud.map(lambda x,y: x*y, xlist, ylist, _type='m1')   #each job produced by this map will run on an m1, with 8 GB of RAM and 3.25x standard power
cloud.call(foo, _type='c1')   #same as not specifying _type; foo receives 1 compute unit of CPU power
When PiCloud has slack capacity, c1 jobs may receive extra compute power, sometimes as much as 2.5 compute units. c2 jobs are guaranteed to always receive 2.5 compute units of power.
To learn how to manage the number of cores your computation uses, please see our blog post.
The programming framework described so far is powerful, but sometimes it is necessary to read and write large amounts of data. For instance, you may have data files that will be used by all future jobs; it is wasteful to resend such files with every cloud.call().
Instead, you can use cloud.files, a module that provides a special file storage and retrieval interface that can be used both on the client and PiCloud.
Internally, these files are stored within Amazon S3 buckets managed by PiCloud. This system is a key-value store that maps a file name to the file’s data. The keys are not paths; while you may put the ‘/’ character in a filename, directories do not exist per se.
The cloud.files interface is quite simple:
#This code can run both locally and in a job running on PiCloud
cloud.files.put('names.txt')                #put names.txt on the Cloud
cloud.files.get('names.txt','names2.txt')   #retrieve names.txt from the Cloud and store it as names2.txt
cloud.files.delete('names.txt')             #remove the file
For large files, you may wish to upload or download only when the version on PiCloud differs from the one on your local machine:
Several other functions exist to manage the stored files. Be sure to read the detailed documentation about the cloud.files interface.
Files can be up to 5 GB in size. Note that you are charged a monthly fee for storage.
cloud.files is distinct from the local file system that jobs run in. Files must be explicitly placed onto the PiCloud S3 store with cloud.files.put() and retrieved with cloud.files.get().