It should be quite easy to start using PiCloud. If any of the below does not work as you expect, please see the cloud pitfalls section.
Some quick terms:
Requirements: Python 2.5, Python 2.6, or Python 3.1, with a correctly installed cloud module.
Note
Jobs sent to PiCloud when using a Python 2.x client will be run under Python 2.6. Jobs sent using a Python 3.x client will be run under Python 3.1.
cloud.call() allows the client to run functions on PiCloud. For instance, a function named add can be run on the PiCloud servers merely by invoking cloud.call(add). Arguments can be passed positionally by listing them after the function, e.g. add(1, 2) becomes cloud.call(add, 1, 2), or by keyword, e.g. add(x=1, y=2) becomes cloud.call(add, x=1, y=2). After add completes, its return value is saved and can be accessed via cloud.result().
Upon invoking cloud.call(), the PiCloud server will create a job which runs the desired function. cloud.call() is non-blocking; its return value is an integer jid (Job IDentification). This jid can be used to access information about the job through the cloud module, as well as the PiCloud web interface. The below diagram represents this example:
Because cloud.call() is non-blocking, all jobs run on PiCloud’s servers in parallel with each other and the client. As jobs run, the client can continue to create more jobs with cloud.call(). Consequently, cloud.call() makes coarse-grained parallelism easy to achieve.
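The call-then-collect pattern described above can be tried without a PiCloud account using a local stand-in. The sketch below is only an illustration: LocalCloud is a hypothetical class built on Python's standard thread pool, not part of the cloud module, and it imitates just two behaviours from the text (call returns an integer jid immediately, result blocks until the value is ready).

```python
# Local stand-in for illustration only: this is NOT the PiCloud cloud module.
from concurrent.futures import ThreadPoolExecutor
import itertools


class LocalCloud:
    """Hypothetical in-process imitation of the call/result pattern."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}                      # maps jid -> Future
        self._next_jid = itertools.count(1)  # source of integer job ids

    def call(self, func, *args, **kwargs):
        """Submit func without blocking and return an integer jid."""
        jid = next(self._next_jid)
        self._jobs[jid] = self._pool.submit(func, *args, **kwargs)
        return jid

    def result(self, jid):
        """Block until the job finishes, then return its return value."""
        return self._jobs[jid].result()


def add(x, y):
    return x + y


local_cloud = LocalCloud()
jid_a = local_cloud.call(add, 1, 2)        # positional arguments
jid_b = local_cloud.call(add, x=10, y=20)  # keyword arguments
sum_a = local_cloud.result(jid_a)
sum_b = local_cloud.result(jid_b)
```

With the real module, the corresponding lines would be jid = cloud.call(add, 1, 2) followed by cloud.result(jid).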
Functions that run on the PiCloud servers can do potentially anything. They can even open up arbitrary connections to download documents, access databases, post data to websites, etc.
All relevant information needed to execute your function (code, global variables, class variables, etc.) is transmitted to the PiCloud server. Most users will find that any function they’ve written can be passed through cloud.call(). See the Limitations section for more information.
Consider using cloud.map if you are generating many jobs in parallel with the same function, but different arguments.
Warning
There is some overhead on cloud calls. For PiCloud to be worthwhile, your function should take at least a tenth of a second to execute. See the PiCloud examples section for proper design patterns.
Note
cloud.call() may be invoked within a function running on PiCloud, allowing jobs to generate new jobs.
To access the status of the created job, you will need the jid returned by cloud.call(). Job meta-data is purged very infrequently by PiCloud, so the jid can be placed safely into a database and checked days later.
Use cloud.status(jid), where jid was returned by an earlier cloud.call(), to get a job’s status. Like every other job-related function, cloud.status() also accepts a sequence of jids; if given a sequence, it returns a list of statuses corresponding to the requested jids.
The possible statuses are:
| Status | Meaning |
|---|---|
| queued | Job is in a queue on the server waiting to be run. |
| processing | Job is running. |
| waiting | Job is waiting until its dependencies are satisfied. |
| done | Job completed successfully. |
| error | Job errored (typically due to an uncaught exception). |
| killed | Job was aborted by the user. |
| stalled | Job will not run due to a dependency erroring. |
Use cloud.join(jid) to block until the job with jid jid has a “complete” status (done, error, killed, or stalled). If the job errors, a cloud.CloudException will be thrown with a traceback indicating what went wrong on the server.
cloud.join() will also accept a sequence of jids and will block until all jobs complete. If multiple jobs throw an exception, the exception thrown will describe the first (in sequence order) job that errored.
An optional timeout may be set. If the join takes longer than timeout seconds, it will abort by throwing a cloud.CloudTimeoutError.
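For code that polls cloud.status() instead of blocking in cloud.join(), the four "complete" statuses listed above can be captured in a small helper. This is a minimal sketch: the status strings come from the table above, while the names COMPLETE_STATUSES, is_complete, and pending_jids are hypothetical and not part of the cloud module.

```python
# Status strings are taken from the status table; helper names are hypothetical.
COMPLETE_STATUSES = frozenset(["done", "error", "killed", "stalled"])


def is_complete(status):
    """Return True if a job with this status will make no further progress."""
    return status in COMPLETE_STATUSES


def pending_jids(jids, statuses):
    """Given jids and the parallel list of their statuses (the shape the text
    describes for a sequence passed to cloud.status), return the jids that
    have not yet reached a complete status."""
    return [jid for jid, status in zip(jids, statuses) if not is_complete(status)]


# Example with made-up jids and statuses:
still_running = pending_jids([101, 102, 103], ["done", "queued", "error"])
```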
Use the cloud.result() function to access the return value of the function that ran on PiCloud. This function blocks (with cloud.join()) until the job has completed. Like cloud.join(), any error will result in an exception being thrown. The below diagram represents the earlier described case of add:
cloud.result() also accepts a sequence of jids; if given a sequence, a list of return values is returned corresponding to the requested jids.
As with join, a timeout may be set.
Note
As functions are allowed to open connections, it is acceptable to not have a return value. For instance, your function might read from your database, perform some heavy computation, and then write back to your database.
cloud.map() mimics the built-in Python map function. A basic use of the built-in map function is:
added2 = map(lambda x: x+2, an_iterator)
Which is equivalent to:
added2 = [x+2 for x in an_iterator]
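In other words, newlist = map(func, sequence) will return a list where newlist[i] = func(sequence[i]). (Under Python 3, the built-in map returns an iterator instead; wrap it in list() to obtain the same list.)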
cloud.map() is designed for both ease of use and speed when applying the same function to a list of data. One job is created per element of an_iterator. Client-to-PiCloud overhead is minimized by using cloud.map() in lieu of a for loop that issues multiple cloud.call() invocations.
The return value of cloud.map() is an ordered sequence of jids where jid[i] corresponds to func(sequence[i]). As mentioned earlier, this sequence can be passed to cloud.status(), cloud.join(), and cloud.result().
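The ordering guarantee (jid[i] corresponds to func(sequence[i])) can be illustrated with a local stand-in. The sketch below is an assumption-laden imitation, not the cloud module: local_map and local_result are hypothetical functions backed by a standard thread pool, and they only handle argument lists of equal length.

```python
# Local stand-in for illustration only: this is NOT the PiCloud cloud module.
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)
_jobs = []  # the index of a Future in this list serves as its jid


def local_map(func, *sequences):
    """Create one job per element and return the jids in input order.

    Equal-length sequences are assumed; zip() would silently truncate
    otherwise, which differs from the padding behaviour of cloud.map().
    """
    jids = []
    for args in zip(*sequences):
        _jobs.append(_pool.submit(func, *args))
        jids.append(len(_jobs) - 1)
    return jids


def local_result(jids):
    """Block on each job in turn and return the values in jid order."""
    return [_jobs[jid].result() for jid in jids]


def square(x):
    return x * x


jids = local_map(square, [2, 3])
squares = local_result(jids)
```

Because results are collected by walking the jid list in order, the output order matches the input order even if the jobs finish in a different order.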
The below diagram shows what happens when cloud.map(square,[2,3]) is called:
One can also make more complex map calls, such as:
products = cloud.map(lambda x,y: x*y, xlist, ylist)
cloud.iresult may come in handy if you wish to iterate through the results of cloud.map().
Note
If xlist and ylist have different lengths, cloud.map() will extend the shorter argument lists to the length of the longest one, inserting None where needed.
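The padding rule has a practical consequence: a function such as lambda x, y: x*y would receive None for the missing element and raise a TypeError. The sketch below reproduces the padding locally with itertools.zip_longest from the standard library, as an approximation of the behaviour described in the note; safe_product is a hypothetical helper showing one way to tolerate the padded values.

```python
from itertools import zip_longest

xlist = [1, 2, 3]
ylist = [4, 5]

# zip_longest pads the shorter input with None by default, mirroring the
# padding described in the note above.
argument_tuples = list(zip_longest(xlist, ylist))


def safe_product(x, y):
    """Hypothetical helper: return None when either operand was padded."""
    if x is None or y is None:
        return None
    return x * y


products = [safe_product(x, y) for x, y in argument_tuples]
```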
By default, PiCloud assigns each job one Amazon EC2 compute unit, the CPU capacity of a 1.0 to 1.2 GHz 2007 Xeon processor. If your job is CPU bound (that is, it does not spend most of its time waiting on I/O or sleeping), you can speed it up by granting it more compute units. Set the _high_cpu keyword argument to True in the call to cloud.call() or cloud.map() to assign jobs 2.5 compute units. Note that although a higher rate applies to _high_cpu jobs, you may pay about the same per CPU-bound job because the job will finish significantly faster.
Jobs marked _high_cpu will have a higher RAM limit of 1.7 GB, rather than the standard 850 MB.
Example:
cloud.call(foo, _high_cpu=True)  # foo will be assigned 2.5 compute units of CPU power
cloud.map(lambda x, y: x*y, xlist, ylist, _high_cpu=True)  # each job produced by this map will be assigned 2.5 compute units of CPU power
cloud.call(foo, _high_cpu=False)  # same as not specifying _high_cpu; foo receives 1 compute unit of CPU power
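The remark that a CPU-bound job may cost about the same under _high_cpu follows from cost = rate × runtime. The arithmetic below is a sketch with made-up numbers: the rates and the runtime are hypothetical placeholders, not PiCloud prices, and the job is assumed to be perfectly CPU bound so that 2.5 compute units make it run 2.5 times faster.

```python
# All figures below are hypothetical placeholders, not actual PiCloud pricing.
standard_rate = 1.0      # cost units per hour at 1 compute unit (assumed)
high_cpu_rate = 2.5      # cost units per hour at 2.5 compute units (assumed)
speedup = 2.5            # assumes a perfectly CPU-bound job

standard_hours = 10.0                        # assumed runtime at 1 compute unit
high_cpu_hours = standard_hours / speedup    # runtime shrinks with more CPU

standard_cost = standard_rate * standard_hours
high_cpu_cost = high_cpu_rate * high_cpu_hours
```

Under these assumptions the two costs are equal. A job that spends part of its time on I/O would see a smaller speedup and therefore a higher cost with _high_cpu.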
Note
When PiCloud has slack capacity, 1 compute unit jobs may receive extra compute power, sometimes as much as 2.5 compute units. _high_cpu jobs, however, are guaranteed to receive 2.5 compute units of power.
The programming framework described above is powerful, but sometimes it is necessary to read and write large amounts of data. For instance, you may have data files that will be used by all future jobs; it makes no sense to send such files with every cloud.call().
Instead, you should use cloud.files, a module that provides a file storage and retrieval interface that can be used both on the client and PiCloud.
Internally, these files are stored within Amazon S3 buckets managed by PiCloud. This system is effectively a key-value store that maps a file name to the file’s data. The keys are not paths; while you may put the ‘/’ character in a filename, directories do not exist as such.
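The "keys are not paths" point can be illustrated with an ordinary dictionary. This is a conceptual sketch using an in-memory dict with made-up file names and contents; it is not the cloud.files API.

```python
# In-memory stand-in for a flat key-value file store; not the cloud.files API.
store = {}
store["data/names.txt"] = b"alice\nbob\n"
store["data/ages.txt"] = b"30\n41\n"

# The '/' is just another character in the key: no "data" directory exists.
has_data_directory = "data" in store

# Grouping by "directory" is therefore only a prefix match on the keys.
keys_under_data = sorted(key for key in store if key.startswith("data/"))
```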
The cloud.files interface is quite simple:
Example:
# The code below can run either locally or in a job running on PiCloud
import cloud
cloud.files.put('names.txt')  # upload names.txt to the cloud
cloud.files.get('names.txt', 'names2.txt')  # retrieve names.txt from the cloud and store it as names2.txt
cloud.files.delete('names.txt')  # remove the file from the cloud
Several other functions exist to manage the stored files. Be sure to read the detailed documentation about the cloud.files interface.
Files can be up to 5 GB in size. Note that you are charged a monthly fee for storage.