Client Basics

Getting started with PiCloud should be straightforward. If anything below does not work as you expect, please see the cloud pitfalls section.

Some quick terms:

  • PiCloud refers to the cloud computing platform operated by PiCloud, Inc.
  • Client refers to any computer communicating with PiCloud. This computer is typically an individual’s personal computer, but it may itself be a server (e.g. a web server that interacts with PiCloud).
  • cloud is a Python module that allows the client to run arbitrary code on PiCloud’s servers.
_images/basic_pi.PNG

System Requirements

Python 2.6 or 2.7 with a correctly installed cloud module.

Note

Only the standard implementation of Python, CPython, is officially supported by PiCloud.

Running a Job

cloud.call() allows the client to run functions on PiCloud. For instance, a function named add can be run on the PiCloud servers merely by invoking cloud.call(add). Arguments can be passed positionally, e.g. express add(1, 2) as cloud.call(add, 1, 2), or by keyword, e.g. express add(x=1, y=2) as cloud.call(add, x=1, y=2). After add completes, its return value is saved and can be accessed via cloud.result().

Upon invoking cloud.call(), the PiCloud server will create a job which runs the desired function. cloud.call() is non-blocking; its return value is an integer jid (Job IDentification). This jid can be used to access information about the job through the cloud module, as well as the PiCloud web interface. The below diagram represents this example:

_images/basic_pi_call.PNG
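Putting the pieces together, a minimal round trip might look like the following sketch (assuming the cloud module is installed and your API key is configured):

```python
import cloud

def add(x, y):
    return x + y

jid = cloud.call(add, 1, 2)  # non-blocking; returns an integer jid
total = cloud.result(jid)    # blocks until the job completes; total == 3
```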

Because cloud.call() is non-blocking, all jobs run on PiCloud’s servers in parallel with each other and the client. As jobs run, the client can continue to create more jobs with cloud.call(). Consequently, cloud.call() allows for easily realized coarse-grained parallelism.

Functions that run on the PiCloud servers can do virtually anything. They can open arbitrary network connections to download documents, access databases, post data to websites, etc.

All relevant information needed to execute your function (code, global variables, class variables, etc.) is transmitted to the PiCloud server. Most users will find that any function they’ve written can be passed through cloud.call(). See the Limitations section for more information.

Consider using cloud.map if you are generating many jobs in parallel with the same function, but different arguments.

Warning

There is some overhead on cloud calls. For PiCloud to be worthwhile, your function should take at least a tenth of a second to execute. See the PiCloud examples section for proper design patterns.

Note

cloud.call() may be invoked within a function running on PiCloud, allowing jobs to generate new jobs.
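For example, a parent job can fan out child jobs (a sketch; process_chunk and process_all are hypothetical helpers):

```python
import cloud

def process_chunk(chunk):
    return sum(chunk)

def process_all(chunks):
    # Runs as a job on PiCloud itself, and spawns one child job per chunk.
    jids = [cloud.call(process_chunk, c) for c in chunks]
    return cloud.result(jids)

jid = cloud.call(process_all, [[1, 2], [3, 4]])
# cloud.result(jid) would return [3, 7]
```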

Accessing Job Status

To access the status of a created job, you will need the jid returned by cloud.call(). Job metadata is purged very infrequently by PiCloud, so a jid can be safely stored in a database and checked days later.

Use cloud.status(jid), where jid was returned by an earlier cloud.call(), to get a job’s status. Like every other job-related function, cloud.status() also accepts a sequence of jids; if given a sequence, it returns a list of statuses corresponding to the requested jids.

The possible statuses are:

Status      Meaning
queued      Job is in a queue on the server waiting to be run.
processing  Job is running.
waiting     Job is waiting until its dependencies are satisfied.
done        Job completed successfully.
error       Job errored (typically due to an uncaught exception).
killed      Job was aborted by the user.
stalled     Job will not run due to a dependency erroring.
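A simple polling loop over these statuses might look like the sketch below (slow_task is a placeholder function; in practice, cloud.join() is the usual way to wait):

```python
import time
import cloud

def slow_task():
    time.sleep(5)
    return 'finished'

jid = cloud.call(slow_task)

COMPLETE = ('done', 'error', 'killed', 'stalled')
while cloud.status(jid) not in COMPLETE:
    time.sleep(1)  # poll once per second to avoid hammering the server
```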

Blocking until Job finishes

Use cloud.join(jid) to block until the job with jid jid has a “complete” status (done, error, killed, or stalled). If the job errors, a cloud.CloudException will be thrown with a traceback indicating what went wrong on the server.

cloud.join() will also accept a sequence of jids and will block until all corresponding jobs complete. If a job is detected to have errored, an exception may be thrown before all requested jobs complete. If multiple jobs error, it is undefined which errored job the exception will describe. If the ignore_errors argument is set to True, no exception is thrown when jobs error, and join will block until every job has a “complete” status.

An optional timeout may be set. If the join takes longer than timeout seconds, it will abort by throwing a cloud.CloudTimeoutError.
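For example (a sketch; the keyword names timeout and ignore_errors follow the description above):

```python
import cloud

jids = cloud.map(lambda x: x * x, range(10))
try:
    cloud.join(jids, timeout=60)  # block for at most 60 seconds
except cloud.CloudTimeoutError:
    pass  # some jobs were still running after 60 seconds
except cloud.CloudException:
    pass  # at least one job errored; the traceback describes one of them
```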

Accessing Return Value

Use the cloud.result() function to access the return value of the function that ran on PiCloud. It blocks (via cloud.join()) until the job has completed. Like cloud.join(), any job that errors will cause an exception to be thrown. The below diagram represents the earlier described case of add:

_images/basic_pi_result.PNG

cloud.result() also accepts a sequence of jids; if given a sequence, a list of return values is returned corresponding to the requested jids.

As with join, a timeout may be set.

If ignore_errors is set, no exceptions are thrown when encountering errored jobs. The result of an errored job is the cloud.CloudException describing the error.
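For example, a sketch of collecting results while tolerating failures:

```python
import cloud

jids = cloud.map(lambda x: 1.0 / x, [2, 0, 4])  # the x == 0 job will error

# With ignore_errors set, no exception is raised; errored jobs yield the
# CloudException describing their failure instead of a return value.
results = cloud.result(jids, ignore_errors=True)
good = [r for r in results if not isinstance(r, cloud.CloudException)]
```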

Note

As functions are allowed to open connections, it is acceptable to not have a return value (i.e. be None). For instance, your function might read from your database, perform some heavy computation, and then write back to your database.

Mapping

cloud.map() mimics the built-in Python map function. A basic use of the built-in map is:

added2 = map(lambda x: x+2, an_iterator)

Which is equivalent to:

added2 = [x+2 for x in an_iterator]

In other words, newlist = map(func,sequence) will return a list where newlist[i] = func(sequence[i]).

cloud.map() is designed for both ease-of-use and speed when applying the same function to a list of data. One job is created per element of an_iterator. Client-to-PiCloud overhead is minimized by using cloud.map() in lieu of a for loop that issues multiple cloud.call() invocations.

The return value of cloud.map() is an ordered sequence of jids, where jids[i] corresponds to func(sequence[i]). As mentioned earlier, this sequence can be passed to cloud.status(), cloud.join(), and cloud.result().
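For example, a sketch with a square function:

```python
import cloud

def square(x):
    return x * x

jids = cloud.map(square, [2, 3])  # creates one job per element
results = cloud.result(jids)      # blocks, then returns [4, 9]
```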

The below diagram shows what happens when cloud.map(square,[2,3]) is called:

_images/basic_pi_map.PNG

One can also make more complex map calls, such as:

products = cloud.map(lambda x,y: x*y, xlist, ylist)

cloud.iresult may come in handy if you wish to iterate through the results of cloud.map().

Note

If xlist and ylist are different lengths, cloud.map() will increase the argument lists to the maximum length of the passed lists, inserting None when needed.
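This padding behaves like Python’s itertools zip_longest (izip_longest on Python 2) with a fill value of None. A local sketch of the resulting argument pairing:

```python
try:
    from itertools import izip_longest as zip_longest  # Python 2
except ImportError:
    from itertools import zip_longest                  # Python 3

xlist = [1, 2, 3]
ylist = [10, 20]

# cloud.map(lambda x, y: ..., xlist, ylist) would pair arguments like this:
pairs = list(zip_longest(xlist, ylist))  # [(1, 10), (2, 20), (3, None)]
```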

More Speed, More Memory

PiCloud offers four types of compute resources for you to utilize, each offering different amounts of CPU power and memory. They are:

Type  Use Case         Compute Resources
c1    Simple tasks     1 compute unit, 300 MB of memory, low I/O performance
c2    CPU-bound tasks  2.5 compute units, 800 MB of memory, medium I/O performance
m1    Large datasets   3.25 compute units, 8 GB of memory, high I/O performance
s1    Scraping         Variable power (max 2 compute units), 300 MB of memory, low I/O performance, unique IP address per core

By default, PiCloud assigns jobs to type c1, which uses one Amazon EC2 compute unit, the CPU capacity of a 1.0 to 1.2 GHz 2007 Xeon processor. If your job is CPU bound (that is, it does not spend most of its time waiting on I/O or sleeping), you can speed it up by using c2. Set the _type keyword argument to 'c2' within the arguments of cloud.call() or cloud.map(), which will place the job on a CPU that is 2.5x faster. Note that while a higher rate applies, a CPU-bound job may cost about the same overall, since it finishes significantly faster.

PiCloud also offers a high memory resource type, m1. Jobs running on this type will be able to utilize 8 GB of RAM, on a very fast (3.25x c1) processor.

Example:

cloud.call(foo, _type='c2')  # foo is assigned the c2 type and gets 2.5x standard compute power
cloud.map(lambda x, y: x*y, xlist, ylist, _type='m1')  # each job produced by this map runs on an m1, with 8 GB of RAM and 3.25x standard power

cloud.call(foo, _type='c1')  # Same as not specifying _type; foo receives 1 compute unit of CPU power

Note

When PiCloud has slack capacity, c1 jobs may receive extra compute power, sometimes as much as 2.5 compute units. c2 jobs are guaranteed to always receive 2.5 compute units of power.

To learn how to manage the number of cores your computation uses, please see our blog post.

Cloud Files

The programming framework described above is powerful, but sometimes it is necessary to read and write large amounts of data. For instance, you may have data files that will be used by all future jobs; it would be wasteful to send such files with every cloud.call().

Instead, you can use cloud.files, a module that provides a special file storage and retrieval interface that can be used both on the client and PiCloud.

Internally, these files are stored within Amazon S3 buckets managed by PiCloud. This system is a key-value store that maps a file name to the file’s data. The keys are not paths; while you may put the ‘/’ character in a filename, directories do not exist per se.

The cloud.files interface is quite simple:

  • cloud.files.put(): Store a file on PiCloud’s S3 store.
  • cloud.files.get(): Retrieve a file stored on PiCloud’s S3 store.
  • cloud.files.delete(): Delete a file stored on PiCloud’s S3 store.

Example:

# This code can run both locally and inside a job running on PiCloud
cloud.files.put('names.txt')  # upload names.txt to PiCloud
cloud.files.get('names.txt', 'names2.txt')  # retrieve names.txt from PiCloud and store it as names2.txt
cloud.files.delete('names.txt')  # remove the file

For large files, you may wish to upload or download only when the version on PiCloud differs from the one on your local machine:

  • cloud.files.sync_to_cloud() : Upload file if it doesn’t exist on PiCloud or differs from the version stored on PiCloud.
  • cloud.files.sync_from_cloud() : Download file if it doesn’t exist locally or local version differs from file stored on PiCloud.
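For example (a sketch; dataset.csv is a hypothetical file name):

```python
import cloud

# Upload dataset.csv only if it is absent from, or differs from,
# the copy on PiCloud's store.
cloud.files.sync_to_cloud('dataset.csv')

# Later -- on another machine, or inside a job -- download it only if
# the local copy is missing or out of date.
cloud.files.sync_from_cloud('dataset.csv')
```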

Several other functions exist to manage the stored files. Be sure to read the detailed documentation about the cloud.files interface.

Files can be up to 5 GB in size. Note that you are charged a monthly fee for storage.

Note

cloud.files is distinct from the local file system that jobs run in. Files must be explicitly placed onto the PiCloud S3 store with cloud.files.put() and retrieved with cloud.files.get().

Simulation

PiCloud offers a simulator to run your PiCloud code locally. You may find that debugging is easier in simulation. To enable the simulator, run:

cloud.start_simulator()

Or edit cloudconf.py, and set use_simulator to True.
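The corresponding line in cloudconf.py would be (a sketch showing only the single relevant setting):

```python
# cloudconf.py
use_simulator = True
```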

For more information on the simulator, see the advanced section.