PiCloud has been designed to support most code out of the box. In general, you’ll find that most functions you call with cloud.call() will work as expected. If anything unexpected occurs, please read this section.
As described in the Technical Overview, arguments, globals, etc., will be copied to the PiCloud server. If such objects are modified on the server, the changes will not be reflected on the client. An example helps explain the implications:
    import cloud

    def foo(list_param):
        list_param.append(3)  # modifies the server-side copy only
        return list_param

    a_list = [1, 2]
    jid = cloud.call(foo, a_list)
    # a_list is still [1, 2]
    result = cloud.result(jid)
    # result is [1, 2, 3], but a_list is still [1, 2]
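The copy semantics can be reproduced locally, without a PiCloud account, by emulating transmission with a pickle round trip. This is a minimal sketch under an assumption: simulate_remote_call is a hypothetical helper that only approximates cloud.call() followed by cloud.result(), serializing the arguments on the "client" and handing the function a fresh copy.

```python
import pickle

def simulate_remote_call(func, *args):
    """Hypothetical stand-in for cloud.call() + cloud.result(): arguments are
    pickled and unpickled, so the function only ever sees copies."""
    remote_args = pickle.loads(pickle.dumps(args))
    result = func(*remote_args)
    # The result travels back the same way: serialized, then deserialized.
    return pickle.loads(pickle.dumps(result))

def foo(list_param):
    list_param.append(3)  # mutates the copy, not the caller's list
    return list_param

a_list = [1, 2]
result = simulate_remote_call(foo, a_list)
print(a_list)  # [1, 2]
print(result)  # [1, 2, 3]
```

The only way the change made inside foo reaches the caller is through the return value.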
The only information that you can get back from PiCloud is passed via:
Due to this behavior, be careful if you choose to use Django model objects. Calls to save() will only work if your database is accessed over a network socket (for example, PostgreSQL or MySQL). In particular, a file-based SQLite database cannot be passed to PiCloud.
PiCloud will typically be on a different network from your client. Consequently, if you want your jobs to access data stored in, say, a local database, you will need to configure that database's access controls to allow external connections.
Also, while jobs can connect to other services, PiCloud’s firewall blocks jobs from listening for incoming connections.
Furthermore, because connections to unresponsive external sites are a common problem, PiCloud sets a default network timeout of 30 seconds (Python's own default is no timeout). To remove this behavior, set the timeout parameter found in urllib, httplib, etc. to None.
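A minimal sketch of both ways to opt out, written for Python 3 (where httplib is named http.client). The socket.setdefaulttimeout(30) line only emulates the PiCloud default on your own machine; whether PiCloud installs its default this way is an assumption.

```python
import http.client
import socket

# Assumption: emulate a 30-second process-wide default like the one described above.
socket.setdefaulttimeout(30)
print(socket.getdefaulttimeout())  # 30.0

# Option 1: opt out for a single connection by passing timeout=None explicitly.
# Constructing the connection object does not open a socket yet.
conn = http.client.HTTPConnection("example.com", timeout=None)
print(conn.timeout)  # None

# Option 2: remove the process-wide default entirely.
socket.setdefaulttimeout(None)
print(socket.getdefaulttimeout())  # None
```

Passing timeout=None per connection is the narrower change; resetting the global default affects every socket the job opens afterwards.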
As noted in the overview, jobs may run in the same interpreter as previous jobs. Consequently, your functions should behave correctly when invoked repeatedly in the same process. In particular:
Alternatively, you can disable this optimization with the _kill_process keyword.
Do not import functions from the cloud module. Avoid code such as the following:
    import cloud
    from cloud import call, result  # avoid: use cloud.call and cloud.result instead

    cloud.start_simulator()
    jid = call(lambda x: x*x, 12)
    print result(jid)
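This section does not spell out the exact failure, but one general Python mechanism makes importing functions out of a module fragile (an assumption about the cause, not a description of cloud's internals): `from module import name` copies the binding that exists at import time, so if the module later rebinds that attribute, the imported name keeps pointing at the old object. The module fakecloud below is hypothetical.

```python
import types

# A stand-in module; "fakecloud" and its functions are hypothetical.
fakecloud = types.ModuleType("fakecloud")
fakecloud.call = lambda f, *args: "real:%s" % f(*args)

call = fakecloud.call  # equivalent to: from fakecloud import call

# Later, something rebinds the module attribute (e.g. switching implementations).
fakecloud.call = lambda f, *args: "simulated:%s" % f(*args)

print(call(abs, -3))            # real:3  (stale binding)
print(fakecloud.call(abs, -3))  # simulated:3  (looked up at call time)
```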
The following does not work:
    import cloud
    import mymodule

    jid = cloud.call(mymodule.myfunc)
    # ...make modifications to mymodule.myfunc...
    reload(mymodule)
    jid2 = cloud.call(mymodule.myfunc)  # job will use the old version of mymodule.myfunc
Once you invoke cloud.call(), cloud will snapshot imported modules and send changed ones to PiCloud. While the interpreter remains open, the modules will never be sent again, even if they are reloaded. If you change your code, you will need to open a new interpreter.
PiCloud will automatically send all code necessary to run your functions. However, this does not extend to data files that your code reads from or writes to.
PiCloud has limited file object support. For instance, the following does work:
    af = open('data.txt', 'r')
    cloud.call(foo, af)
However, there are several limitations with file handles:
Files cannot be open for writing.
File handles must refer to an object that can be read from the beginning to the end. Unsupported objects include:
- Network streams
- Standard input
File handles must refer to real files. Virtual files, such as those in /proc on Linux, cannot be passed to PiCloud.
File methods cannot be directly passed as the function to be executed. That is, cloud.call(af.read) would fail.
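The sketch below is a hypothetical pre-flight check, looks_passable, that mirrors two of the documented limits using standard io methods: the handle must not be writable, and it must be seekable (standard input, pipes, and sockets usually are not). It is only an approximation and is not part of the cloud module; for example, it cannot detect virtual files under /proc.

```python
import os
import tempfile

def looks_passable(f):
    """Hypothetical check for a file handle argument."""
    if f.writable():     # handles open for writing are not supported
        return False
    if not f.readable():
        return False
    return f.seekable()  # must be readable from beginning to end

# Create a small data file to test against.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")
    path = tmp.name

with open(path, "r") as ro:
    ro_ok = looks_passable(ro)
with open(path, "a") as wa:
    wa_ok = looks_passable(wa)
os.remove(path)

print(ro_ok, wa_ok)  # True False
```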
There is no caching system in place when files are sent as shown in the previous example. See the Dealing with Data section for further information.
In rare cases, a PicklingError may occur. This error is raised when cloud's enhanced pickler cannot serialize (pickle) some data.
Failures can occur in two places:
Two types of errors typically occur:
By default, cloud refuses to transmit functions or arguments larger than 1 MB; this limit can be changed with the max_transmit_size setting. The restriction exists because it is easy to send large amounts of data inadvertently. For example, you might pass a function that uses a global variable referring to some management object. That object may be caching many loaded images; even if the function never uses this cache, the cache will still be sent. Cloud's size restriction warns you when you try to transmit that much data.
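A minimal sketch of a size guard in the same spirit, using the standard pickle module. The 1 MB figure matches the default described above; measuring it as the length of the pickled bytes, and the names check_transmit_size and LIMIT_BYTES, are assumptions for illustration rather than cloud's implementation. The management object is modelled with types.SimpleNamespace.

```python
import pickle
from types import SimpleNamespace

LIMIT_BYTES = 1024 * 1024  # assumption: 1 MB measured as pickled bytes

def check_transmit_size(obj, limit=LIMIT_BYTES):
    """Hypothetical guard: refuse payloads whose pickled form exceeds limit."""
    size = len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    if size > limit:
        raise ValueError("payload is %d bytes, over the %d-byte limit" % (size, limit))
    return size

# A hypothetical management object with a large image cache attached.
manager = SimpleNamespace(cache={"big.png": b"\x00" * (2 * 1024 * 1024)})

small = check_transmit_size({"n": 1})
too_big = False
try:
    check_transmit_size(manager)
except ValueError as exc:
    too_big = True
    print(exc)
```

Even if the job only needs one small field of manager, passing the whole object drags the cache along with it.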
Certain datatypes cannot be serialized by the pickler if passed as arguments, globals, or instance members:
Generator method wrappers
A variety of itertools objects
Regular expression match objects (SRE_Match) returned by re.search.
Slice objects (e.g. slice(1,2)) under Python 2.5
Python 2.x Buffer objects
Python extension objects lacking pickling routines. Problematic objects include:
- libsvm models
All of these datatypes can of course be constructed within a function running on PiCloud; they just cannot be passed as an argument into cloud.call().
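A sketch of that advice, using the standard pickle module as a stand-in for cloud's enhanced pickler (an assumption; cloud's pickler supports more types, but the list above marks match objects as unsupported). The job receives plain strings and builds the match object itself; find_word is a hypothetical name.

```python
import pickle
import re

def find_word(pattern, text):
    # Build the unpicklable match object inside the job and return plain data.
    m = re.search(pattern, text)
    return m.group(0) if m else None

# The inputs (plain strings) survive a pickle round trip without trouble...
payload = pickle.dumps((r"\bcl\w+", "the cloud module"))
print(find_word(*pickle.loads(payload)))  # cloud

# ...whereas the match object itself is rejected by the standard pickler.
rejected = False
try:
    pickle.dumps(re.search(r"\bcl\w+", "the cloud module"))
except (TypeError, pickle.PicklingError) as exc:
    rejected = True
    print("cannot pickle:", exc)
```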
For commonly-used objects that cannot ordinarily be pickled, we have added serialization support. Such specially supported objects include functions, methods, partials (functools), xrange (range in Python3), read-only files, and PIL Images.
In general, cloud transmits all the code your job needs. However, ImportError exceptions can occur on PiCloud in some situations; the error may even be serious enough that you receive a “Could not depickle job” error, which prevents your job from ever starting. Most often, such errors are caused by calls to __import__ or by custom Python extensions, as described below.
The cloud module analyzes the imports your modules make to determine what code must be sent to PiCloud. Because it cannot know statically which module an __import__ call will request, it may fail to send required code, and your job will then raise ImportError on PiCloud. It is strongly recommended that you avoid __import__.
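To make the limitation concrete, the sketch below scans source text with Python's ast module, a rough and hypothetical stand-in for cloud's dependency analysis (not its actual implementation). Ordinary import statements are visible to the scan, but the argument to __import__ is only a runtime value, so the scan can merely report that a dynamic import exists. Replacing __import__(name) with an explicit import csv would make the dependency visible.

```python
import ast

def static_imports(source):
    """Collect module names a static scan can see, plus a count of
    __import__ calls whose target it cannot resolve (hypothetical helper)."""
    found, dynamic = set(), 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
              and node.func.id == "__import__"):
            dynamic += 1  # the module name is only known at run time
    return sorted(found), dynamic

code = '''
import json
from collections import OrderedDict
name = "csv"
mod = __import__(name)
'''
print(static_imports(code))  # (['collections', 'json'], 1)
```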
While PiCloud has many Python extensions installed by default, you may need a custom C/C++ extension. Unfortunately, the cloud module will not upload such extensions automatically. The only way to use them is to upload the extension's source to PiCloud through the web interface.
While you can make additional cloud calls from a function already running on PiCloud, blocking on such calls can, in some situations, cause a deadlock. This is because the number of jobs you can run simultaneously is limited (by both elastic limits and your real-time compute units; see Scheduling).
Consider running a recursive Fibonacci sequence through the cloud:
    import cloud

    def fib(n):
        if n <= 1:
            return 1
        jid1 = cloud.call(fib, n-1)
        jid2 = cloud.call(fib, n-2)
        # each cloud.result() call blocks this job until its child job finishes
        return cloud.result(jid1) + cloud.result(jid2)
Even modest values of n create many jobs, and the count grows exponentially with n: fib(5) alone creates 15 jobs, and fib(10) creates 177. If the current limit is 16 jobs (the actual limit will always be at or above this), PiCloud will not run the 17th job until one of the first 16 completes. If, however, all 16 running jobs are waiting on one another or on job 17, no further work can be done.
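The growth in job count can be checked with plain Python (no cloud calls). The sketch counts the jobs the recursive example spawns and how many of them block on children; jobs_created and blocking_jobs are hypothetical helper names. At n = 7 there are already 20 blocking jobs, more than a 16-job limit, so depending on scheduling order every running slot can end up held by a job waiting on work that cannot start.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def jobs_created(n):
    """Total jobs spawned by the recursive fib example for input n,
    counting the top-level call as one job."""
    if n <= 1:
        return 1  # leaf job: returns immediately
    return 1 + jobs_created(n - 1) + jobs_created(n - 2)

def blocking_jobs(n):
    # Every non-leaf job blocks on exactly two children; a call tree with
    # J jobs of that shape has (J - 1) // 2 such internal jobs.
    return (jobs_created(n) - 1) // 2

for n in (5, 7, 10):
    print(n, jobs_created(n), blocking_jobs(n))
# 5 15 7
# 7 41 20
# 10 177 88
```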
Whenever possible, use Dependencies instead of blocking recursion.
PiCloud has many Python packages installed by default. If one of your modules has the same name as one of these packages, it is not guaranteed to work.
You may also find that the version of a package you use locally differs from the one installed on PiCloud. In most cases, a different version will be compatible.
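One way to compare versions is a small job that reports what is installed. The sketch is written for Python 3.8+ (importlib.metadata); package_version is a hypothetical helper, and the commented-out line shows how it would be submitted with cloud.call() and read back with cloud.result(), as elsewhere in this section.

```python
from importlib import metadata

def package_version(dist_name):
    """Return the installed version string of a distribution, or None.
    Run it locally and as a job to compare the two environments."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

local = package_version("numpy")
# remote = cloud.result(cloud.call(package_version, "numpy"))  # on PiCloud
print("local numpy:", local)
```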
If you are attempting to use (or develop) a newer version of one of our installed packages for test, please contact us.
All modules transported to PiCloud reside in the same directory on PiCloud’s servers. This can cause occasional name conflicts. Consider the following directory structure:
If you are in the project2 directory and run python foo.py, this will conflict with the foo.py from project1, preventing you from running both simultaneously. To avoid this issue, make sure that all of your Python filenames are unique. Python files within different packages may share names; a simple rule is that import modname should always refer to a unique module on a given computer.
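A hypothetical helper for spotting such clashes before submitting jobs: it lists top-level .py files in each project directory and reports names defined in more than one. Files inside package subdirectories are ignored, matching the note that modules in different packages may share names. The project layout in the usage part is invented for illustration.

```python
import os
import shutil
import tempfile
from collections import defaultdict

def top_level_conflicts(project_dirs):
    """Map each top-level module name to the directories defining it,
    keeping only names defined in more than one directory."""
    owners = defaultdict(list)
    for d in project_dirs:
        for entry in sorted(os.listdir(d)):
            if entry.endswith(".py"):
                owners[entry[:-3]].append(d)
    return {name: dirs for name, dirs in owners.items() if len(dirs) > 1}

# Usage: two hypothetical projects that both define a top-level foo.py.
root = tempfile.mkdtemp()
for project, files in (("project1", ["foo.py"]), ("project2", ["foo.py", "bar.py"])):
    os.makedirs(os.path.join(root, project))
    for name in files:
        open(os.path.join(root, project, name), "w").close()

dirs = [os.path.join(root, "project1"), os.path.join(root, "project2")]
conflicts = top_level_conflicts(dirs)
print(sorted(conflicts))  # ['foo']
shutil.rmtree(root)
```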
Your jobs should never call sys.exit(). They should only “exit” by using the return statement.
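sys.exit() does not quietly end a function; it raises SystemExit. The sketch below (function names hypothetical) shows that behavior and an equivalent job that ends through return instead. How PiCloud itself reacts to a SystemExit is not described here.

```python
import sys

def bad_job(x):
    if x < 0:
        sys.exit(1)  # raises SystemExit inside the job: avoid this
    return x * 2

def good_job(x):
    if x < 0:
        return None  # report "no result" through the return value instead
    return x * 2

try:
    bad_job(-1)
except SystemExit as exc:
    exit_code = exc.code
print(exit_code)     # 1
print(good_job(-1))  # None
print(good_job(4))   # 8
```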