Pitfalls

PiCloud has been designed to support most code out of the box. In general, you’ll find that most functions you call with cloud.call() will work as expected. If anything unexpected occurs, please read this section.

Behaviors

Pass by Value

As described in the Technical Overview, arguments, globals, etc., will be copied to the PiCloud server. If such objects are modified on the server, the changes will not be reflected on the client. An example helps explain the implications:

import cloud

def foo(list_param):
        list_param[0] += 1
        return list_param
a_list = [1,2]
jid = cloud.call(foo,a_list)
#a_list[0] is still 1
result = cloud.result(jid)
#a_list[0] is still 1; result[0] is 2

The only information that you can get back from PiCloud is passed via:

  1. The return value
  2. Any arbitrary connections your function opens. For instance you can write to your database from PiCloud.
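
The copy semantics described above can be reproduced locally without PiCloud. The sketch below uses a standard pickle round trip as a stand-in for the transfer to the server. This is an assumption made for illustration, and simulate_remote_call is a hypothetical helper, not part of the cloud API.

```python
import pickle

def foo(list_param):
    list_param[0] += 1
    return list_param

def simulate_remote_call(func, *args):
    # Hypothetical stand-in for cloud.call + cloud.result: the arguments are
    # serialized and rebuilt, so func only ever sees copies of them.
    copied_args = pickle.loads(pickle.dumps(args))
    return func(*copied_args)

a_list = [1, 2]
result = simulate_remote_call(foo, a_list)
# a_list is untouched; only the return value carries the modification.
```

The caller's list keeps its original contents, and the incremented value is visible only through the returned copy.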

Warning

Due to this behavior, be careful if you opt to use Django model objects. Calls to save() will only work if you have a database that is socket based. In particular, a sqlite database cannot be passed to PiCloud.

Networking

PiCloud will typically be on a different network than your client. Consequently, if you wish for PiCloud to access data stored on, say, a local database, you will need to set your access controls to allow for external access.

Also, while jobs can connect to other services, PiCloud’s firewall blocks jobs from listening for incoming connections.

Furthermore, connections to external sites that never respond are a common problem, so PiCloud sets a default network timeout of 30 seconds (Python's own default is no timeout). To remove this behavior, set the timeout parameter found in urllib, httplib, etc. to None.
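
How a default timeout interacts with a per-connection override can be sketched with the standard socket module. The text does not say how PiCloud applies its 30-second default, so the call to socket.setdefaulttimeout below is only an assumption used to emulate it locally.

```python
import socket

# Assumption: emulate a 30-second process-wide default like the one described.
previous_default = socket.getdefaulttimeout()
socket.setdefaulttimeout(30)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
inherited = sock.gettimeout()   # new sockets pick up the process-wide default
sock.settimeout(None)           # None removes the timeout for this socket only
overridden = sock.gettimeout()
sock.close()

# Restore whatever default was in place before the experiment.
socket.setdefaulttimeout(previous_default)
```

Passing timeout=None to the higher-level libraries has the same per-connection effect as the settimeout(None) call here.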

Persistent Interpreter

As noted in the overview, jobs may be run in the same interpreter as previous jobs. Consequently, your functions should be written so that they can be invoked repeatedly in the same interpreter. In particular:

  • Writing to globals should be avoided. If you must, values should be reset to the initial value before the function returns.
  • Any threads spawned must be terminated before the function returns. Note that some libraries you use (e.g. Twisted) may use threads internally; these also need to be shut down. You can use threading.enumerate() to see if you have any unexpected threads running. If threads are not terminated, PiCloud will list them in your standard error and kill the interpreter.
  • Any subprocesses spawned must be terminated before the function returns.

Alternatively, you can disable this optimization with the _kill_process keyword.
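
For functions that do share a persistent interpreter, the guidelines above might look like the following sketch. The names settings, well_behaved_job, and run_and_report are hypothetical; only threading.enumerate() comes from the text.

```python
import threading

settings = {"verbose": False}  # hypothetical module-level global

def well_behaved_job(values):
    original = dict(settings)
    totals = []
    worker = threading.Thread(target=lambda: totals.append(sum(values)))
    try:
        settings["verbose"] = True   # temporary change to a global
        worker.start()
        worker.join()                # the thread has finished before we return
        return totals[0]
    finally:
        settings.clear()
        settings.update(original)    # global restored to its initial value

def run_and_report(func, *args):
    # Compare the live threads before and after the call to spot leftovers.
    before = set(threading.enumerate())
    result = func(*args)
    leftovers = [t for t in threading.enumerate() if t not in before]
    return result, leftovers

result, leftovers = run_and_report(well_behaved_job, [1, 2, 3])
```

Running a function through a check like run_and_report locally is one way to find stray threads before they show up in a job's standard error.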

Package Behavior

PiCloud has configured several commonly-used packages to behave in slightly different ways:

Limitations

Importing From Cloud module

Do not import functions from the cloud module. Doing something like this:

import cloud
from cloud import call, result
cloud.start_simulator()
jid = call(lambda x: x*x, 12)
print result(jid)

will fail because cloud.start_simulator() redefines functions within the cloud module, so names imported earlier still point to the old functions. All functions must be called through the cloud namespace.
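
The underlying Python behavior can be seen without PiCloud. In this sketch, fake_cloud is a hypothetical stand-in module whose start_simulator rebinds call, which is the mechanism the text describes; it says nothing about what the real functions do.

```python
import types

# Hypothetical stand-in for the cloud module.
fake_cloud = types.ModuleType("fake_cloud")
fake_cloud.call = lambda: "real"

def start_simulator():
    # Rebinds the module attribute, as the text says start_simulator() does.
    fake_cloud.call = lambda: "simulated"

fake_cloud.start_simulator = start_simulator

call = fake_cloud.call        # same effect as: from fake_cloud import call
fake_cloud.start_simulator()

stale = call()                # still the function bound before the switch
fresh = fake_cloud.call()     # looked up through the namespace at call time
```

The name bound by a from-import keeps referring to the original function, while access through the module namespace sees the replacement.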

Cannot reload modules

The following does not work:

import cloud
import mymodule
jid = cloud.call(mymodule.myfunc)
#make modifications to mymodule.myfunc
reload(mymodule)
jid2 = cloud.call(mymodule.myfunc)  #job will use old version of mymodule.myfunc

Once you invoke cloud.call(), cloud will snapshot imported modules and send changed ones to PiCloud. While the interpreter remains open, the modules will never be sent again, even if they are reloaded. If you change your code, you will need to open a new interpreter.

Non-code files

PiCloud will automatically send all code necessary to run desired functions. However, this copying does not extend to data files that your code reads from or writes to.

Can only pass read-only “physical” file objects

PiCloud has limited file object support. For instance, the following does work:

af = open('data.txt','r')
cloud.call(foo,af)

However, there are several limitations with file handles:

  • Files cannot be open for writing

  • File handles must refer to an object that can be read from the beginning to the end. Unsupported objects include:

    • Network streams
    • Standard input
  • File handles must refer to real files. Virtual files, such as those in /proc on Linux, cannot be passed to PiCloud.

  • File methods cannot be directly passed as the function to be executed. That is, cloud.call(af.read) would fail.
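
Some of these restrictions can be checked locally before a handle is passed. The helper below is a hypothetical pre-flight check, not part of cloud: it tests only that a handle is opened read-only and is seekable. It cannot detect virtual files such as those in /proc.

```python
import os
import tempfile

READ_ONLY_MODES = ("r", "rb", "rt")

def looks_passable(handle):
    # Hypothetical check: read-only mode and readable from start to end.
    mode = getattr(handle, "mode", None)
    if mode not in READ_ONLY_MODES:
        return False
    seekable = getattr(handle, "seekable", None)
    return bool(seekable and seekable())

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w") as writer:
    writer.write("data")
    writable_ok = looks_passable(writer)   # opened for writing: rejected
with open(path, "r") as reader:
    readable_ok = looks_passable(reader)   # plain read-only file: accepted
os.remove(path)
```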

Warning

There is no caching system in place when files are sent as shown in the previous example. See the Dealing with Data section for further information.

Pickling Errors

In rare cases, PicklingError may occur. Such an error is triggered when cloud's enhanced pickler cannot serialize (pickle) data.

Failures can occur in two places:

  • call or map: A function or argument cannot be pickled.
  • result: Return value of function that ran on PiCloud cannot be pickled.

Whenever you see such an error, look at the serialization logs to find the problematic variable. You can often solve a pickling issue by marking the variable transient.

Two types of errors typically occur:

Too much data

By default, cloud will not transmit functions or arguments larger than 1 MB (this limit can be changed with the max_transmit_size setting). This is because it is easy to inadvertently pass large amounts of data. As an example, you might be passing a function that uses a global variable which refers to some management object. The management object may be caching large amounts of loaded images; even if the function does not use this cache, it will still be sent. Cloud's size restriction will warn you if you try transmitting such large amounts of data.
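
To get a rough idea of how much data an object drags along, its pickled size can be measured locally. This sketch uses the standard pickle module rather than cloud's pickler, so the numbers are only indicative. ASSUMED_LIMIT treats "1 MB" as 10**6 bytes, which is an assumption, and manager_state is a hypothetical stand-in for the management object.

```python
import pickle

ASSUMED_LIMIT = 1000000  # assumption: "1 MB" read as 10**6 bytes

def pickled_size(obj):
    # Size in bytes of the standard-pickle serialization of obj.
    return len(pickle.dumps(obj))

# Hypothetical management object with an initially empty image cache.
manager_state = {"name": "image-manager", "cache": {}}
small = pickled_size(manager_state)

# Caching one 2 MiB "image" makes every transfer of the object expensive,
# even for functions that never touch the cache.
manager_state["cache"]["photo"] = b"\0" * (2 * 1024 * 1024)
large = pickled_size(manager_state)
too_big = large > ASSUMED_LIMIT
```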

Nonsupported Objects

Certain datatypes cannot be serialized by the pickler if passed as arguments, globals, or instance members:

  • Generators

  • Generator method wrappers

  • A variety of itertools objects

  • Regular expression match objects (SRE_Match) returned by re.search.

  • Slice objects (e.g. slice(1,2)) under Python 2.5

  • Python 2.x Buffer objects

  • Python extension objects lacking pickling routines. Problematic objects include:

    • libsvm models

All of these datatypes can of course be constructed within a function running on PiCloud; they just cannot be passed as an argument into cloud.call().
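
The difference between passing such an object and constructing it inside the job can be sketched with the standard pickle module, which also refuses generators. It is used here only as a stand-in for cloud's pickler; can_pickle, squares, and job are hypothetical names.

```python
import pickle

def can_pickle(obj):
    # True if the standard pickler accepts obj, False on any pickling failure.
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

def squares(limit):
    return (i * i for i in range(limit))

generator_ok = can_pickle(squares(3))   # a generator cannot be serialized
limit_ok = can_pickle(3)                # the plain integer argument can

def job(limit):
    # The generator is built inside the job, so it never crosses the wire.
    return sum(squares(limit))

total = job(3)
```

Passing the parameters needed to rebuild the object, rather than the object itself, keeps the arguments serializable.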

Note

For commonly-used objects that cannot ordinarily be pickled, we have added serialization support. Such specially supported objects include functions, methods, partials (functools), xrange (range in Python3), read-only files, and PIL Images.

Import Errors

Cloud in general will transmit all code needed. However, ImportError exceptions can occur on PiCloud under adverse situations; the error may even be serious enough that you will receive a “Could not depickle job” error, blocking your job from ever starting. Most often, such errors are caused by the use of __import__ statements or custom python extensions, as described below.

__import__ statement

The cloud module relies on analyzing the imports that modules make to determine what code must be sent to PiCloud. Because it cannot know statically which module an __import__ call will request, it might fail to send important code, resulting in PiCloud throwing ImportError exceptions. It is strongly recommended that you avoid __import__.
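
Why a dynamic import escapes static analysis can be illustrated with the standard ast module. This is not cloud's actual dependency analyzer, only a minimal sketch of the general idea; static_imports and SOURCE are hypothetical.

```python
import ast

SOURCE = '''
import json
from os import path
plugin = __import__("xml" + ".dom")
'''

def static_imports(source):
    # Collect module names from import statements only; the source is parsed,
    # never executed.
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            found.add(node.module)
    return found

found = static_imports(SOURCE)
```

The module requested through __import__ is computed at run time, so it never appears in the set of statically discovered names.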

Python Extensions

While PiCloud has many python extensions installed by default, you may find yourself using a custom C/C++ extension. Unfortunately, the cloud module will not automatically upload these. The only way to utilize such extensions is to upload the extension’s source to PiCloud with the web interface.

Other Pitfalls

Cloud Call Recursion

While you can make additional cloud calls from a function already running on PiCloud, blocking on such calls can, in some situations, trigger deadlock. Typically, the number of simultaneous jobs you can run is limited by both elastic limits and your real-time compute units (see Scheduling).

Consider running a recursive Fibonacci sequence through the cloud:

def fib(n):
        if n <= 1:
                return 1
        jid1 = cloud.call(fib,n-1)
        jid2 = cloud.call(fib,n-2)
        return cloud.join(jid1) + cloud.join(jid2)

Even small values of n, such as 5, create many jobs. If the current limit is 16 jobs (the actual limit will always be at or above this), PiCloud will not run the 17th job until one of the first 16 completes. If all 16 jobs, though, are waiting on either each other or the result of job 17, no further work can be done.

Whenever possible, use Dependencies instead of blocking recursion.
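
How quickly the recursive fib above multiplies jobs can be counted locally. The sketch mirrors its call structure, with one job per invocation; jobs_created is a hypothetical helper and makes no cloud calls.

```python
def jobs_created(n):
    # One job for this invocation, plus the jobs spawned by its two children.
    if n <= 1:
        return 1
    return 1 + jobs_created(n - 1) + jobs_created(n - 2)

counts = {n: jobs_created(n) for n in range(8)}
# counts[5] is 15 and counts[6] is 25
```

With a 16-job limit, n = 6 already produces more jobs than can run at once, and every job that is not a base case blocks while waiting for its children.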

Name conflicts

Conflicts with PiCloud

PiCloud has many Python packages installed by default. If your modules share such package names, they are not guaranteed to work.

You also may find that you are running a different version of a package installed on PiCloud. In most cases, a different version will be compatible.

If you are attempting to use (or develop) a newer version of one of our installed packages for testing, please contact us.

Conflicts with yourself

All modules transported to PiCloud reside in the same directory on PiCloud’s servers. This can cause occasional name conflicts. Consider the following directory structure:

~/project1/foo.py
~/project2/foo.py

If you are in the project2 directory and run python foo.py, this will conflict with the foo.py from project1, preventing you from running both simultaneously. To avoid this issue, make sure that all of your python filenames are unique. Note that python files within different packages may share names; a simple rule is that import modname should always refer to a unique module on a given computer.
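
A quick local check for such clashes might look like the sketch below. duplicate_modules is a hypothetical helper that compares only the top-level .py files of the given directories, since files inside different packages may share names. The temporary directories recreate the layout from the example.

```python
import os
import shutil
import tempfile
from collections import defaultdict

def duplicate_modules(directories):
    # Map each top-level .py filename to the directories that contain it.
    locations = defaultdict(list)
    for directory in directories:
        for filename in sorted(os.listdir(directory)):
            if filename.endswith(".py"):
                locations[filename].append(directory)
    return {name: dirs for name, dirs in locations.items() if len(dirs) > 1}

# Recreate ~/project1/foo.py and ~/project2/foo.py in a scratch directory.
root = tempfile.mkdtemp()
project1 = os.path.join(root, "project1")
project2 = os.path.join(root, "project2")
for project, names in ((project1, ["foo.py", "bar.py"]), (project2, ["foo.py"])):
    os.mkdir(project)
    for name in names:
        open(os.path.join(project, name), "w").close()

conflicts = duplicate_modules([project1, project2])
shutil.rmtree(root)
```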

Calling exit

Your jobs should never call sys.exit(). They should only “exit” by using the return statement.

Other Possible Quirks

  • Any changes made to a module or class object at runtime will not be transmitted. In other words, if you assign os.mkdir to some other function, the original os.mkdir will be used on PiCloud. Equivalently, if A is a class and you set A.attribute to something else, that change will not be reflected on PiCloud.
  • __main__ on PiCloud differs from __main__ locally. Some packages (such as pyEvolve 0.6rc1) may encounter problems due to assuming that certain functions exist in __main__.
  • Don’t make assumptions about import order on PiCloud. If the import order is logically possible, it might occur.
  • The cloud package is multithreaded internally, so forking a process while using it may be dangerous. See this explanation.