Environment

Environments enable you to use any library or program you need for your computation.

Fundamentally, libraries and programs are just files that a job needs to be able to find on the filesystem. By creating an environment, you are able to modify the filesystem your job sees so that it contains the dependencies it needs.

Environments support programs and libraries compatible with Linux running on x86-64 architecture.

What is an Environment

Think of an environment as a customized filesystem. When your job runs, it runs on top of a Linux filesystem. You can see this by running a job that lists the contents of the filesystem root.

For example, in Python:

>>> import cloud
>>> import os
>>> jid = cloud.call(lambda: os.listdir('/')) # list contents of root dir
>>> cloud.result(jid)
['var', 'tmp', 'etc', 'usr', 'home', 'dev', 'bin', 'lib', 'lib64', 'mnt', 'run', 'proc', 'root', 'sbin', 'srv', 'sys']

In the shell,

$ picloud exec ls /
[jid]
$ picloud result [jid]
bin
boot
dev
etc
home
lib
lib64
media
mnt
proc
root
run
sbin
selinux
srv
sys
tmp
usr
var

Base Environment

The contents of the filesystem seen above are called our base environment. It’s the default, hence base, filesystem available to you. The environment feature gives you the power to modify a base environment to include the programs and libraries your jobs need.

We provide three base environments. By default, picloud uses the Ubuntu 11.04 Natty base environment. When using cloud, the base environment used by default depends on the version of Python you’re using: Maverick for 2.6 and Natty for 2.7.

Environment Name   Distribution            Python Version   Contents
/base/maverick     Ubuntu Maverick 10.10   2.6              Maverick Contents
/base/natty        Ubuntu Natty 11.04      2.7              Natty Contents
/base/precise      Ubuntu Precise 12.04    2.7              Precise Contents

Click on a base environment above to see what’s installed.

In Python, to manually specify the base environment, use the _env keyword:

>>> cloud.call(f, _env='base/precise')

In the shell, use the -e flag:

$ picloud exec -e base/precise program
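
As a rough illustration of the defaults described above, the following sketch maps a Python version to the base environment that cloud would pick. The helper default_base_env and its lookup table are hypothetical, written only for this example and not part of the cloud library. The names follow the base/... form passed to _env above.

```python
import sys

# Defaults as described in the text: Maverick for Python 2.6, Natty for 2.7.
# This table is an illustration, not something read from the cloud library.
DEFAULT_BASE_BY_PYTHON = {
    (2, 6): 'base/maverick',
    (2, 7): 'base/natty',
}


def default_base_env(version_info=None):
    """Return the documented default base environment, or None if unknown."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return DEFAULT_BASE_BY_PYTHON.get(major_minor)


print(default_base_env((2, 6)))
print(default_base_env((2, 7)))
```

A result of None means the text does not state a default for that version, in which case passing _env explicitly is the safer choice.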

When to use an Environment

If the base environment you’re using does not contain what you need, you will need to use an environment. How you can tell depends on whether you’re using our Python library cloud or our CLI picloud.

Python

If you’ve been using Python-only packages with our cloud library, you’ve probably become accustomed to our Automagic Dependency Transfer. Code such as the following works straight out of the box without you needing to deploy your_expansive_library_of_functions manually to PiCloud.

>>> import cloud
>>> from your_expansive_library_of_functions import complex_function
>>> # cloud.call transfers all the modules needed to run complex_function on PiCloud
>>> cloud.call(complex_function)

However, the cloud library can only transfer pure Python modules. If you need Python modules that are not pure Python, such as C extensions, then you’ll need to install them via an environment.
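
To get a rough idea of whether a module you depend on is pure Python before you submit a job, you can inspect where Python finds it on your local machine. The sketch below is a heuristic written for a current Python 3 interpreter. module_kind is a hypothetical helper, not part of cloud. It only looks at the module you name, so a package whose __init__.py is pure Python can still contain compiled submodules.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify a top-level module by how the local interpreter finds it."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return 'missing'
    origin = spec.origin
    if origin in ('built-in', 'frozen'):
        return 'builtin'        # compiled into the interpreter itself
    if origin is None:
        return 'other'          # for example, a namespace package
    if any(origin.endswith(suffix)
           for suffix in importlib.machinery.EXTENSION_SUFFIXES):
        return 'extension'      # compiled C extension
    if origin.endswith('.py'):
        return 'pure-python'
    return 'other'


for name in ('json', 'sys', 'no_such_module_xyz_123'):
    print(name, module_kind(name))
```

Anything reported as an extension is a candidate for installation through an environment rather than automatic transfer.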

Here we’ll show an example using the ObsPy package, which is a Python toolbox for processing seismological data. The examples assume that obspy is installed locally.

>>> def f():
...     import obspy
...
>>> f() # works because obspy is installed locally
>>> jid = cloud.call(f)
>>> cloud.result(jid)
[Mon Sep 19 13:32:29 2012] - [WARNING] - Cloud: Job 1337 threw exception:
 Could not depickle job
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/cloud/serialization/cloudpickle.py", line 679, in subimport
    __import__(name)
ImportError: No module named obspy

As you can see, f() worked locally, but failed to run on PiCloud because obspy is not available on the base environment. In Create a new Environment, we’ll see how to resolve this.

Shell

Deciding when to use an environment for a program invoked through the CLI is simpler. Since picloud does not automatically copy executables from your machine to PiCloud, anything you need that is not in the base environment has to be deployed with an environment.

In this example, we try to use the convert program that comes with ImageMagick.

$ picloud exec convert
[jid]

$ picloud result [jid]
Job [jid]: Traceback (most recent call last):
CloudException: command terminated with nonzero return code 127

$ picloud info [jid]
Info for jid [jid]
status: error
stderr:
/bin/sh: convert: not found

The call to convert fails. When we examine the stderr, we can see that it was because convert was not found. Looking at the contents of the Ubuntu Natty 11.04 base environment, we shouldn’t be surprised since imagemagick is not included in the list.
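
The return code 127 is the value a POSIX shell reports when it cannot find a command. The sketch below reproduces this on a local Linux or other POSIX machine, without involving PiCloud. run_in_shell is a hypothetical helper written for this example, and the program name is deliberately made up so that it does not exist.

```python
import subprocess


def run_in_shell(command):
    """Run a command line through the system shell.

    Returns a (return_code, stderr_text) tuple.
    """
    completed = subprocess.run(
        command,
        shell=True,                  # uses /bin/sh on POSIX systems
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,     # decode output as text
    )
    return completed.returncode, completed.stderr


code, err = run_in_shell('no-such-program-xyz-123')
if code == 127:
    # The exact wording of the message depends on which shell /bin/sh is.
    print('command not found:', err.strip())
```

Seeing 127 in a job result is therefore a good hint that the program is missing from the environment the job ran in.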

Create a new Environment

The simplest way to manage environments is through the Environments Dashboard.

  1. Go to the Environments tab.

  2. Click the “create new environment” button.

    Choose the base environment most useful for you, keeping in mind that if you’re using cloud, you will want to pick a base environment with a compatible Python version. This is described in the Base Environment introductory section.

    The Environment Name is the name you’ll use to reference the environment in your jobs. The Environment Description is for yourself and/or your team to keep track of the purpose and contents of each environment.

    For our example, let’s name the environment sample_env.

  3. Click submit.

    When you click submit, your environment will appear under the “Environments being configured” section. You may have to wait a minute or two while we boot and configure a setup server for you. The setup server is a temporary machine that represents your environment. Changes you make to the setup server will be reflected in your environment when you save it.

Connecting

When the setup server is ready, click the “connect” icon. If your web browser supports WebSockets, an SSH session will be started for you.

Otherwise, you can use an SSH client of your choice and follow the instructions provided by clicking on the “key” icon. Note that the instructions are tailored to *nix systems. If you are using Windows and do not have an SSH client, we recommend Tunnelier.

Getting Around

Once you’ve SSH-ed in, you’ll find yourself in an Ubuntu Linux filesystem environment.

picloud@ip-10-46-223-4:~$ ls /
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  sbin selinux  srv  sys  tmp  usr  var

Your current working directory is /home/picloud:

picloud@ip-10-46-223-4:~$ pwd
/home/picloud

You can verify the distribution of Ubuntu you’re using:

picloud@ip-10-46-223-4:~$ cat /etc/issue
Ubuntu 11.04 \n \l

We give you sudo access so that you have the freedom to install anything anywhere.

# this does not produce an error
picloud@ip-10-46-223-4:~$ sudo touch /root/i_can_be_root

Note

The owner and group for files and directories in your environment do not matter. While you’ll be using the picloud and root user accounts, your jobs will be run with a different user account that will have access to the entire filesystem environment.

Installing Custom Packages

For the Python example, we’ll use sudo access to install the ObsPy library using pip.

picloud@ip-10-46-223-4:~$ sudo pip install obspy.core obspy.signal
Downloading/unpacking obspy.core
  Downloading obspy.core-0.4.8.zip (186Kb): 186Kb downloaded
  Running setup.py egg_info for package obspy.core
  ...    # output shortened for brevity
Successfully installed obspy.core obspy.signal
Cleaning up...

For the shell example, we’ll use sudo access to install imagemagick using apt-get.

picloud@ip-10-46-223-4:~$ sudo apt-get install imagemagick
Reading package lists... Done
Building dependency tree
Reading state information... Done
...
Setting up netpbm (2:10.0-15) ...
Setting up gs-cjk-resource (1.20100103-3) ...
Setting up libgs9 (9.05~dfsg-0ubuntu4.2) ...
Setting up ghostscript (9.05~dfsg-0ubuntu4.2) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place

Saving

When you click the “save” icon, your SSH connection to the setup server will be closed. The length of time it takes to save your environment depends on how much you’ve installed. Once saving has completed, your jobs can start using the environment, and you can also SSH back into the setup server to make additional modifications. If you are finished making changes to your environment, or wish to discard the changes you’ve made since the last “save” request, simply click the “shutdown” icon. You can also perform both the save and shutdown operations by clicking the “save & shutdown” icon.

Note

Please shut down the setup server if you aren’t using it. It costs us money to keep it running for you, and we automatically terminate it after 8 hours.

Modifying

You may need to modify an existing environment in order to fix mistakes, install additional dependencies, or update packages you’ve already installed. Locate your environment in the “Your environments” section of the Environments Dashboard, and click on the “modify” icon. A setup server for the environment will be prepared for you.

Command Line Interface

If you feel more comfortable using the command line, or if you wish to automate the management of environments through scripts, the Environments Dashboard functionality is also available through picloud.

Creating Environments

To create an environment called sample_env using the Ubuntu Precise 12.04 base:

$ picloud env list-bases
name         distro                    python_version
maverick     Ubuntu Maverick 10.10     2.6
natty        Ubuntu Natty 11.04        2.7
precise      Ubuntu Precise 12.04      2.7

$ picloud env create sample_env precise -d 'Initial environment for testing'
ec2-50-16-29-225.compute-1.amazonaws.com

Provide the create command with the environment name and the name of the base environment you wish to use. Pass the -d flag to set a description for the environment. The create command returns the hostname of the setup server where the sample_env environment can be modified.

Connecting

You can connect to the setup server by invoking:

$ picloud env ssh sample_env
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-38-virtual x86_64)

Welcome to your Environment Setup Server!
picloud@ip-10-12-27-237:~$ sudo apt-get install imagemagick
...

You can also issue shell commands to be run on the setup server.

$ picloud env ssh sample_env pwd
/home/picloud
$ picloud env ssh sample_env cat /etc/issue
Ubuntu 12.04.1 LTS \n \l

Transferring Files

If you are using environments to compile and install your own programs, you will need to transfer files between your local machine and the setup server. You can use env rsync for this purpose.

$ ls my_dir
file1  file2

$ picloud env rsync my_dir sample_env:/home/picloud/
sending incremental file list
my_dir/
my_dir/file1
my_dir/file2

sent 152 bytes  received 54 bytes  82.40 bytes/sec
total size is 0  speedup is 0.00

$ picloud env ssh sample_env ls -R
.:
my_dir

./my_dir:
file1
file2

Note that the syntax for env rsync is modeled after that of the real rsync program, except the environment name is used in place of username@hostname. You can also pull files from the setup server to your local machine by specifying the environment path (e.g. “sample_env:/home/picloud/my_dir”) as the source.

Running Scripts

As you automate the process of creating or updating your custom environments, it is likely you will encapsulate the necessary sequence of commands into scripts. The env module offers a convenience wrapper for this purpose that copies a local script file to the setup server, executes it, and displays the output.

$ cat <<END > my_script
> #!/bin/bash
> sudo pip install obspy.core obspy.signal
> sudo apt-get install -y imagemagick   # -y ensures it'll run without interaction
> END

$ picloud env run-script sample_env my_script
Downloading/unpacking obspy.core    # start of output from pip install
  Downloading obspy.core-0.4.8.zip (186Kb): 186Kb downloaded
  Running setup.py egg_info for package obspy.core
  ...
Successfully installed obspy.core obspy.signal
Cleaning up...
Reading package lists... Done   # start of output from apt-get install
Building dependency tree
Reading state information... Done
...
Setting up netpbm (2:10.0-15) ...
Setting up gs-cjk-resource (1.20100103-3) ...
Setting up libgs9 (9.05~dfsg-0ubuntu4.2) ...
Setting up ghostscript (9.05~dfsg-0ubuntu4.2) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place

Saving

You can save your environment from both the setup server and your local machine:

# This is on the setup server. It will also log you out of the setup server.
# Use "sudo save-shutdown" if setup server should be terminated after saving.
picloud@ip-10-12-27-237:~$ sudo save
Environment save has been initiated...

Connection to ec2-50-16-29-225.compute-1.amazonaws.com closed by remote host.

# This is on your local machine.
$ picloud env save sample_env
$ picloud env shutdown sample_env
# both operations can be done with one command:
# picloud env save-shutdown sample_env
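
If you drive these commands from Python rather than from a shell script, one possible approach is to build the command lines first and run them in order. The sketch below uses only the picloud env subcommands shown in this section (create, rsync, run-script, save-shutdown). The helper names env_build_commands and run_commands are hypothetical. It also assumes that create returns only once the setup server is ready, which the text states for modify but not explicitly for create. By default it prints the commands instead of running them.

```python
import shlex
import subprocess


def env_build_commands(env_name, base, script,
                       description=None, upload_dir=None):
    """Return the picloud command lines for building an environment, in order."""
    create = ['picloud', 'env', 'create', env_name, base]
    if description:
        create += ['-d', description]
    commands = [create]
    if upload_dir:
        # Same destination form as the rsync example: <env name>:<remote path>
        commands.append(['picloud', 'env', 'rsync', upload_dir,
                         env_name + ':/home/picloud/'])
    commands.append(['picloud', 'env', 'run-script', env_name, script])
    commands.append(['picloud', 'env', 'save-shutdown', env_name])
    return commands


def run_commands(commands, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for argv in commands:
        if dry_run:
            print(' '.join(shlex.quote(arg) for arg in argv))
        else:
            subprocess.check_call(argv)


plan = env_build_commands('sample_env', 'precise', 'my_script',
                          description='Initial environment for testing')
run_commands(plan, dry_run=True)
```

Passing dry_run=False would invoke picloud for real, so it only makes sense on a machine where the CLI is installed and configured.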

Modifying

To modify an existing environment, use env modify:

$ picloud env list
name           status     action     created                 last_modified
sample_env     ready      idle       2013-04-18 07:17:39     2013-04-19 03:36:24

$ picloud env modify sample_env
ec2-54-234-109-217.compute-1.amazonaws.com

The env modify command returns once a setup server has been prepared for your environment modification.

Using with a Job

Using an environment with a job is simple. In Python, you use the _env keyword:

>>> def f():
...     import obspy
...     return obspy.__path__
...
>>> jid = cloud.call(f, _env='sample_env')
>>> cloud.result(jid)
['/usr/local/lib/python2.7/dist-packages/obspy']

Now that we’ve specified sample_env, the job runs without error.

In the shell, use the -e flag:

$ picloud exec -e sample_env convert -version
[jid]

$ picloud result [jid]
 Version: ImageMagick 6.6.9-7 2012-08-17 Q16 http://www.imagemagick.org
 Copyright: Copyright (C) 1999-2011 ImageMagick Studio LLC
 Features: OpenMP
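
Before relying on a newly saved environment, it can help to run a small job that reports what is still missing. The following is a minimal sketch of such a check. probe and find_program are hypothetical helpers written for this example, not part of cloud, and they stick to standard library calls so the same function can run locally or inside a job.

```python
import os


def find_program(name):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def probe(modules=(), programs=()):
    """Report modules that fail to import and programs that are not on PATH."""
    missing = {'modules': [], 'programs': []}
    for name in modules:
        try:
            __import__(name)
        except ImportError:
            missing['modules'].append(name)
    for name in programs:
        if find_program(name) is None:
            missing['programs'].append(name)
    return missing


# Local run; the names below are made up so that both are reported missing.
print(probe(['json', 'no_such_module_xyz_123'], ['no-such-program-xyz-123']))
```

To run the same check on PiCloud, you could follow the pattern used above, for example jid = cloud.call(lambda: probe(['obspy'], ['convert']), _env='sample_env'), and read the report with cloud.result(jid).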

Sharing

Environment sharing allows you to give other PiCloud users access to environments you’ve created. A user who has access to your environment can use it for their jobs or even clone their own copy for further modification.

Sharing Your Environments

By default, the visibility of an environment is set to “private”: only the creator can see or use the environment. To share your environment with others, locate its entry under the “Your environments” section of the Environments Dashboard, and click the “sharing” icon to bring up the sharing options panel.

There are two modes of environment sharing: sharing with only specific users or making it publicly available. If you select “shared” under the Visibility and Sharing option, you can give access to specific users by entering the email addresses associated with their PiCloud accounts. If you select “public”, you allow anyone to use or clone the environment. Be sure to write an accurate and helpful description if you are going to make your environment public. A list of all public environments is viewable on our Public Environments Page.

Note

Keep in mind that when you share an environment, you are giving others permission to see everything in that environment. Please be careful not to include any sensitive or private data in your shared environments.

Environments Shared With You

There are three tabs on the Environments Dashboard. The first is “Your environments”. The second is “Shared With You” where we list environments which are owned by other users and are explicitly shared with your PiCloud account.

Each row in this table shows basic information about the shared environment, such as the owner name and a description. The “clone” icon lets you clone the shared environment, making your own copy that you can modify. The cloned environment will show up under the “Your environments” tab.

The “delete” icon will remove the shared environment from the list, removing your access to it. This only deletes your share, not the original environment. If you still need access to the environment, the owner will need to explicitly share it with you again.

Public Environments

The third tab in the Environments Dashboard is the “Public Environment” tab.

The top section lists any public environments that you have favorited, so you can quickly see information on your most used public environments. Favorited environments will also be listed on your Notebook Dashboard, allowing you to use them with an IPython Notebook.

The bottom section lists all public environments available on PiCloud, sorted by most favorited. Environments marked with a star are public environments that you have favorited. You can click on an entry to view more information about that environment.

You can also search on the names and descriptions of public environments by entering a query into the search field. Like other shared environments, you may also clone any public environment.

Using Shared Environments

Using a shared environment in a PiCloud job is as simple as using your own environments. When specifying the name of the environment, you simply use the full shared name of the environment in the form of /username/environment_name.

Using the cloud Python module:

>>> def f():
...     import pandas
...     return pandas.version.version
...
>>> jid = cloud.call(f, _env='/picloud/pandas')
>>> cloud.result(jid)
'0.10.1'

Using the picloud command:

$ picloud exec -e /picloud/imagemagick convert
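
If you build shared environment names in scripts, a small helper can keep the /username/environment_name form consistent. The sketch below is illustrative only. shared_env_name and split_shared_env_name are hypothetical helpers, and the rule that neither part may contain a slash is an assumption made for this example, not a documented restriction.

```python
def shared_env_name(username, environment_name):
    """Compose the full shared name: /username/environment_name."""
    for part in (username, environment_name):
        # Assumption for this sketch: parts are non-empty and contain no '/'.
        if not part or '/' in part:
            raise ValueError('name parts must be non-empty and contain no "/"')
    return '/{0}/{1}'.format(username, environment_name)


def split_shared_env_name(full_name):
    """Split '/username/environment_name' into (username, environment_name)."""
    parts = full_name.split('/')
    if len(parts) != 3 or parts[0] != '' or not parts[1] or not parts[2]:
        raise ValueError(
            'expected /username/environment_name, got %r' % (full_name,))
    return parts[1], parts[2]


print(shared_env_name('picloud', 'pandas'))
print(split_shared_env_name('/picloud/imagemagick'))
```

The composed string is what you would pass as _env in Python or after -e in the shell.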

How Environments Work

If you’re curious about how the Environment system is implemented, check out our blog post.