News Brief

24/10/2012 - The Wolfpack Cluster has launched! Check out the User Guide now!

Development @ PWBC, a crash course

Setting up the local nameserver on Linux

Using the local nameserver makes ssh logins, scp transfers, etc. much faster.

On your local machine, edit /etc/resolv.conf:

nameserver xx.xx.xx.189
search garvan.unsw.edu.au
domain garvan.unsw.edu.au

Note: On Ubuntu, where resolv.conf is rewritten at boot, see: http://www.liberiangeek.net/2012/05/setup-static-dns-servers-in-ubuntu-12-04-precise-pangolin/

Other solutions: http://askubuntu.com/questions/130452/how-do-i-add-a-dns-server-via-resolv-conf

If you've hand-edited /etc/resolv.conf, you can check it with nslookup:

$ nslookup nerv-geofront
Server:		xx.xx.xx.189
Address:	xx.xx.xx.189#53

Name:	nerv-geofront.garvan.unsw.edu.au
Address: xx.xx.xx.11

Note about "auto" mount points

“auto” CIFS mounts are mounted the first time (in a session) you access them. Until then, they will not be visible with “ls” or <tab> autocomplete.

Therefore you will need to know the location and name of the mount.

Eg.

user@enzo$ cd /misc/FacilityBioinformatics
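A quick way to see this behaviour (output illustrative; the share only shows up under /misc once it has been accessed):

user@enzo$ ls /misc                           # an un-accessed share may not be listed yet
user@enzo$ cd /misc/FacilityBioinformatics    # accessing the full path triggers the mount
user@enzo$ ls /misc                           # the share should now appear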

BACKUP your data

Especially on the Wolfpack cluster, where your home directory is not backed up, it is important to know your backup options.

There are a few options for code and data:

  • Backup your code to a version control system hosted on a public service (e.g. GitHub, Bitbucket) or a private repository (there are some around Garvan, ask your friends)
  • Else push code to a private “local” repository (e.g. git init --bare) that resides in a backed-up location (see the sketch after this list), e.g.
    • a group mount like /misc/FacilityBioinformatics
    • a /share/…/contrib/ folder
  • You can also move data/code via scp (or similar)
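A minimal sketch of the bare-repository approach, assuming a hypothetical project called myproject and a per-user folder under /misc/FacilityBioinformatics (names and paths are illustrative):

# on the backed-up mount: create a bare repository to push to
mkdir -p /misc/FacilityBioinformatics/$USER/repos
git init --bare /misc/FacilityBioinformatics/$USER/repos/myproject.git

# in your working copy: add it as a remote and push
cd ~/myproject
git remote add backup /misc/FacilityBioinformatics/$USER/repos/myproject.git
git push backup master

# data can also be copied off the cluster with scp, run from your local machine
scp -r user@gamma00:~/myproject/results /path/to/local/backup/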

Hosts

The following are the main hosts (some of which are virtual machines) at PWBC.

General development machines

Host    | Linux distro | Description             | Mount points
enzo    | Ubuntu 10.10 | General development box | ClusterHome, /misc/FacilityBioinformatics/
fiorano | Ubuntu 10.10 | GenePattern development | /misc/{pandora username}, /mnt/FacilityBioinformatics/, /misc/FacilityBioinformatics

Wolfpack Cluster machines

Host    | Linux distro | Description                    | Mount points
gamma00 | CentOS 6.2   | Login node to Wolfpack cluster | ClusterHome, /share/ClusterShare
gamma01 | CentOS 6.2   | Login node to Wolfpack cluster | ClusterHome, /share/ClusterShare
gamma02 | CentOS 6.2   | Login node to Wolfpack cluster | ClusterHome, /share/ClusterShare

Cluster basics

Nodes

A node of a cluster corresponds to a physical machine. Several of these make up the cluster.

Each node has assigned role(s): wolfpack_cluster_hardware_architecture.

Each node will have different software installed based on its role. For example, the Galaxy nodes have Galaxy-specific files and mounts, and the GPU nodes have the GPU drivers installed.

Login nodes are access points to the whole cluster; they contain most of the development libraries required for a user to develop, test and run programs.

Compute nodes are primarily for execution of jobs and any development on them is discouraged.

:!: For GPU development, it is best to compile your binary on one of the login nodes (keeping it in your ClusterHome) and then submit the job to an epsilon node via the SGE.
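A sketch of that workflow, assuming a hypothetical CUDA source file my_kernel.cu; the module version and queue instance are taken from the listings further down but are illustrative:

gamma00$ module load cudatoolkit/4.2.9                 # see the Modules section below
gamma00$ nvcc my_kernel.cu -o ~/my_gpu_prog            # the binary lands in your ClusterHome
gamma00$ qsub -q all.q@epsilon-0-24.local -cwd -b y ~/my_gpu_prog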

From a node, other nodes can be viewed and accessed.

$ ssh gamma00
$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@delta-5-1.local          BIP   0/0/0          0.07     lx26-amd64    
---------------------------------------------------------------------------------
all.q@epsilon-0-24.local       BIP   0/0/64         0.00     lx26-amd64    
---------------------------------------------------------------------------------
all.q@epsilon-0-25.local       BIP   0/0/64         0.00     lx26-amd64    
---------------------------------------------------------------------------------
all.q@gamma00.local            BIP   0/0/0          0.00     lx26-amd64    
---------------------------------------------------------------------------------
all.q@gamma01.local            BIP   0/0/0          0.00     lx26-amd64    
---------------------------------------------------------------------------------
all.q@gamma02.local            BIP   0/0/0          0.00     lx26-amd64    
---------------------------------------------------------------------------------
all.q@omega-0-1.local          BIP   0/0/64         0.00     lx26-amd64    
---------------------------------------------------------------------------------
all.q@omega-0-10.local         BIP   0/0/64         0.00     lx26-amd64    
---------------------------------------------------------------------------------
all.q@omega-0-11.local         BIP   0/0/64         0.00     lx26-amd64    

$ ssh omega-0-1
etc.

Queues

The <queue>.q prefix in the output above refers to an SGE queue; queue names are used when scheduling jobs to the cluster via the SGE.
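For example, to list the configured queues and get a per-queue summary (standard SGE commands):

$ qconf -sql        # list all configured queue names
$ qstat -g c        # summary of available/used slots per cluster queue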

ClusterShare and ClusterHome

ClusterHome is not a specific folder; it refers to your home directory (cd ~/) when you are logged in to any of the cluster nodes.

Essentially, all files in your home folder are “synced” across all the nodes, so it is a convenient place to access files from any node. Note that ClusterHome IS NOT BACKED UP.

For more permanent sharing between nodes, use ClusterShare.

ClusterShare is for sharing of biodata and software between cluster nodes as well as users.

User data and software can be stored in the contrib folders:

/share/ClusterShare/biodata/contrib
/share/ClusterShare/Modules/modulefiles/contrib
/share/ClusterShare/software/contrib
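For example, staging your own data or software into a contrib area might look like this (a sketch only; the per-user subfolder and the hg19.fa reference file are illustrative):

# create your own area under contrib (a per-user subfolder keeps things tidy)
mkdir -p /share/ClusterShare/biodata/contrib/$USER
mkdir -p /share/ClusterShare/software/contrib/$USER

# copy a reference file from your ClusterHome
cp ~/refs/hg19.fa /share/ClusterShare/biodata/contrib/$USER/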

:!: Files (software, libraries etc.) outside of the contrib folders are managed by the PWBC Admin. These are generally used by Galaxy, but users are free to use them.

Software and Modules

Additional software and libraries can be added by the user (eg in their home directory or ClusterShare/software/).

However, to encourage sharing and to reduce redundancy, the use of Modules is HIGHLY recommended.

Modules work like packages of libraries or binaries. For example, when working with openmpi, you can load the openmpi libraries via:

module load openmpi-x86_64

When you load a module, your environment variables are changed to include the libraries or binaries specified by the modulefile. To unload it later:

module unload openmpi-x86_64

This modularity helps keep the cluster organised and manageable!

:!: PWBC will provide more common modules such as R, perl, cudatoolkit etc. over time.

Users are encouraged to make their own modules to keep their work clean and tidy. Sharing is also very easy via the contrib folder (see below); however, be conscious of the risks to others (when people use your modules) as well as to yourself (when using other people's modules).

For examples, look through the existing modulefiles under /share/ClusterShare/Modules/modulefiles/ (listed below).
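A minimal sketch of creating your own contrib module; the tool name, version and install prefix are hypothetical, and the per-user folder mirrors the kevyin/test example shown below:

# install prefix for the (hypothetical) tool
tool_prefix=/share/ClusterShare/software/contrib/$USER/mytool/1.0

# write a minimal modulefile into the contrib modulefiles area
mkdir -p /share/ClusterShare/Modules/modulefiles/contrib/$USER/mytool
cat > /share/ClusterShare/Modules/modulefiles/contrib/$USER/mytool/1.0 <<EOF
#%Module1.0
## mytool 1.0 (hypothetical example module)
prepend-path PATH            $tool_prefix/bin
prepend-path LD_LIBRARY_PATH $tool_prefix/lib
EOF

# once $MODULEPATH includes the contrib folder (see below), load it with:
#   module load <your username>/mytool/1.0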

Getting started with Modules

First, add the module locations to your $MODULEPATH (e.g. in your .bashrc):

export MODULEPATH=/share/ClusterShare/Modules/modulefiles/noarch:/share/ClusterShare/Modules/modulefiles/centos6.2_x86_64:/share/ClusterShare/Modules/modulefiles/contrib:$MODULEPATH

To see the available modules:

$ module avail

------------------------------ /share/ClusterShare/Modules/modulefiles/noarch ------------------------------
acml/gfortran64/4.4.0        acml/gfortran64_mp/4.4.0     acml/open64_64_fma4_mp/5.1.0

------------------------- /share/ClusterShare/Modules/modulefiles/centos6.2_x86_64 -------------------------
R/gcc-4.4.6/2.15.0     cloudbiolinux/default  open64/4.2.5.2         perl/gcc-4.4.6/5.14.2
boost/gcc-4.4.6/1.49.0 cudatoolkit/4.2.9      open64/4.5.1           python/gcc-4.4.6/2.6.6

----------------------------- /share/ClusterShare/Modules/modulefiles/contrib ------------------------------
kevyin/test

-------------------------------------- /usr/share/Modules/modulefiles --------------------------------------
dot           module-cvs    module-info   modules       null          rocks-openmpi use.own

--------------------------------------------- /etc/modulefiles ---------------------------------------------
openmpi-x86_64

To load a module (module names support <tab> autocomplete):

$ module load cudatoolkit/4.2.9

Issues with compilation

To minimise cruft and to keep the cluster tidy for everyone, it is necessary that users build and manage the software they use themselves. However, in the case where root privilege is required, contact the PWBC Admin.
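A sketch of a typical user build without root privileges, assuming a generic autotools package (the package name, version and prefix are illustrative):

# build into your own contrib (or home) prefix instead of system locations
prefix=/share/ClusterShare/software/contrib/$USER/mytool/1.0

tar xzf mytool-1.0.tar.gz
cd mytool-1.0
./configure --prefix=$prefix
make
make install

# then expose it via a modulefile (see Modules above) or by adding $prefix/bin to your PATH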

Running jobs with the Grid Engine

http://wiki.gridengine.info/wiki/index.php/Main_Page

Jobs can be submitted to the cluster via submission hosts, for example the gammaXX nodes (which are also login hosts).

Submission can be done through:

  • the command line: qsub (see man qsub)
  • the DRMAA library, for programmatic submission; the library is here:
    • export DRMAA_LIBRARY_PATH=/opt/gridengine/lib/lx26-amd64/libdrmaa.so

Useful qsub commands

# -V           Export environment variables to the context of the job
# -j n         Do not merge stdout and stderr
# -R y         Request a reservation for this job
# -pe smp <N>  Request N cores (smp parallel environment)
# -wd          Set the working directory
# -o           stdout file
# -e           stderr file
# -b y         Treat the command as a binary/executable rather than a script to be parsed
qsub -q $queue -V -j n -R y -pe smp $cores -wd $working_dir -o $std_out -e $error_out -b y $script_or_command

qdel <job_id>      # delete a job
qstat              # show your current jobs
qstat -s z         # show past (finished) jobs
qstat -f           # show your jobs by node
qstat -j <job_id>  # show detailed information about a job
qstat -f -u "*"    # show every job

# Simple one liner
qsub -cwd -b y echo hello

:!: There is a quirk between Modules and the SGE where you may get the following in your error output; so far we have not seen it interrupt the actual job.

/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'

Sample qsub execution script

Replace values in <..>

#!/bin/bash -l

module load <MODULES_TO_LOAD>

# SETUP
base_dir=<PROJECT_DIR>

date_dir=`date +%Y_%m%d_%H%M%S`   # timestamped output folder (%Y: calendar year)

output_dir="${base_dir}/output/${date_dir}/"
mkdir -p $output_dir

echo "Output dir: $output_dir"
cores=1

# stdout and stderr locations
# make use of qsub's pseudo environment variables
error_out=${output_dir}/e\$JOB_ID
std_out=${output_dir}/o\$JOB_ID
output_file="${output_dir}/_result"
working_dir=${output_dir}/
mkdir -p $working_dir   # same as $output_dir, already created above; -p avoids an error

script_or_command="echo greetings"

#cmd="qsub <QSUB_OPTIONS> <PROGRAM_SCRIPT_OR_SHELL_COMMAND>"
cmd="qsub -q all.q -V -j n -R y -pe smp $cores -wd $working_dir -o $std_out -e $error_out -b y $script_or_command"
echo $cmd
$cmd


Sample qsub script
#!/bin/sh
#
# sge_launcher.sh
# NOTE: lines beginning with #$ are actual qsub options
#
# Specify the queue to run on
#$ -q all.q
# Use current working directory
#$ -cwd
# Indicates a binary will be executed 
#$ -b y
# This is the actual command for the job
echo "This is a job runs on a SGE cluster."
chmod u+x ./sge_launcher.sh
qsub ./sge_launcher.sh
