News Brief

24/10/2012 - Wolfpack Cluster's launched! Check out the User Guide now!

Wolfpack Cluster User Guide

Garvan HPC

General Info

Wolfpack Cluster is based on CentOS 6.2 Linux. It consists of 20 Dell PowerEdge C6145 servers. A hardware overview is summarized below:

Component | Features | Hardware Details (Per Compute Node, Per GPGPU)
Wolfpack Cluster | 20 x Dell PowerEdge C6145 | 4 x 16-core AMD Opteron 6262SE, 512GB Memory, 6 x 900GB SAS
GPGPU | 4 x NV Tesla M2075 Fermi | 6GB, Compute Capability 2.0, 448 cores
Gagri NAS | Network-Attached Storage | BlueArc Mercury 100, 100TB SATA, 2.5TB SSD

Wolfpack's resource utilization can be monitored via its Ganglia interface.

Access to Wolfpack

Wolfpack can be accessed via its dedicated login nodes. A login node is the designated host where a user can:

  • manipulate input/output files for the jobs
  • develop, compile and validate code for jobs
  • submit jobs to the SGE

Any authorized user can ssh to one of the following login nodes:

Login Host | IP
gamma00.garvan.unsw.edu.au | 129.94.136.3
gamma01.garvan.unsw.edu.au | 129.94.136.12
gamma02.garvan.unsw.edu.au | 129.94.136.4
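
For example, to connect from a terminal (USER_ID is a placeholder for your own account; any of the hosts above works):

#
# SSH to a login node:
#
ssh USER_ID@gamma01.garvan.unsw.edu.au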

:!: Running jobs on a login node is meant for testing purposes ONLY. Do NOT launch any intensive jobs (including MPI/Hadoop jobs) directly on a login node.
:!: Use the cluster's resources properly by submitting intensive jobs via the SGE.
:?: Don't have access? Please send a query to PWBC Admin and we will set up your access ASAP.

Cluster Storage

Wolfpack Cluster accesses the Gagri File Server via the NFS protocol. Two volumes are mounted: one for personal space and one for cluster sharing. Both volumes are available across the entire cluster at exactly the same location.

Volume | Mount Point | Backup | Quota | Hardware | Purposes
ClusterHome | /home/USER_ID | NO | 600GB/User | HITACHI NAS | User home directory, stores a user's input/output files temporarily
ClusterShare | /share/ClusterShare | Yes | 2TB/Group | HITACHI NAS | Cluster sharing between all users: software, biodata, modulefiles and common group data

:!: A volume is mounted on demand and is automatically unmounted after a certain period of idle time. As a result, the mount point may not always be visible, so tab completion may not work and the full path must be entered.
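
For example, if ClusterShare does not appear under /share, referencing the full mount point directly should trigger the automatic mount:

#
# Accessing the full path mounts the volume on demand:
#
ls /share/ClusterShare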

Temp/Scratch Spaces

Volume | Mount Point | Backup | Quota | Hardware | Note
ClusterScratch | /share/ClusterScratch | NO | 1TB/User | PanFS | General-purpose scratch space; data older than 30 days is purged regularly
Gagri Temp | /share/Temp | NO | unlimited | HITACHI NAS | 4.2TB for the entire cluster; purged fortnightly
Local Temp | /tmp | NO | unlimited | SATA RAID0 | 3.3TB per node

:!: On ClusterScratch, please use either your LDAP user (e.g. derlin) or group (e.g. FacilityBioinformatics) as the directory name.
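
For example, a personal scratch directory could be created as follows ($USER is assumed to match your LDAP user name):

#
# Create a scratch directory named after your LDAP user (or use your group name):
#
mkdir -p /share/ClusterScratch/$USER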

PanFS Utilities

A set of PanFS utilities is available on the system for querying the file system. Most of these commands can be run by normal users.

#
# Check your current quota on a PanFS volume:
#
USER@omega-0-0:home$ panfs_quota /share/ClusterScratch

  <bytes>       <soft>        <hard> : <files>    <soft>    <hard> :      <path to volume> <pan_identity(name)>
    32768 600000000000 1000000000000 :       1 unlimited unlimited : /share/ClusterScratch uid:6xxx(USER)
#
# Check disk usage on given directory:
#
USER@omega-0-0:home$ pan_du /share/ClusterScratch/test
dir /share/ClusterScratch/test: 1 files, 32 KB

#
# pan_pcopy seems to be a powerful parallel cp... feel free to try it:
#
USER@omega-0-0:home$ pan_pcopy
USAGE: pan_pcopy [options] source [source ...] dest

 Required:
  -w, --worker-hosts=val  Space separated list of host names or IP Address ranges of hosts that can read from the source dirs and write to the destination dir

 Options:
  -b, --buffer-size=val   Buffer sizes to use in file copies
  -c, --chunk=val         Chunk size determines the unit of parallel tranfers
      --defaults          Show program defaults
  -h, --help              Display usage message
  -n, --num-threads=val   Num of worker threads per host
  -p, --preserve-all      Preserve mode, ownership, and timestamp attributes
      --preserve-mode     Preserve mode attributes
      --preserve-own      Preserve ownership attributes
      --preserve-time     Preserve timestamp attributes
  -s, --sparse-files      copy the sparse files
  -t, --timeout=val       Specify the timeout value in seconds
  -u, --user=val          User-id to use on worker hosts
  -v, --verbose           Report verbosely on every operation
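
Based on the usage message above, an invocation might look like the following. This is only a sketch: the worker hosts, thread count and paths are placeholders and have not been validated on Wolfpack.

#
# Hypothetical parallel copy using two worker hosts and 4 threads per host:
#
pan_pcopy -w "omega-0-0 omega-0-1" -n 4 -p /share/ClusterScratch/test /share/ClusterScratch/test_copy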

Cluster Sharing Space

ClusterShare is the designated volume for sharing all kinds of data, such as software, biodata, modulefiles and much more. PWBC has created the following top level directories:

Directory | Purpose | Non-root User Writable
biodata | Bio reference data | Yes: biodata/contrib
Modules | Definition files used by Environment Modules | Yes: Modules/modulefiles/contrib
software | Software compiled by hand | Yes: software/contrib
thingamajigs | anything 8-) | Yes: the rule is no rule in here; common group data etc.

:!: Users are solely responsible for what they put in. They are expected to put the data under the right category (e.g. put tool binaries under software/contrib).
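
For example, a hand-built tool would typically be staged under your own directory in software/contrib (the per-user layout matches the one used by mkmodule, described later in this guide):

#
# Stage a contributed tool under your own directory:
#
mkdir -p /share/ClusterShare/software/contrib/$USER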

How to Transfer Files on Wolfpack

Users can transfer files to their cluster home directories via scp or sftp (or a program such as WinSCP). Transfers can be done by connecting to one of the login hosts from any computer, anywhere. This is the most accessible and stable method, and the one PWBC officially supports.
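
For example, from your own computer (USER_ID and the file name are placeholders):

#
# Copy a local file to your cluster home directory via a login node:
#
scp my_data.fastq USER_ID@gamma01.garvan.unsw.edu.au:~/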

Access CIFS Volumes on Wolfpack

:!: This is *NOT* an official method for data transfer on Wolfpack; it is provided “as is”. Use at your own risk.
:?: If you cannot locate your resources from the network paths below, please check with IT.

Network Path | Contents
//Pandora/Personal | home directories on Pandora
//Pandora/Volumes | group volumes on Pandora
//Gagri/GRIW | group volumes on Gagri
#
# On a Login host only (e.g. gamma00).
#
user@gamma00:~$ smbclient //Pandora/Personal -U GARVAN_ID
Enter GARVAN_ID's password:
Domain=[Garvan] OS=[BlueArc Mercury 8.1.2353.06] Server=[BlueArc Mercury 8.1.2353.06]
smb: \> help
#
# Similarly, browse a group volume:
#
user@gamma00:~$ smbclient //Pandora/Volumes -U GARVAN_ID
Enter GARVAN_ID's password:
Domain=[Garvan] OS=[BlueArc Mercury 8.1.2353.06] Server=[BlueArc Mercury 8.1.2353.06]
smb: \> ls

To avoid entering user ID and password every time, a credential file can be used.

#
# Format of the credential file. Make certain it's restricted!!!
#
username = GARVAN_ID
password = xxxxxxx
domain = Garvan
#
# Run smbclient with credential file provided:
#
smbclient //Pandora/Personal -A PATH_TO_FILE
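
Since the file contains your password, it should be readable by you only (a suggested precaution, not a site requirement):

#
# Restrict the credential file to your own user:
#
chmod 600 PATH_TO_FILE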

smbclient is ftp-like; many commands with the same names as their ftp counterparts can be used:

#
# Some useful commands:
#
# List directory of the current remote location
smb: \> ls
# List directory of the current local location
smb: \> !ls
# Disable annoying prompt
smb: \> prompt
# Download multiple files FROM the remote location
smb: \> mget *.bam
# Upload multiple files TO the remote location
smb: \> mput *.bam
# Remove file(s) in the current remote location
smb: \> rm LID47251_BC0D61ACXX_7.bam
# Show current remote location
smb: \> pwd
# Show current local location
smb: \> !pwd
#
# Download an entire directory:
# NB: prompt turns off the download prompt for each file.
#
smb: \> prompt
smb: \> cd PARENT_DIR
smb: \> recurse
smb: \> mget DIR_TO_COPY

:!: If you get an NT_STATUS_ACCESS_DENIED error, please contact IT. PWBC does not have the privileges to troubleshoot permission-related issues.
:!: smbclient cannot handle directory or file names that contain spaces.

Cluster Software

Supported software on Wolfpack is conventionally installed in the following places. This software (except the contrib directory) is supported by PWBC staff and will be maintained with reasonable diligence. If you require upgrades that are available but not yet supported by PWBC, please make a request to PWBC or consider a private installation in your home or contrib area.

If you need software that is not found or is not installed on Wolfpack, please send a request to PWBC, specifying the package, version, and relevant licensing info, if any. Users may install proprietary, privately licensed, and open source software not currently available on Wolfpack in their home directories or in /share/ClusterShare/software/contrib.

Location | Features
/opt/bio | Bio tools installed locally on each node by the Rocks Bio Roll
/share/ClusterShare/software/centos6 | Software installed/compiled by PWBC on CentOS 6
/share/ClusterShare/software/cloudbiolinux-centos6 | Software stack from CloudBioLinux compiled by PWBC on CentOS 6
/share/ClusterShare/software/noarch | Software that needs no compilation, just unpacking
/share/ClusterShare/software/contrib | Software installed and maintained by the user community

:!: Many supported tools are used by the production Galaxy and GenePattern, so modifying or upgrading such tools is time consuming due to the strict installation and validation process.
:!: Managing binaries and environment variables can get very messy, so we HIGHLY recommend using the Modules package (described below) to manage/share your software.

Software Environment Management

Wolfpack Cluster uses the Environment Modules package to control users' environment settings. Below is a brief discussion of its common usage on Wolfpack. You can learn more at the Modules home page.

Overview

The Environment Modules package provides for dynamic modification of a user's shell environment. Module commands set, change, or delete environment variables, typically in support of a particular application. They also let the user choose between different versions of the same software or different combinations of related codes.

For example, if the open64 and acml modules are loaded and the user compiles with openf90, the code is compiled with the Open64 Fortran 90 compiler and linked against the BLAS provided by ACML. By unloading the acml module, loading the atlas module, and compiling with openf90, the Open64 compiler is still used but the code is linked against ATLAS instead.

Modules on Wolfpack

By default, only the rocks-openmpi module is loaded at login time.

In order to use modules from PWBC, MODULEPATH must be configured (e.g. in .bashrc):

export MODULEPATH=PATH_TO_MODULEFILE:$MODULEPATH
Modulefile Path | Managed By
/share/ClusterShare/Modules/modulefiles/noarch | PWBC
/share/ClusterShare/Modules/modulefiles/centos6.2_x86_64 | PWBC
/share/ClusterShare/Modules/modulefiles/contrib | Community
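
For example, to pick up both the PWBC and community modulefiles listed above, something like the following could be added to .bashrc (the order of the paths is a matter of preference):

#
# Add PWBC and community modulefile paths (e.g. in ~/.bashrc):
#
export MODULEPATH=/share/ClusterShare/Modules/modulefiles/noarch:/share/ClusterShare/Modules/modulefiles/centos6.2_x86_64:/share/ClusterShare/Modules/modulefiles/contrib:$MODULEPATH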

Useful Modules Commands

Here are some common module commands and their descriptions:

  • module list - List the modules that are currently loaded
  • module avail - List the modules that are available
  • module display <module_name> - Show the environment variables used by <module_name> and how they are affected
  • module unload <module_name> - Remove <module_name> from the environment
  • module load <module_name> - Load <module_name> into the environment
  • module switch <module_1_name> <module_2_name> - Replace <module_1_name> with <module_2_name> in the environment

Note that you must remove some modules before loading others due to conflicts. For example, if acml 4.4.0 is loaded, you must unload it before you can load acml 5.1.0. Also, some modules depend on others, so they may be loaded or unloaded as a consequence of another module command.
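
As a sketch of the acml example (the exact modulefile names follow the usual name/version convention and may differ on Wolfpack; check module avail for the real names):

#
# Swap one acml version for another:
#
module unload acml/4.4.0
module load acml/5.1.0
# Or in a single step:
module switch acml/4.4.0 acml/5.1.0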

:?: Currently, automatic loading/unloading of a module's dependencies does not seem to work (even with --force). One solution is to include module loads in your .bashrc; alternatively, create a modulefile that loads other modules.
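
For example, the .bashrc approach might look like this (the module name is borrowed from the SGE example later in this guide):

#
# In ~/.bashrc, after MODULEPATH has been set:
#
module load kevyin/python/2.7.2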

mkmodule

marcow wrote a module called mkmodule. After you've compiled some code that you want to convert into a module, ensure that MODULEPATH contains /share/ClusterShare/Modules/modulefiles/contrib/marcow, then try:

module load mkmodule
mkmodule <software_name> <software_version> [<compiler>]

It then copies everything in the current working directory to /share/ClusterShare/software/contrib/$USER/<software_name>/<compiler>/<software_version>/, then creates a modulefile and prompts you to edit it.

If your current working directory is messy, make a clean subdirectory and deploy the necessary bits from there:

mkdir -p ./deploy/{bin,share/man/man1}

Copy your binaries to ./deploy/bin and the man pages to ./deploy/share/man/man1:

find . -maxdepth 1 -type f -executable -exec cp {} ./deploy/bin \;
find . -maxdepth 1 -type f -name '*.1' -exec cp {} ./deploy/share/man/man1 \;
cd ./deploy
mkmodule <software_name> <software_version> [<compiler>]

Be a good module developer & include the man pages if they're available!

OS Software

OS software refers to packages available in the CentOS repositories and installed via the yum command. Generally, login and execution nodes contain the same set of packages. On top of that, a login node also contains the development packages needed for software development. For example, the libxml2 package is installed on both login and compute nodes, but libxml2-devel exists only on login nodes.

Users can check the installation status of packages with the following (no root required; wildcards are supported):

yum info libxml2*

:!: Installing OS software requires running yum as root. Please contact PWBC if you need new OS software.

Setting Up Your Own Environment for Some Popular Software

You will need to do this to install your own modules/packages.

Other Software

Biodata, Indices, Annotations

Below are the genesets/indices/annotations that are publicly available to everyone. Please check here thoroughly before downloading your own copy; it saves you time and disk space.

Source | Version | Location | Maintainer | Comment
GENCODE | v1 - v16 | /share/ClusterShare/biodata/contrib/GENCODE | Xiu Cheng Quek | More Info

:!: Contact the maintainers with any concerns or recommendations; they are committed to this!

Dedicated Data Maintainers

We have a group of volunteers who maintain high-quality public data on the cluster. Please contact them with any relevant questions.

Name | Group
F. Buske | EpigeneticsResearch
M. Pinese | PancreaticCancerResearch
X.C. Quek | GenomeInformatics
J. Ho | VCCRI

Sun Grid Engine

Wolfpack uses Sun Grid Engine (SGE) as its Distributed Resource Management System. The current version is SGE 6.2u5p2.

The Queues

Queue | Purposes | ACL | Quota | Limit
all.q | default queue, AMD processors, ideal for multithreaded and parallel jobs | all | 192 slots/user | n/a
bob.q | subordinate queue of all.q; jobs are suspended if all.q gets too busy; ideal for interactive jobs | all | consumes all.q quota | n/a
intel.q | Intel processors, good per-core performance, used by Galaxy's single-threaded tools | admin | n/a | n/a
net.q | composed of login hosts with public network interfaces, created specifically for Galaxy & GenePattern tools | admin | n/a | n/a

Job Suspension on bob.q

Jobs in bob.q have a lower priority than those in its parent queue, so the system suspends them whenever all.q gets too busy.

Job suspension is triggered by a slot-utilization threshold and happens on a per-host basis. Whenever jobs from all.q are using 75% or more of a host's slots, bob.q jobs on that host start getting suspended.
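
The threshold is part of the queue configuration, so something along these lines should display it (a sketch, using the same qconf -sq command as in the PE section below):

#
# Inspect the suspend threshold configured on bob.q:
#
qconf -sq bob.q | grep -i suspend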

Parallel Environment (PE)

For use with job submission, e.g. qsub -pe. PEs are attached to all.q unless specified otherwise (see the example after the notes below).

PE | Purposes | Note
mpi | do *NOT* use | not recommended
mpich | for MPICH2 jobs | tight integration with SGE
orte | for Open MPI jobs | tight integration with SGE
orte_rr | same as orte, except it uses a round-robin rule |
smp | for any classic multithreading job that implements OpenMP, Python multiprocessing, etc. |
bob_smp | same as smp | bob.q ONLY

:?: What does tight integration mean? Read here, here and here.
;-) To check what PEs are attached to a queue, run qconf -sq all.q | grep -i pe_list.
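
For example, a multithreaded job requesting 4 slots in the smp PE could be submitted like this (the slot count, [OTHER_OPT] and CMD are placeholders, following the style of the GPGPU examples below):

#
# Request 4 slots in the smp PE on all.q:
#
qsub -q all.q -pe smp 4 [OTHER_OPT] CMD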

GPGPU Compute via SGE

Wolfpack has several GPGPU devices on the following hosts:

Host | GPGPU Slots | Vendor | Note
epsilon-0-24 | 2 | CUDA |
epsilon-0-25 | 2 | CUDA |

To submit a CUDA job that requires one CPU core and one GPGPU:

qsub -q all.q -l cuda=1 [OTHER_OPT] CMD

To submit a CUDA job that requires one CPU core and two GPGPUs:

qsub -q all.q -l cuda=2 [OTHER_OPT] CMD

To submit a CUDA job that uses multiple cores and GPGPUs:

qsub -pe smp 8 -l cuda=2 [OTHER_OPT] CMD

Module Issue with SGE

Whenever the -V option is supplied to the qsub command, the following message appears:

bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `module'

It is believed that SGE 6.2u5 cannot handle newlines in the environment; it's a known bug:

$ env
...
module=() {  eval `/usr/bin/modulecmd bash $*` <-- the bad guy
} <-- the bad guy 
...

You should see that the module variable at the bottom is defined across multiple lines. In general, this error can be ignored and causes no trouble for jobs. The issue is discussed here. If it really must be eliminated, there are two ways to do so.

Method 1: Unset Module Variable

Basically, before submitting jobs (i.e. before running qsub), run:

$ unset module

Example:

#!/bin/bash
#
# Load the modules first, then remove the module function from the environment.
#
module load kevyin/python/2.7.2
unset module
 
qsub -cwd -V -b y testscript.sh

:!: This method can be used in a script that calls qsub.

Method 2: Source Module Env

Example:

#!/bin/sh
#
# Fix the module env, then load the modules.
#
# sge_launcher.sh
#
# Specify the queue to run on
#$ -q qa.q
# Use current working directory
#$ -cwd
# Indicates a binary will be executed 
#$ -b y
# The actual commands for the job follow below
 
. /etc/profile.d/modules.sh
module load python
module load R
 
R ....

:!: This method should be applied in an SGE submission script.

Other Refs:

SGE Advance

SGE Cheat Sheets

Cluster Web Server

Content | Path | URL | Example | Enabled
HTML Page | /home/$USER/public_html | pwbc.garvan.org.au/~$USER | pwbc.garvan.org.au/~$USER/my_page.html | Global Default
CGI Code | /home/$USER/public_html/cgi-bin | pwbc.garvan.org.au/~$USER/cgi-bin | pwbc.garvan.org.au/~$USER/cgi-bin/example.pl | Global Default

Please follow the steps below for setting up public_html:

#
# Permissions have to be EXACTLY the same as below.
# Do NOT give 777 as shortcut!! It won't work!
#
chmod 711 ~

mkdir ~/public_html
chmod 755 ~/public_html
nano ~/public_html/my_page.html

mkdir ~/public_html/cgi-bin
chmod 711 ~/public_html/cgi-bin
nano ~/public_html/cgi-bin/example.pl
chmod 755 ~/public_html/cgi-bin/example.pl
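
To check that the page is being served, a quick test from a login node might look like this (assuming the example URL from the table above):

#
# Fetch the test page:
#
curl http://pwbc.garvan.org.au/~$USER/my_page.html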

To disable public_html completely, the best way is to set the home directory permission to 700:

chmod 700 ~

:!::!: This feature is provided only for occasional data sharing with external parties. The web content is limited by the home directory's 600GB quota. It is not a solution for a dedicated, large-scale online data repository or anything similar.