Platform:    hal.cims.nyu.edu
(Numerical Laboratory - EFDL)




Specifications



User's Nodes, Queues, and Disks

In an effort to balance the use of the hal cluster, each user is assigned a login node, disk space, and queue pools as follows:

user                           login node  disk space            primary pool  secondary pool(s)
jonjon (Andy Majda)            hal01       /hal01.tmp/majda      A             all nodes but hal02, hal03
crommelin (Daan Crommelin)     hal01       /hal01.tmp/crommelin  A             "
holland (David Holland)        hal02       /hal02.tmp/holland    B             "
cshaji (C. Shaji)              hal03       /hal03.tmp/cshaji     B             "
epab (Povl Abrahamsen)         hal03       /hal03.tmp/epab       B             "
buhler (Oliver Buhler)         hal04       /hal04.tmp/buhler     C             "
reddy (Tasha Reddy)            hal04       /hal04.tmp/reddy      C             "
tabak (Esteban Tabak)          hal05       /hal05.tmp/tabak      D             "
tea (Todd Arbetter)            hal05       /hal05.tmp/tea        D             "
grote (Marcus Grote)           hal05       /hal05.tmp/grote      D             "
kleeman (Richard Kleeman)      hal06       /hal06.tmp/kleeman    E             "
ytang (Youmin Tang)            hal07       /hal07.tmp/ytang      E             "
pauluis (Olivier Pauluis)      hal07       /hal07.tmp/pauluis    E             "
shafer (Shafer Smith)          hal08       /hal08.tmp/shafer     F             "
xuemin (Xuemin Tu)             hal08       /hal08.tmp/xuemin     F             "
givelbrg (Ed Givelberg Smith)  hal08       /hal08.tmp/givelbrg   F             "
desteur (Laura de Steur)       hal09       /hal09.tmp/desteur    B             "
abramov (Rafail Abramov)       hal10       /hal10.tmp/abramov    A             "
franzke (Christian Franzke)    hal10       /hal10.tmp/franzke    A             "
kleeman (Richard Kleeman)      hal11       /hal11.tmp/kleeman    E             "
jenkins (Adrian Jenkins)       hal12       /hal12.tmp/jenkins    B             "
thoma (Malte Thoma)            hal13       /hal13.tmp/thoma      B             "
smedsrud (Lars Smedsrud)       hal13       /hal13.tmp/smedsrud   B             "
kleeman (Richard Kleeman)      hal14       /hal14.tmp/kleeman    E             "
barreiro (Andrea Barreiro)     hal15       /hal15.tmp/barreiro   G             "
dgoldberg (Dan Goldberg)       hal15       /hal15.tmp/dgoldberg  G             "
konigc (Chris Konig)           hal15       /hal15.tmp/konigc     G             "
tulloch (Ross Tulloch)         hal15       /hal15.tmp/tulloch    G             "
walkerr (Ryan Walker)          hal16       /hal16.tmp/walkerr    G             "
schaffri (Helga Schaffrin)     hal16       /hal16.tmp/schaffri   G             "
saverio (Saverio Spagnolie)    hal16       /hal16.tmp/saverio    G             "


Backups

No files on hal are currently backed up! It is the user's responsibility to have all critical files copied to some other system where backups are performed (e.g., math1.cims.nyu.edu).


Documentation


Queues

To run "q" commands (qstat, qsub, etc.), users whose GID (group ID) is not already defined in the YP (yellow pages) groups must first run the following command in a window on one of the hal nodes:

newgrp caosgrp

The user should then be able to run "q" commands in the window from which that command was issued (or any child process windows).

Users whose GID is already defined in the YP groups do not need to run the newgrp command. This is true of any user whose GID is 1000, 4000, 5000, or 6000. Many users, however, have a GID that is the same as their UID; they must run the newgrp command. The best way to find out is simply to try.
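
The check described above can also be done before trying a "q" command. The following is a minimal sketch, assuming the four GIDs listed above are the complete set defined in the YP groups; needs_newgrp is a hypothetical helper name, and on older SunOS releases the -g option of id may only be available in /usr/xpg4/bin/id.

```shell
#!/bin/sh
# Sketch: decide whether "newgrp caosgrp" is needed before running q commands.
# The GID list (1000, 4000, 5000, 6000) is taken from the text above;
# needs_newgrp is a hypothetical helper, not a command provided on hal.
needs_newgrp() {
    case "$1" in
        1000|4000|5000|6000) return 1 ;;  # GID already defined in the YP groups
        *) return 0 ;;                    # any other GID: newgrp is required
    esac
}

# Check the primary GID of the current user.
if needs_newgrp "$(id -g)"; then
    echo "run 'newgrp caosgrp' before using qstat, qsub, etc."
else
    echo "no newgrp needed"
fi
```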

To submit a job to a queue, create a shell script containing the commands you wish to have executed. On the hal system, you should include queue directives, such as the following, at the very beginning of your shell script:

#-----------------------------------------------------------------------
# switches: SUN queue directives
#           (leader characters are '#$ ')
#-----------------------------------------------------------------------
#
# set shell as bourne, csh, etc
#$ -S /bin/your_favorite_shell
# define user queue 
#$ -q your_hal_node.q
# execute in current working directory
#$ -cwd
# export environmental variables
#$ -V
# job name
#$ -N your_job_name
# send standard output to specific file
#$ -o $JOB_NAME.$JOB_ID
# merge error output with standard output
#$ -j y
# define list of users for email notification
#$ -M your_user_name@cims.nyu.edu
# send email on job end and suspension
#$ -m es
#
#----------------------------------------------------------------------

To execute your commands, you submit your job script via the qsub command. More details are found in the man pages for qsub.
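
As an illustration, the directives above can be combined with the commands to run into one small job script. This is only a sketch: myjob.sh, hal02.q, myjob, and ./a.out are placeholder names chosen for the example, and the qsub line is shown as a comment because it works only on a hal node.

```shell
#!/bin/sh
# Sketch: write a minimal job script that uses a subset of the directives
# listed above. All names (myjob.sh, hal02.q, myjob, ./a.out) are placeholders.
cat > myjob.sh <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -q hal02.q
#$ -cwd
#$ -V
#$ -N myjob
#$ -j y
# the commands to execute follow the directives
./a.out
EOF

# Submit the script (on a hal node only):
#   qsub myjob.sh
```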

As an example of submitting a job to the appropriate primary and secondary queues, user holland would submit qsub -q hal02.q,hal03.q,hal09.q,hal12.q,hal13.q,hal01s.q,hal04s.q,hal05s.q,hal06s.q,hal07s.q,hal08s.q,hal10s.q,hal11s.q,hal14s.q,hal15s.q,hal16s.q (or some subset thereof). The queue list is comma-separated and must not contain spaces, so that the shell passes it to qsub as a single argument.
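
Typing a long queue list by hand is error-prone, so it may help to generate it. The sketch below assumes the naming pattern suggested by the example above (primary queues named NODE.q, secondary queues named NODEs.q); build_queue_list is a hypothetical helper, not a command provided on hal.

```shell
#!/bin/sh
# build_queue_list is a hypothetical helper: it joins the primary nodes as
# NODE.q and the secondary nodes as NODEs.q into one comma-separated list.
#   $1: blank-separated primary nodes, $2: blank-separated secondary nodes
build_queue_list() {
    list=""
    # $1 and $2 are left unquoted on purpose so that they split into words
    for node in $1; do list="${list:+$list,}${node}.q"; done
    for node in $2; do list="${list:+$list,}${node}s.q"; done
    printf '%s\n' "$list"
}

# Example with a subset of holland's queues; prints
# hal02.q,hal03.q,hal01s.q,hal04s.q
build_queue_list "hal02 hal03" "hal01 hal04"
```

The result can then be passed on, for example as qsub -q "$(build_queue_list "hal02 hal03" "hal01 hal04")" followed by the name of the job script.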

To check the status of your job script, issue the qstat command, which reports on the various jobs running on the hal system. Again, more details are found in the appropriate man pages. Currently, the queues available are identified as hal#.q. If you do not explicitly specify your queue, the system picks one for you from your pool (see the table above).

To remove an undesired job from the system queue, issue the qdel command. More details are found in the appropriate man pages.


Distributed Make

Waiting for all the individual files of a large program to compile can be onerous. The dmake command distributes the work of a makefile over any number of the hal nodes, thus markedly speeding up the compilation process. See the man pages for dmake for further information. To enable dmake on hal, follow these steps:

  1. Create a .dmakerc file in your home directory that contains the name of the nodes on which you want your dmake to run. Here is an example .dmakerc for the hal cluster.
  2. Have rsh login access to all the hal nodes. You will need to have a .rhosts file. In the example provided, you need to substitute your actual user name for your_usr_name. Additionally, the separator between the platform name and the user name must be either a single blank space or a single tab.
  3. Test that you will be able to execute rsh by issuing a test command, for example, rsh hal01.cims.nyu.edu date
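
Steps 2 and 3 can be sketched as follows. The sketch assumes that the cluster consists of the nodes hal01 through hal16 (the nodes that appear in the table above) and that each .rhosts line has the form "host user"; gen_rhosts is a hypothetical helper name.

```shell
#!/bin/sh
# gen_rhosts is a hypothetical helper: it prints one .rhosts line per hal
# node (hal01 .. hal16, an assumption based on the user table), with a
# single blank space between the host name and the user name.
gen_rhosts() {
    i=1
    while [ "$i" -le 16 ]; do
        printf 'hal%02d.cims.nyu.edu %s\n' "$i" "$1"
        i=$((i + 1))
    done
}

# Usage (review the output before writing it to ~/.rhosts):
#   gen_rhosts your_usr_name > ~/.rhosts
#   chmod 600 ~/.rhosts
#   rsh hal01.cims.nyu.edu date
```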


Debugging

The SunOS includes a debugger with a graphical interface, invoked with the command prism a.out. For FORTRAN, it assumes that you have built your executable (here a.out) using parallel compilation, even if you intend to run your code only in serial fashion. To compile a file of your code in parallel mode, issue the command mpf95 -c your_file.f. Then link all your object files into a single executable using the command mpf95 -o a.out *.o.
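
The compile and link steps can be wrapped in a small script. This is a sketch only: build_parallel and the FC variable are hypothetical names introduced here, main.f and solver.f are placeholder file names, and the only compiler options used are the ones quoted above.

```shell
#!/bin/sh
# FC defaults to mpf95 (the compiler named in the text above); it can be
# overridden, for example with echo, to see the commands without compiling.
FC=${FC:-mpf95}

# build_parallel is a hypothetical helper: it compiles each source file to
# an object file, then links all the objects into a.out.
build_parallel() {
    objs=""
    for src in "$@"; do
        "$FC" -c "$src" || return 1
        objs="$objs ${src%.*}.o"
    done
    # $objs is left unquoted on purpose so that each object is its own argument
    "$FC" -o a.out $objs
}

# Usage on a hal node:
#   build_parallel main.f solver.f && prism a.out
```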


Graphics

Some popular graphics packages available on the hal cluster are:


Data Formats

The (64-bit) netCDF library libnetcdf.a is available under the path /usr/local/pkg/netcdf-64/lib. The Fortran-90 module netcdf.mod is available under the path /usr/local/pkg/netcdf-64/src/f90. Details on using netCDF in a Fortran-90 programming environment are available here.

The most up-to-date version of netCDF can be downloaded here. For the hal system, the user must set the following environment variable prior to running the ./configure script:

Also, the ncview command is available under the path /usr/local/bin. This command is suitable for taking a quick look at the contents of a netCDF file. Further details are available here.


Parallelization

Some parallelization techniques available on the hal cluster are:


© David Holland.
All Rights Reserved.
If you would like further information concerning any of the above topics, please send email.