All programs running on the cluster must be compatible with the 64-bit system
libraries so that they can be migrated among the nodes. This means that all
programs should be compiled locally; C, C++ and F95 compilers from the
GCC suite 4.4.7 are available.
Please feel free to contact Tiger's administrator if you lack
any tools or libraries needed to compile and run your programs.
Migration of processes among the cluster nodes poses some further limitations:
- program has to be dynamically linked
- program's and linked libraries' binaries must not be changed during the run
- program should not rely on standard input and output (see below)
- program must not try to access files outside the /home directory
- program must run in a single thread
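The first limitation can be checked before a job is submitted. The following is
a minimal sketch, assuming the standard ldd tool is installed; the function name
is_dynamically_linked is hypothetical, and the check is only a heuristic that
looks for resolved shared libraries in the ldd output:

```shell
# Hypothetical helper, not part of the cluster software.
# Succeeds if ldd lists at least one resolved shared library ("name => path")
# for the file given as $1. Static binaries and non-executables produce no
# such line, so the function fails for them.
is_dynamically_linked() {
    [ -f "$1" ] && ldd "$1" 2>/dev/null | grep -q '=>'
}

# Example use before submitting a job:
#   is_dynamically_linked ./my_program || echo "my_program is not dynamically linked" >&2
```

Only run ldd on binaries you built yourself or otherwise trust.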
All computational jobs have to be registered with the clustering software Wimpy,
which controls their placement on the cluster nodes. This is achieved by running
programs via the mpirun utility:
user@tiger: ~ > mpirun my_program arg1 arg2 ...
Jobs are automatically detached from the calling terminal; therefore, it is not
necessary to use the & sign to run them in the background. There are several options
for starting a program via mpirun, which should be chosen according to the type of
the process. Please check the manual pages and rules sections for details.
When starting a large number of jobs at once, it is better to start them one
immediately after another, which minimises the number of scheduler invocations. On
the other hand, all jobs are started on one specific node, and it is important not to
overload it. Currently, it is considered safe to start fewer than 100 jobs at once,
provided their total memory consumption does not exceed 64 GB. A subsequent large
batch of jobs should only be started after the previous one has been moved away from
the starting node (see the Hardware page for its ID number). Typically, this happens
no later than 10 minutes after the jobs start.
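The batch limits above can be enforced in the submitting shell itself. This is a
minimal sketch, not a site-provided tool: the function names batch_is_safe and
submit_batch, the LAUNCHER variable and the per-job memory estimate in MB are all
assumptions introduced for illustration. The loop runs in your own shell and calls
mpirun once per job, so every job is still registered individually:

```shell
# Hypothetical helper: succeed only if a batch stays within the limits quoted
# above (fewer than 100 jobs, total memory of at most 64 GB).
# $1 = number of jobs, $2 = your own estimate of memory per job in MB.
batch_is_safe() {
    [ "$1" -lt 100 ] && [ $(( $1 * $2 )) -le $(( 64 * 1024 )) ]
}

# Hypothetical helper: start one job per argument, one immediately after another.
# $1 = program, $2 = memory per job in MB, remaining arguments = one per job.
# LAUNCHER defaults to mpirun; set LAUNCHER=echo for a dry run.
submit_batch() (
    program=$1
    mem_per_job_mb=$2
    shift 2
    if ! batch_is_safe "$#" "$mem_per_job_mb"; then
        echo "batch exceeds the limits of the starting node; split it" >&2
        exit 1
    fi
    for arg in "$@"; do
        ${LAUNCHER:-mpirun} "$program" "$arg"
    done
)
```

A dry run such as LAUNCHER=echo submit_batch ./my_program 512 in1 in2 in3 prints
the three commands without starting anything. The memory figure is only as good as
your estimate, so measure a single job first.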
Do not start programs from the Midnight Commander command line. Doing so may cause
the manipulation of the program's file descriptors to fail and, consequently,
lead to unexpected behavior of the program.
Standard input & output
When invoked by mpirun, the program's standard input, output and error
output (file descriptors 0, 1 and 2) are by default redirected to /dev/null,
i.e. you will never see any output on the terminal, nor will you be able to interact
with the program via the keyboard. All standard file descriptors can be redirected to
regular files via options of the mpirun command. Standard output can then be
monitored with tail -f, e.g.:
user@tiger: ~ > mpirun --stdout=my_output my_program
user@tiger: ~ > tail -f my_output
Registering a new job invokes the scheduler, which places jobs on individual nodes.
This usually leads to increased migration of processes. The same happens after a
job finishes. Instead of, e.g., running many individual jobs with different
arguments, users should, if possible, run one job with an internal loop. A limited
number of short-lived jobs (no more than ten simultaneously) can be run directly on
the headnode Sirrah, i.e. without using the mpirun utility.
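For the short-lived jobs that may run directly on the headnode, the limit of ten
simultaneous processes can be respected with plain shell job control. This is a
minimal sketch, assuming a POSIX shell; run_throttled is a hypothetical name and
short_job is a placeholder for your own program:

```shell
# Hypothetical helper: run "$cmd $arg" for every remaining argument,
# keeping at most $1 processes running at the same time.
# $1 = maximum number of simultaneous jobs, $2 = command, rest = one argument per job.
run_throttled() (
    max=$1
    cmd=$2
    shift 2
    running=0
    for arg in "$@"; do
        "$cmd" "$arg" &
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait          # let the current group finish before starting more
            running=0
        fi
    done
    wait                  # wait for the last, possibly incomplete, group
)

# Example: at most ten short jobs at once, started without mpirun.
#   run_throttled 10 ./short_job input1 input2 input3
```

The sketch works in groups: it starts up to the given number of jobs, waits for all
of them, and only then starts the next group. This is simpler than a sliding window
and never exceeds the limit.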
Never run a wrapper script via mpirun! The following example is strictly forbidden.
In this way, only script.sh, which finishes immediately, would be registered
with the system, while the CPU-intensive tasks would remain
hidden from Wimpy and could not be distributed across the cluster: