Wimpy is a set of tools that enables migration of processes across the cluster.
It relies on the Berkeley Lab Checkpoint/Restart (BLCR) kernel modules and
libraries, which allow a running process to be checkpointed to an image file
and subsequently restarted from it.

The system is governed by the wimps daemon, which runs on the head node and
keeps complete information about the status of the cluster (nodes and
processes). Based on this information, it determines the placement of
registered processes on individual nodes. Using migrations, its internal
scheduler continuously tunes the CPU usage of the processes so that it
corresponds to their nice values. The scheduler accounts for the differing
performance of individual processors: all time values are renormalized to
a 2 GHz processor.
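
The renormalization and the comparison against nice values can be illustrated
with a minimal sketch. It assumes, purely for illustration, that processor
performance scales linearly with clock frequency, and it borrows a
Linux-CFS-style weight of 1.25^(-nice) to turn nice values into target CPU
shares. The names `normalized_cpu_time`, `nice_weight`, `share_deficits` and
`Proc` are hypothetical; this is not Wimpy's documented scheduling rule.

```python
from dataclasses import dataclass

REFERENCE_GHZ = 2.0  # all time values are expressed relative to a 2 GHz CPU


def normalized_cpu_time(seconds: float, node_ghz: float) -> float:
    """Convert CPU seconds measured on a node to 2 GHz-equivalent seconds.

    Assumption: processor performance scales linearly with clock frequency.
    """
    if node_ghz <= 0:
        raise ValueError("node_ghz must be positive")
    return seconds * node_ghz / REFERENCE_GHZ


@dataclass
class Proc:
    name: str
    nice: int
    cpu_seconds: float  # CPU time used in the last interval, measured on its node
    node_ghz: float     # clock frequency of the node the process runs on


def nice_weight(nice: int) -> float:
    # Hypothetical mapping modeled on Linux CFS: each nice step is worth ~1.25x.
    return 1.25 ** (-nice)


def share_deficits(procs: list[Proc]) -> dict[str, float]:
    """Target share minus actual share of normalized CPU time, per process.

    A positive value means the process receives less CPU than its nice value
    entitles it to, making it a candidate for migration to a less loaded node.
    """
    used = {p.name: normalized_cpu_time(p.cpu_seconds, p.node_ghz) for p in procs}
    total_used = sum(used.values())
    total_weight = sum(nice_weight(p.nice) for p in procs)
    return {
        p.name: nice_weight(p.nice) / total_weight
        - (used[p.name] / total_used if total_used else 0.0)
        for p in procs
    }
```

For example, two nice-0 processes that each used 60 s of CPU, one on a 3 GHz
node and one on a 1 GHz node, count as 90 s and 30 s after renormalization; the
second one falls 0.25 below its 0.5 target share, so
`max(deficits, key=deficits.get)` picks it as the process to move.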

Each client node runs a wimpc daemon, which manages locally running jobs
(checkpoint, restart, signal forwarding) and regularly reports their status
to the wimps daemon.
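
The message format between wimpc and wimps is not described here, so the
following is only a hypothetical sketch of what a periodic status report could
carry: the node name, its clock speed (needed for the 2 GHz renormalization),
and per-job state. The JSON layout and the names `JobStatus`, `build_report`
and `report_periodically` are assumptions made for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class JobStatus:
    pid: int
    state: str          # hypothetical state names, e.g. "running" or "stopped"
    cpu_seconds: float  # CPU time consumed so far, measured on the local node


def build_report(node: str, cpu_ghz: float, jobs: list[JobStatus], timestamp: float) -> str:
    """Serialize one status report from a wimpc daemon (hypothetical format)."""
    return json.dumps(
        {
            "node": node,
            "cpu_ghz": cpu_ghz,
            "timestamp": timestamp,
            "jobs": [asdict(j) for j in jobs],
        },
        sort_keys=True,
    )


def report_periodically(collect, send, interval_s: float, count: int) -> None:
    """Call collect() and send() its result every interval_s seconds, count times."""
    for i in range(count):
        send(collect())
        if i + 1 < count:
            time.sleep(interval_s)
```

Passing `collect` and `send` as callables keeps the sketch independent of the
transport actually used between the daemons.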

A process is migrated from one node to another by checkpointing it to an image
file (after which it is killed with SIGKILL) and then restarting it from that
file. The image file is stored in a directory mounted on all nodes via NFS.
For proper operation, the /home volume must also be mounted cluster-wide. In
addition, the directory structure containing system-wide installed libraries
should be identical on all nodes. On the Tiger cluster we achieve this by
exporting /usr/lib via NFS.
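
A minimal sketch of the migration step, assuming the standard BLCR
command-line tools (`cr_checkpoint`, whose `--kill` option sends SIGKILL once
the checkpoint is written and whose `-f` option names the image file, and
`cr_restart`) and `ssh` access to the nodes. The shared directory
`/shared/wimpy`, the `context.<pid>` file naming, and the helper names are
hypothetical; the text above only says the images live in an NFS-mounted
directory.

```python
import os
import shlex
import subprocess

SHARED_IMAGE_DIR = "/shared/wimpy"  # hypothetical NFS-mounted directory visible on all nodes


def image_path(pid: int, directory: str = SHARED_IMAGE_DIR) -> str:
    """Path of the image file for a process (naming scheme is an assumption)."""
    return os.path.join(directory, f"context.{pid}")


def checkpoint_cmd(node: str, pid: int, image: str) -> list[str]:
    # Run on the source node: write the image, then kill the process with SIGKILL.
    return ["ssh", node, shlex.join(["cr_checkpoint", "--kill", "-f", image, str(pid)])]


def restart_cmd(node: str, image: str) -> list[str]:
    # Run on the target node: restart the process from the shared image file.
    return ["ssh", node, shlex.join(["cr_restart", image])]


def migrate(pid: int, src: str, dst: str, run=subprocess.run, spawn=subprocess.Popen) -> str:
    """Checkpoint pid on src, restart it on dst, and return the image path."""
    image = image_path(pid)
    # Fail loudly if the checkpoint did not succeed; never restart a stale image.
    run(checkpoint_cmd(src, pid, image), check=True)
    # Assumption: cr_restart stays in the foreground while the restarted job
    # runs, so it is launched without waiting for it to finish.
    spawn(restart_cmd(dst, image))
    return image
```

Because the image directory is mounted on every node, the target node reads
the same file the source node wrote; this is also why /home and the library
paths must look identical everywhere, since the restarted process expects its
open files and shared libraries at the same locations.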

Process image files created during migration are kept as backups in case of
a system crash. Moreover, backup checkpoints are taken of processes that have
not been migrated within a certain period (approximately one hour).
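
The backup policy can be sketched as a simple age check. This assumes that the
clock restarts whenever a new image is written (by a migration or by a previous
backup) and that a backup checkpoint leaves the process running, i.e. it is
taken with `cr_checkpoint` but without `--kill`. The one-hour constant comes
from the text; the function names are hypothetical.

```python
import shlex

BACKUP_INTERVAL_S = 3600.0  # "approximately one hour"


def needs_backup(last_image_time: dict[int, float], now: float,
                 interval_s: float = BACKUP_INTERVAL_S) -> list[int]:
    """PIDs whose newest image file is at least interval_s seconds old."""
    return sorted(pid for pid, t in last_image_time.items() if now - t >= interval_s)


def backup_checkpoint_cmd(node: str, pid: int, image: str) -> list[str]:
    # No --kill: the process keeps running after its image has been written.
    return ["ssh", node, shlex.join(["cr_checkpoint", "-f", image, str(pid)])]
```

With images written at t = 0 s, 3000 s and 100 s for PIDs 1, 2 and 3, a check
at t = 3700 s selects PIDs 1 and 3 for a backup checkpoint.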