TIGER
Wimpy
 
Wimpy is a set of tools for migrating processes across the cluster. It relies on the Berkeley Lab Checkpoint/Restart (BLCR) kernel modules and libraries, which allow a running process to be checkpointed to an image file and later restarted from it.
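Below is a minimal sketch of how the two BLCR steps might be invoked from a script. It assumes the BLCR command-line tools cr_checkpoint and cr_restart are installed. It also assumes the target process was started under BLCR (for example via cr_run), which BLCR requires before a process can be checkpointed. The helper names and the image path in the usage note are hypothetical. Check the option spellings (-f for the image file, --kill to send SIGKILL after the checkpoint) against the man pages of the installed BLCR version.

```python
import subprocess
from typing import List

# Hypothetical helpers; they only assemble BLCR command lines.


def checkpoint_cmd(pid: int, image: str, kill: bool = False) -> List[str]:
    """Build a cr_checkpoint command that writes process `pid` to `image`.

    With kill=True, --kill asks BLCR to send SIGKILL once the checkpoint
    is complete, so the original process does not keep running.
    """
    cmd = ["cr_checkpoint", "-f", image]
    if kill:
        cmd.append("--kill")
    cmd.append(str(pid))
    return cmd


def restart_cmd(image: str) -> List[str]:
    """Build a cr_restart command that resumes a process from `image`."""
    return ["cr_restart", image]


def run(cmd: List[str]) -> int:
    """Execute a command and return its exit status (needs BLCR installed)."""
    return subprocess.run(cmd).returncode
```

For example, checkpoint_cmd(1234, "/tmp/ctx.1234", kill=True) builds ["cr_checkpoint", "-f", "/tmp/ctx.1234", "--kill", "1234"]. Keeping command construction separate from execution makes the commands easy to log or inspect before they are run.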

The system is governed by the wimps daemon, which runs on the head node and keeps complete information about the status of the cluster (nodes and processes). Based on this information, it decides where each registered process is placed. Its internal scheduler uses migrations to continuously adjust the CPU usage of each process so that it corresponds to the process's nice value. The scheduler takes the differing performance of individual processors into account: all time values are normalized to a 2 GHz processor.
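The normalization and the nice-based target can be illustrated with a short sketch. It assumes CPU work scales linearly with clock frequency, so one second on a 3 GHz processor counts as 1.5 seconds at the 2 GHz reference. The mapping from nice value to CPU share (nice_weight, a factor of 1.25 per nice step) and the "most underserved process" rule are hypothetical illustrations, not the actual wimps policy.

```python
from typing import Dict

REFERENCE_GHZ = 2.0  # all times are normalized to a 2 GHz processor


def normalize_cpu_time(seconds: float, cpu_ghz: float) -> float:
    """Convert CPU seconds measured on a `cpu_ghz` processor into
    2 GHz-equivalent seconds (assumes work scales linearly with clock)."""
    return seconds * cpu_ghz / REFERENCE_GHZ


def nice_weight(nice: int) -> float:
    """Hypothetical weight: each nice step lowers the weight by a factor of 1.25."""
    return 1.25 ** (-nice)


def target_shares(nices: Dict[str, int]) -> Dict[str, float]:
    """Split the total CPU among processes in proportion to their nice weights."""
    weights = {name: nice_weight(n) for name, n in nices.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def most_underserved(nices: Dict[str, int], used: Dict[str, float]) -> str:
    """Return the process whose share of normalized CPU time falls furthest
    below its target share: a candidate to migrate to a less loaded or
    faster node (hypothetical policy)."""
    targets = target_shares(nices)
    total_used = sum(used.values())

    def deficit(name: str) -> float:
        actual = used[name] / total_used if total_used else 0.0
        return targets[name] - actual

    return max(nices, key=deficit)
```

With two processes at nice 0, where a has used 30 and b has used 10 normalized seconds, most_underserved returns "b", the process furthest below its 50% target.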

Each client node runs a wimpc daemon, which manipulates locally running jobs (checkpoint, restart, signal forwarding) and regularly sends status information about them to the wimps daemon.
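The sketch below shows one piece of the client-side status reporting. It assumes a Linux node, where per-process CPU time can be read from /proc/<pid>/stat (utime and stime are fields 14 and 15, measured in clock ticks). The JSON message layout produced by status_report is hypothetical, because the actual wimpc-to-wimps protocol is not described here.

```python
import json
import os
import time
from typing import Iterable, Tuple


def parse_cpu_ticks(stat_line: str) -> Tuple[int, int]:
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may contain spaces,
    so split after the last ')'. utime and stime are fields 14 and 15.
    """
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3]
    return int(rest[11]), int(rest[12])


def read_cpu_seconds(pid: int) -> float:
    """Total user + system CPU seconds consumed so far by `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        utime, stime = parse_cpu_ticks(f.read())
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def status_report(node: str, pids: Iterable[int]) -> str:
    """Hypothetical JSON status message a client daemon could send to the server."""
    procs = []
    for pid in pids:
        try:
            procs.append({"pid": pid, "cpu_seconds": read_cpu_seconds(pid)})
        except FileNotFoundError:
            # /proc entry is gone: the process has exited
            procs.append({"pid": pid, "exited": True})
    return json.dumps({"node": node, "time": time.time(), "processes": procs})
```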

A process is migrated from one node to another by checkpointing it (and then terminating it with SIGKILL) to an image file in a directory mounted on all nodes via NFS, and then restarting it from that file on the target node. For this to work, the /home volume must also be mounted cluster-wide. In addition, the directory structure containing system-wide installed libraries should be identical on all nodes; on the Tiger cluster we achieve this by exporting /usr/lib via NFS.
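Migration only works if the image directory, /home and /usr/lib resolve to the same shared files on every node. A node could therefore sanity-check its mounts before it accepts migrated processes. The sketch parses /proc/mounts-style text (device, mount point, type, ...) and reports whether each path lies on an NFS mount. The function names are hypothetical. The checkpoint directory is not named in the text, so its path would have to be supplied for the actual installation.

```python
from typing import Dict, Iterable, List, Tuple


def parse_mounts(text: str) -> List[Tuple[str, str]]:
    """Return (mount point, filesystem type) pairs from /proc/mounts-style text."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            pairs.append((fields[1], fields[2]))
    return pairs


def fs_type_of(path: str, mounts: List[Tuple[str, str]]) -> str:
    """Filesystem type of the mount containing `path` (longest matching point).

    Later entries win ties, since a later mount hides an earlier one
    on the same mount point.
    """
    best, best_type = "", "unknown"
    for point, fstype in mounts:
        prefix = point.rstrip("/") + "/"
        if (path == point or path.startswith(prefix)) and len(point) >= len(best):
            best, best_type = point, fstype
    return best_type


def check_shared(paths: Iterable[str], mounts_text: str) -> Dict[str, bool]:
    """Report for each path whether it lies on an NFS (nfs or nfs4) mount."""
    mounts = parse_mounts(mounts_text)
    return {p: fs_type_of(p, mounts) in ("nfs", "nfs4") for p in paths}
```

On a node one would call check_shared(["/home", "/usr/lib"], open("/proc/mounts").read()), adding the checkpoint directory to the list.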

Process image files created during migration are kept as backups in case of a system crash. In addition, backup checkpoints are taken of processes that have not been migrated within a certain period (approximately one hour).
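The backup rule can be sketched as a simple selection over the processes the server tracks. The TrackedProcess record and due_for_backup are hypothetical names, and the one-hour interval follows the text. A backup checkpoint presumably leaves the process running (that is, it omits --kill), unlike a migration checkpoint.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

BACKUP_INTERVAL = 3600.0  # seconds; "approximately one hour"


@dataclass
class TrackedProcess:
    """Hypothetical server-side record of a registered process."""
    pid: int
    node: str
    last_image_time: float  # when the latest image (migration or backup) was written


def due_for_backup(procs: List[TrackedProcess], now: Optional[float] = None,
                   interval: float = BACKUP_INTERVAL) -> List[TrackedProcess]:
    """Processes whose most recent image is at least `interval` seconds old."""
    if now is None:
        now = time.time()
    return [p for p in procs if now - p.last_image_time >= interval]
```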

Last updated: 13.7.2005 (L.)