Test of integrators (compilers, parallel versions, new 2nd order)

Table of contents:

Typical test run (if not specified): 4+50 particles, timestep = 20 days, stop time = 0.2 Myr. (The initial conditions were the same as in eos-j94_1.) You can also see a(t) plots for individual runs (all look very similar, as they should).

  1. Reference run with BS integrator for comparison of individual TP orbits, computed on Cel/1 GHz.

    run directory           run-time [sec]  notes
    bsy_g77                 10822

    The Bulirsch-Stoer integrator (with Yarkovsky subroutines) is approx. 7 times slower than RMVSY (10822 s vs. 1652 s for rmvsy_g77).

  2. The same integrator swift_rmvsy, computed on PIII/1 GHz, but built with 5 different Linux compilers.

    rmvsy_g77               1652            g77-0.5.24
    rmvsy_omf77             1496            Omni-1.3 (http://phase.etl.go.jp/Omni/)
    rmvsy_fujitsu           1371            Fujitsu F95 (compiled by Menios Tsiganis)
    rmvsy_lf95              1147            Fujitsu/Lahey F95 Express 6.0
    rmvsy_pgf90             1133            Portland group Fortran90 (1164 sec with -mp on 1 CPU)

    The omf77 compiler is faster than g77 by approx. 10 %. The Fujitsu/Lahey and Portland Group compilers beat g77 by approx. 30 %.
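
    The percentages above follow directly from the run-times in the table. A minimal sketch of that arithmetic (the dictionary keys and the helper name are only illustrative labels, not part of swift):

```python
# Run-times in seconds, copied from the table above (PIII/1 GHz, swift_rmvsy).
runtimes = {
    "g77": 1652,
    "omf77": 1496,
    "fujitsu": 1371,
    "lf95": 1147,
    "pgf90": 1133,
}


def time_saved_percent(t_ref, t):
    """Percentage of run-time saved relative to the reference run-time t_ref."""
    return 100.0 * (t_ref - t) / t_ref


# Compare every compiler with g77, which is taken as the reference here.
saved = {name: time_saved_percent(runtimes["g77"], t) for name, t in runtimes.items()}

for name, pct in saved.items():
    print(f"{name:8s} {runtimes[name]:5d} s  {pct:5.1f} % less time than g77")
```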

  3. PARALLEL program swift_rmvsy on 1 and 2 processors, computed on PIII/800 MHz. (See http://sirrah.troja.mff.cuni.cz/~mira/mp/ for downloads; the OpenMP specification is available at http://www.openmp.org.)

    rmvsy_omf77_ida1        1857            run only on 1 CPU
    rmvsy_omf77_ida2        1263            2 CPUs (SMP architecture)

    On two processors RMVSY is (only) 1.5 times faster than on a single one.

  4. MVS integrator without any regularization (no close encounters!), also compared with integration involving one half of test particles.

    mvsy_omf77_ida1         1613            run only on 1 CPU (50 TPs)
    mvsy_omf77_ida2         1044            2 CPUs (SMP architecture)
    mvsy_omf77_ida1_1       846             1/2 of TPs

    mvsy_omf77_ida500_1     1537            1 CPU, NTP = 500, tstop = 0.02 Myr
    mvsy_omf77_ida500_2     945             2 CPUs

    MVSY is only 1.2 times faster than RMVSY because of the Yarkovsky part of the code (compared with a factor of 2 between pure MVS and RMVS3).

    On 2 CPUs the integration takes 1.55 times less time. (With 10 times as many TPs, i.e. 500, the ratio is even slightly better: 1.62.)

    When you manually split the run into two pieces and run them as two separate jobs, it takes 1.9 times less time than a single run on 1 CPU.

    (This means you lose about 20 % of computation speed on 2 CPUs compared with 2 separate runs, but you gain simpler handling of the output data files: there is no need to merge 2 binary output files.)
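
    The comparison between the OpenMP run and the manual split can be reproduced from the three 50-TP run-times in the table. This is only a sketch; it assumes that the two half-runs execute at the same time, one per CPU, and that each takes as long as the one measured half-run. The variable names are illustrative.

```python
# Run-times in seconds, copied from the table in item 4 (MVSY, 50 TPs).
T_ONE_CPU = 1613    # mvsy_omf77_ida1: whole run on 1 CPU
T_TWO_CPUS = 1044   # mvsy_omf77_ida2: OpenMP run on 2 CPUs
T_HALF_RUN = 846    # mvsy_omf77_ida1_1: half of the TPs on 1 CPU


def speedup(t_reference, t):
    """How many times shorter t is than the reference run-time."""
    return t_reference / t


openmp_speedup = speedup(T_ONE_CPU, T_TWO_CPUS)

# Assumption: both half-runs finish after T_HALF_RUN seconds of wall-clock time.
split_speedup = speedup(T_ONE_CPU, T_HALF_RUN)

# Fraction of the ideal factor of 2 reached by the OpenMP run.
openmp_efficiency = openmp_speedup / 2

# Speed lost by the OpenMP run relative to two separate jobs.
loss_vs_split = 1.0 - openmp_speedup / split_speedup

print(f"OpenMP on 2 CPUs: {openmp_speedup:.2f}x (efficiency {openmp_efficiency:.0%})")
print(f"two separate jobs: {split_speedup:.2f}x")
print(f"loss of OpenMP vs. split: {loss_vs_split:.0%}")
```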

  5. Test of MOSIX cluster (http://www.mosix.org). Heterogeneous cluster with 8 CPUs, slow 10 Mb LAN; single migrated job (on 1 CPU, Cel/850 MHz), using MVSY integrator.

    mvsy_omf77_mosix0       1624            run on home node, migration forbidden
    mvsy_omf77_mosix1       2030            started elsewhere, migrated to the same
    mvsy_omf77_mosix5       1654            the same run, but very low I/O

    The speed of a migrated process is almost the same (98 %), but only for low I/O (luckily, this is the typical case for our runs).

  6. Test of parallel version of MVSY on Mosix.

    mvsy_omf77_mosix2       ?               2 CPUs (Mosix threads)
    mvsy_omf77_mosix4       ?               4 CPUs
    mvsy_omf77_mosix6       5078            2 threads on single CPU!

    It is NOT possible to use the parallel version of swift, because Mosix does not support distributed shared memory (DSM). => One has to split the calculation into several separate runs (e.g. with the swiftsplit script).

    (However, the authors of Mosix have announced future support for DSM. For now, a Mosix cluster is still useful for comfortable job management.)

  7. The same runs, but on SGI Power Challenge 32xR10000/300 MHz, OS Irix 6.5, MIPSPro compiler

    rmvsy_mipspro_mat1      1550            1 CPU
    mvsy_mipspro_mat2       1725            2 CPUs
    mvsy_mipspro_mat4       ?               4 CPUs

    The run on 2 CPUs was slower because the computer was overloaded.

  8. The same runs on SGI Power Challenge, with 10 times more TPs (500) and a 10 times shorter timespan (0.02 Myr).

    mvsy_mipspro_hal1       2377            1 CPU
    mvsy_mipspro_hal2       1858            2 CPUs
    mvsy_mipspro_hal4       1228            4 CPUs

    The run-time on 2 CPUs is only (!) 1.3 times shorter. The performance of SMP seems to be very low on this architecture and it is much better to run single jobs.
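
    The scaling on this machine can be tabulated from the three run-times above. A small sketch (the function name is illustrative; efficiency here means speedup divided by the number of CPUs):

```python
# Run-times in seconds from the table in item 8, keyed by the number of CPUs.
runtimes = {1: 2377, 2: 1858, 4: 1228}


def scaling_table(runtimes):
    """Return (n_cpus, speedup, efficiency) rows relative to the 1-CPU run."""
    t_one = runtimes[1]
    rows = []
    for n_cpus in sorted(runtimes):
        s = t_one / runtimes[n_cpus]
        rows.append((n_cpus, s, s / n_cpus))
    return rows


rows = scaling_table(runtimes)
for n_cpus, s, eff in rows:
    print(f"{n_cpus} CPU(s): speedup {s:.2f}, efficiency {eff:.0%}")
```

    An efficiency well below 100 % on 2 and 4 CPUs is what the remark about running single jobs refers to.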

  9. Test of the NEW IMPLEMENTATION of 2nd ORDER SYMPLECTIC INTEGRATOR SWIFT_MVS2FY (as yet without regularization!), timestep = 80 days (i.e. 4 times larger, but still more precise than MVS).

    mvs2fy_omf77_ida1       940             1 CPU
    mvs2fy_omf77_ida2       588             2 CPUs
    mvs2fy_omf77_ida1_omp   945             1 CPU, -omp (without OpenMP pragma)

    The MVS2FY integrator is approx. 1.7 times faster than MVSY and 2 times faster than RMVSY; one can still enlarge the timestep (up to 100 days?).
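
    Since MVS2FY takes 4 times fewer steps but is only about 1.7 times faster, each of its steps must be more expensive. A rough estimate of the cost per step, assuming that run-time scales linearly with the number of steps (I/O and start-up are ignored) and that one year is 365.25 days (this constant cancels out in the ratio):

```python
DAYS_PER_YEAR = 365.25   # assumption: Julian year; cancels out in the ratio below
T_STOP_YEARS = 0.2e6     # stop time of the typical test run (0.2 Myr)


def n_steps(timestep_days):
    """Number of integration steps needed to reach the stop time."""
    return T_STOP_YEARS * DAYS_PER_YEAR / timestep_days


# Run-times in seconds on 1 CPU: mvsy_omf77_ida1 and mvs2fy_omf77_ida1.
cost_mvsy = 1613 / n_steps(20)     # seconds per 20-day MVSY step
cost_mvs2fy = 940 / n_steps(80)    # seconds per 80-day MVS2FY step

cost_ratio = cost_mvs2fy / cost_mvsy
overall_speedup = 1613 / 940

print(f"MVSY:   {n_steps(20):.0f} steps, {cost_mvsy * 1e3:.3f} ms per step")
print(f"MVS2FY: {n_steps(80):.0f} steps, {cost_mvs2fy * 1e3:.3f} ms per step")
print(f"cost per step ratio: {cost_ratio:.2f}, overall speedup: {overall_speedup:.2f}")
```

    Under these assumptions one MVS2FY step costs a bit more than two MVSY steps.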

  10. Test of the total energy error as a function of the integrator timestep. We integrated the Sun-Jupiter-Saturn system for 0.1 Myr with timesteps ranging from 0.1 day up to 1000 days.

    Dependence of standard deviation of total energy sigmaE on given timestep dt for integrators SWIFT_MVS2F and SWIFT_MVSF.

    SWIFT_MVS2F seems to be more precise (by at least two orders of magnitude!) over a wide range of timesteps, from 2 up to 200 days.

    Plot of relative timestep tau' versus relative energy error. (tau' is the timestep dt divided by the method order (i.e. 2), which approximately compensates for the higher computational expense of the 2nd order integrator MVS2F.)

    The usual timestep used with the MVS (or RMVS3) integrator is 10 or 20 days, while we can safely use 100 days with MVS2F and keep the same relative energy error.

Miroslav Broz, miroslav.broz@email.cz, Dec 11th 2001