EARTH on K

Overview

EARTH on K is a derivative of the EARTH (Effective Aggregation Rounds with THrottling) optimization framework for high-performance MPI-IO.
EARTH on K aims to achieve high-performance MPI-IO on the local file system (LFS) of the K computer.
Compared with the original MPI-IO on the K computer, EARTH on K offers the following advanced features:

  • Striping layout aware data aggregation
  • Striping access oriented aggregator layout
  • Throttling I/O requests in accessing OSTs of LFS (Optional)
  • Stepwise data exchanges associated with the throttling in accessing OSTs (Optional)

What is EARTH on K?

2.1. Two-Phase I/O

EARTH on K is an enhanced two-phase I/O optimization customized for the K computer. It provides MPI-IO optimized for the Tofu interconnect and FEFS.
Fig. 1 shows a typical example of parallel I/O among 4 processes. The non-contiguous access pattern in each process leads to poor I/O performance. To improve I/O performance in such cases, two-phase I/O was proposed in ROMIO; Fig. 2 shows it for a collective write. Every process acts as an aggregator, which performs the I/O operations. The aggregators gather data in the data exchange phase and then write their assigned data to the file.

Fig. 1: 4 processes accessing a 2D data array
Fig. 2: Two-phase I/O among 4 processes
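
The idea of assigning each aggregator its own part of the file can be illustrated with a small shell sketch. It assumes the simplest policy, in which the accessed byte range is divided evenly among the aggregators. The helper name `file_domains` and the sizes are hypothetical and are not part of EARTH on K or ROMIO.

```shell
#!/bin/bash
# Toy illustration only: split a byte range evenly among aggregators.
# file_domains TOTAL_BYTES NUM_AGGREGATORS (hypothetical helper)
file_domains() {
  local total=$1 naggr=$2
  # Domain size rounded up so that the domains cover the whole range.
  local chunk=$(( (total + naggr - 1) / naggr ))
  local rank start end
  for (( rank = 0; rank < naggr; rank++ )); do
    start=$(( rank * chunk ))
    end=$(( start + chunk ))
    # Clamp both bounds to the end of the range.
    if (( start > total )); then start=$total; fi
    if (( end > total )); then end=$total; fi
    echo "aggregator ${rank}: bytes [${start}, ${end})"
  done
}

# 4 aggregators sharing a 4096-byte region, as in the 4-process example.
file_domains 4096 4
```

In the data exchange phase, each process would send the pieces of its non-contiguous data that fall inside an aggregator's range to that aggregator, which can then issue one large contiguous write.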

2.2. Striping layout aware aggregator layout

  • The aggregator layout is arranged to suit striped accesses to the OSTs of the LFS.
  • This scheme alleviates contention in the network and in OST accesses.
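
To see why the aggregator layout matters, it helps to look at how a striped file maps onto OSTs. The sketch below assumes plain round-robin striping, as used by Lustre-based file systems. The stripe size and stripe count are illustrative values, `ost_index` is a hypothetical helper, and the actual FEFS layout and the aggregator assignment in EARTH on K may differ.

```shell
#!/bin/bash
# ost_index OFFSET STRIPE_SIZE STRIPE_COUNT (hypothetical helper)
# Prints the index, within the file's stripe set, of the OST that holds
# the byte at OFFSET, assuming round-robin striping.
ost_index() {
  local offset=$1 stripe_size=$2 stripe_count=$3
  echo $(( (offset / stripe_size) % stripe_count ))
}

stripe_size=$(( 1024 * 1024 ))   # 1 MiB, illustrative value
stripe_count=4                   # illustrative value

# Walk through the first six stripes of a file.
for (( i = 0; i < 6; i++ )); do
  offset=$(( i * stripe_size ))
  echo "offset ${offset} -> OST index $(ost_index "$offset" "$stripe_size" "$stripe_count")"
done
```

Under this assumption, stripes 0 and 4 land on the same OST, so giving each aggregator the regions that share an OST index keeps fewer processes targeting the same OST at once.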

2.3. EARTH

  • Supported language environments
    • GM-1.2.0-18
    • GM-1.2.0-19
    • GM-1.2.0-20 (Current default version)
  • Installation path on the K computer
    • /opt/aics/earth/earth-1.0_GM-1.2.0-20-1/ (accessible from compute nodes only)

< Original MPI-IO at the K computer vs. EARTH on K >


How to use EARTH on K

  • Compile a program written in MPI and MPI-IO
    • Use an existing FUJITSU cross compiler, such as mpifccpx or mpifrtpx
  • Prepare a job script to run the compiled program
    • Three environment variables should be (re-)defined for EARTH on K. (Details are described below.)
      • PATH
      • LD_LIBRARY_PATH
      • OPAL_PREFIX
  • Submit a job script
#!/bin/bash -x
#PJM --rsc-list "rscgrp=small"
#PJM --rsc-list "node=2x3x4:strict"
#PJM --rsc-list "elapse=HH:MM:SS"
#PJM --stg-transfiles all
#PJM --mpi "use-rankdir"
#PJM --mpi "proc=24"
#PJM --mpi "rank-map-bychip:XYZ"
#PJM --stgin-basedir xxxxxxx
#PJM --stgin "rank=* ./a.out %r:./"
#PJM -s

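. /work/system/Env_base

# Set up the three environment variables for EARTH on K:
# PATH, LD_LIBRARY_PATH, and OPAL_PREFIX.
# GMVERSION selects the installed build, so that EARTH_ROOT matches the
# installation path (e.g. /opt/aics/earth/earth-1.0_GM-1.2.0-20-1).
EARTH_ROOT=/opt/aics/earth/earth-1.0_GM-${GMVERSION}
export PATH=${EARTH_ROOT}/bin:${PATH}
export LD_LIBRARY_PATH=${EARTH_ROOT}/lib64:${LD_LIBRARY_PATH}
export OPAL_PREFIX=${EARTH_ROOT}
# Forward the same three variables to every MPI process.
ENV_FLAG="-x PATH=${PATH} -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} -x OPAL_PREFIX=${OPAL_PREFIX}"

# ENV_FLAG already carries the -x options, so it is passed as-is.
mpiexec ${ENV_FLAG} -n 24 ./a.out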
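
A mistyped EARTH_ROOT is easy to miss until the job fails, so a small check before mpiexec can help. The following is a minimal sketch and not part of EARTH on K. `check_earth_root` is a hypothetical helper, and it only assumes that the installation contains the bin and lib64 directories referenced by the job script.

```shell
#!/bin/bash
# check_earth_root DIR (hypothetical helper)
# Returns 0 if DIR has the bin and lib64 subdirectories that the job
# script adds to PATH and LD_LIBRARY_PATH; otherwise prints a message
# to stderr and returns 1.
check_earth_root() {
  local root=$1 sub
  for sub in bin lib64; do
    if [ ! -d "${root}/${sub}" ]; then
      echo "EARTH_ROOT check failed: ${root}/${sub} not found" >&2
      return 1
    fi
  done
  return 0
}

# Usage in the job script, after EARTH_ROOT is set:
#   check_earth_root "${EARTH_ROOT}" || exit 1
```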

How to tune I/O operations

  • Environment variables for EARTH on K

  • Hint for Throttling Tuning

    EARTH on K provides throttling and stepwise data exchanges for further performance improvements in collective write.

    • FEFS_THROTTLING_REQ: The number of I/O requests in throttling
      * This value should be less than or equal to the following I/O request upper limit.
        I/O request upper limit = Nz / 2 (K computer)
        (Nz: the number of nodes in the Z direction of the 3D logical layout.)
      * e.g. 2x3x32 (= 192 nodes) on the K computer
        I/O request upper limit = 32 / 2 = 16
        FEFS_THROTTLING_REQ should be between 0 and 16.
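
The upper limit can be derived from the node shape given in the job script. The sketch below is illustrative only: `throttling_limit` is a hypothetical helper, the requested value of 8 is arbitrary, and rounding down for an odd Nz is an assumption, since only the formula Nz / 2 is given.

```shell
#!/bin/bash
# throttling_limit SHAPE (hypothetical helper)
# SHAPE is a node shape such as "2x3x32" or "2x3x32:strict".
# Prints Nz / 2, rounded down (the rounding is an assumption).
throttling_limit() {
  local shape=${1%%:*}      # drop a suffix such as ":strict"
  local nz=${shape##*x}     # keep the last (Z) dimension
  echo $(( nz / 2 ))
}

limit=$(throttling_limit "2x3x32")
req=8                       # arbitrary example value

if (( req >= 0 && req <= limit )); then
  export FEFS_THROTTLING_REQ=${req}
  echo "FEFS_THROTTLING_REQ=${FEFS_THROTTLING_REQ} (upper limit ${limit})"
else
  echo "requested value ${req} is outside the range 0 to ${limit}" >&2
fi
```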

Notice for users

  • EARTH on K is built with the help of FUJITSU Limited.
  • EARTH on K is provided as a binary library package only.
  • EARTH on K is currently available only on the K computer.