File Coordination Library

I/O Coordination

Introduction

  • I/O system resources such as the MDS and the OSS/OSTs are shared among the processes of a single application and/or among multiple simultaneously running programs.
  • As HPC systems grow larger, the number of I/O requests to each I/O system resource increases and its load becomes heavier.
  • Heavy load on I/O system resources degrades I/O performance, and in turn application performance.
  • This is one of the scalability issues of leadership-class high performance computing systems.
  • We target the case in which each process of a parallel application creates and writes its own file.
    • Many parallel applications adopt this file-per-process I/O pattern.
  • We show a performance issue of parallel I/O on current parallel file systems.
  • We propose an I/O coordination technique to mitigate the issue.
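The file-per-process pattern targeted above can be sketched as follows. This is an illustrative model, not code from the paper: the rank-based file naming and the helper `write_rank_file` are our assumptions.

```python
# Hypothetical sketch of the file-per-process I/O pattern: every rank
# opens its own output file and writes its local data independently.
import os
import tempfile

def write_rank_file(out_dir, rank, data):
    """Each process writes its own file, named by its rank (illustrative scheme)."""
    path = os.path.join(out_dir, f"out.{rank:05d}")
    with open(path, "wb") as f:
        f.write(data)
    return path

# Simulate 4 ranks, each writing 1 KiB of its own data.
with tempfile.TemporaryDirectory() as d:
    paths = [write_rank_file(d, r, bytes(1024)) for r in range(4)]
    print(len(paths), all(os.path.getsize(p) == 1024 for p in paths))  # prints: 4 True
```

With N processes this produces N independent files, which is what concentrates write traffic on the OSTs as N grows.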

Parallel I/O performance of current parallel file systems

Evaluation Environment

  • A part of the K computer
    • Computing nodes : 2 x 3 x 32 (192 nodes)
    • File system
      • MDS : 1 (shared with other jobs)
      • OSS : 6, OST : 12 [ 2 OSTs / OSS ]
        ( not shared with other running jobs, but shared with file staging )

I/O Performance [Striping]

  • File striping causes performance degradation.
    • This is because striping increases the number of write processes accessing each OST.
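Why striping multiplies the writers per OST can be seen from a simple round-robin striping model (an illustrative model in the style of Lustre/FEFS striping, not FEFS code; the function name and parameters are ours):

```python
# Round-robin file striping: byte `offset` belongs to stripe
# offset // stripe_size, and stripes are distributed over OSTs in
# round-robin order. With stripe count > 1, a single writer therefore
# touches several OSTs, so each OST sees more concurrent writers.

def ost_for_offset(offset, stripe_size, stripe_count, start_ost=0):
    """Return the OST index that stores the byte at `offset`."""
    stripe_no = offset // stripe_size
    return (start_ost + stripe_no) % stripe_count

MB = 1 << 20
# One writer, 4 MiB stripes over 4 OSTs: a 16 MiB file touches all 4 OSTs.
osts = {ost_for_offset(off, 4 * MB, 4) for off in range(0, 16 * MB, 4 * MB)}
print(sorted(osts))  # prints: [0, 1, 2, 3]
```

With stripe count 1 the same writer would load only a single OST, which matches the non-striped configuration used in the later experiments.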

I/O Performance [write size]

[figures]
【Common Parameters】
- Stripe count : 1 ( no striping )
- Stripe offset : the OST on the same Z-axis

Summary

  • When the number of write processes is small ( 1-4 writers / OST ), a larger stripe size gives better I/O performance.
  • As the number of write processes grows, the performance of each OST decreases.

I/O Coordination

  • To minimize the performance degradation caused by I/O contention on the storage servers, we propose an I/O coordination technique.
  • We developed the I/O Coordination Library (IOC) for FEFS on the K computer.
  • Coordinator processes are placed between the computing processes and the OSSes, with one coordinator assigned to each OST.
  • The coordinator corresponding to each OST coordinates the write operations issued to that storage target.
[figure]

Sequence of Write Operation

  • Each computing process sends a write request to the coordinator process.
  • The coordinator process grants the right to write to some of the processes.
  • A process that has been granted the right performs its write operation; the other processes wait.
  • When a process finishes its write operation, it sends a completion message to the coordinator process.
  • The coordinator process then grants the right to write to a waiting process.
[figure]
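The request/grant/complete sequence above amounts to capping the number of simultaneous writers per OST. A minimal simulation of that protocol (our sketch, not the IOC source; class and method names are assumptions) using a counting semaphore as the pool of write rights:

```python
# Per-OST coordinator granting write "rights" to at most `max_writers`
# processes at a time; the rest block until a completion frees a slot.
import threading

class Coordinator:
    """One coordinator per OST, limiting concurrent writers to that OST."""
    def __init__(self, max_writers):
        self._slots = threading.Semaphore(max_writers)  # available write rights
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous writers observed

    def request_write(self):
        # Computing process sends a request; blocks until a right is granted.
        self._slots.acquire()
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)

    def complete_write(self):
        # Completion message: the right passes to a waiting process.
        with self._lock:
            self._active -= 1
        self._slots.release()

def writer(coord):
    coord.request_write()
    # ... the actual write to the OST would happen here ...
    coord.complete_write()

coord = Coordinator(max_writers=2)
threads = [threading.Thread(target=writer, args=(coord,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(coord.peak <= 2)  # prints: True — never more than 2 simultaneous writers
```

The semaphore plays the role of the coordinator's grant queue: however many processes request at once, at most `max_writers` of them hold the write right at any moment.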

Evaluation Environment

  • Computing Node : 2 x 3 x 32 (192 Node)
[figure]

The Elapsed Time of File Output

  • Settings
    • 192 Computing Nodes , 12 OSTs
    • File Size is 1GB
[figures]
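As a back-of-the-envelope check of the load this setting places on each OST (assuming the 192 writers map evenly onto the 12 OSTs and the 1 GB file size is per process):

```python
# Per-OST load in the evaluation setting: 192 writers, 12 OSTs, 1 GB per file.
nodes, osts, file_gb = 192, 12, 1
writers_per_ost = nodes // osts          # writers contending on each OST
data_per_ost_gb = writers_per_ost * file_gb  # data each OST must absorb
print(writers_per_ost, data_per_ost_gb)  # prints: 16 16
```

Sixteen concurrent writers per OST is well past the 1-4 writers/OST range that the striping experiments identified as the sweet spot, which is the contention IOC is meant to relieve.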

The Basic Performance of IOC

[figures]
【IOC Parameters】
- stripe size : 16 MB
- parallel writers per OST : 2 processes

Application Benchmark

  • NICAM : Nonhydrostatic Icosahedral Atmospheric Model
    • under development in cooperation with CCSR and JAMSTEC
  • File output of NICAM
    • Each process of NICAM creates 2 files.
      • History file : result data of the simulation
      • Restart file : checkpoint data
    • The format of these files is PaNDa (packaged NICAM data format).
      • A binary data format.
[figure]

Environment and Configuration

  • We executed NICAM with 2 configurations.
    • g-Level 10 ( resolution : 7 km grid )
      • uses 320 nodes of the K computer.
    • g-Level 11 ( resolution : 3.5 km grid )
      • uses 1280 nodes of the K computer.
    • In both configurations, every process computes the same number of grid cells.
[figure]

Result

Output Time (320 Nodes)  [figure]
Output Time (1280 Nodes)  [figure]

Summary

  • We showed the performance issue of parallel I/O on current parallel file systems.
    • As the number of processes accessing an OST grows, the workload of the OST grows, and its I/O performance drops.
  • We proposed an I/O coordination technique to mitigate the issue.
    • Coordinator processes coordinate the I/O requests from the computing nodes and limit the number of I/O requests each OST has to handle at the same time.
  • The performance evaluation showed better performance than the original, uncoordinated I/O.