File Composition Library

Development of File Composition Library

Background

  • Many parallel applications use a file-per-process access pattern: each process creates its own files and writes its data to them.
  • This access pattern becomes a scalability problem in parallel applications.

    -As the number of processes increases, the number of files increases.
    -Metadata management overhead grows with the number of files, and operations concentrate on the metadata server.
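The file-per-process pattern described above can be sketched with plain POSIX calls. This is only an illustration: the helper name `write_rank_file` and the `out.<rank>` naming scheme are assumptions made for this example, not taken from any particular application.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes one output file per process rank.
 * With N ranks this creates N files, so the metadata server has to
 * handle N create operations. */
int write_rank_file(const char *dir, int rank, const void *buf, size_t len)
{
    assert(dir != NULL && buf != NULL);

    char path[4096];
    int n = snprintf(path, sizeof path, "%s/out.%06d", dir, rank);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may transfer fewer bytes than requested, so loop. */
    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }
    return close(fd);
}
```

Calling this once from each of N processes yields N files in `dir`, which is the growth in file count that the bullets above refer to.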

File Composition Library

  • File Composition Library (FCLib) aggregates the files created by each process and stores them in one or a few files.

    -This reduces the number of files.
    -The load on the metadata server decreases.
  • Features of File Composition Library

    -The FCLib API follows POSIX.
    -FCLib optimizes I/O to parallel file systems.
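The aggregation idea can be illustrated with a minimal sketch: several logical files are appended to one container file, and an index records where each one lives. The names `agg_open`, `agg_append`, `agg_read` and the in-memory index layout are hypothetical and chosen for this example only; they are not FCLib's actual API or on-disk format.

```c
#define _XOPEN_SOURCE 700   /* for pread/pwrite under strict ISO C modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define AGG_MAX_MEMBERS 64
#define AGG_NAME_LEN    64

/* Hypothetical index entry: where one logical file lives in the container. */
struct agg_entry {
    char   name[AGG_NAME_LEN];
    off_t  offset;
    size_t length;
};

struct agg_index {
    struct agg_entry entries[AGG_MAX_MEMBERS];
    int   count;
    off_t end;              /* next free byte in the container */
};

/* Create (or truncate) the single container file. */
int agg_open(const char *path)
{
    return open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
}

/* Append one logical file to the container and record it in the index. */
int agg_append(int fd, struct agg_index *idx, const char *name,
               const void *buf, size_t len)
{
    assert(idx != NULL && name != NULL);
    if (idx->count >= AGG_MAX_MEMBERS || strlen(name) >= AGG_NAME_LEN)
        return -1;
    if (pwrite(fd, buf, len, idx->end) != (ssize_t)len)
        return -1;

    struct agg_entry *e = &idx->entries[idx->count++];
    strcpy(e->name, name);  /* length was checked above */
    e->offset = idx->end;
    e->length = len;
    idx->end += (off_t)len;
    return 0;
}

/* Read a logical file back by name; returns bytes read, or -1 if absent. */
ssize_t agg_read(int fd, const struct agg_index *idx, const char *name,
                 void *buf, size_t cap)
{
    for (int i = 0; i < idx->count; i++) {
        const struct agg_entry *e = &idx->entries[i];
        if (strcmp(e->name, name) == 0) {
            size_t n = e->length < cap ? e->length : cap;
            return pread(fd, buf, n, e->offset);
        }
    }
    return -1;
}
```

In this sketch, any number of logical files costs only one file creation on the file system; the per-file bookkeeping moves from the metadata server into the index.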


  • The FCLib API follows POSIX.

    Existing libraries offer less flexibility to application programmers.

    -Programs must be written in the SPMD manner.
    -The library can handle only well-structured data.
    -Programmers must manage the data layout in the aggregate file.


  • FCLib optimizes I/O to parallel file systems.

    -Exclusive access to stripe blocks
    -Buffered I/O
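One way to read the two optimizations above is sketched below: small writes are collected in a stripe-sized buffer, and each flush goes to a stripe block that only the calling process writes. The round-robin assignment of blocks to ranks, and the names `owned_block_offset`, `stripe_buf`, `sb_init`, `sb_write`, `sb_flush` and `sb_close`, are assumptions made for this example; the source does not describe how FCLib actually implements these features.

```c
#define _XOPEN_SOURCE 700   /* for pwrite/mkstemp under strict ISO C modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed layout: stripe blocks are dealt round-robin to ranks, so the
 * k-th block owned by `rank` starts at (k * nprocs + rank) * stripe_size.
 * No two ranks ever write to the same stripe block. */
off_t owned_block_offset(int rank, int nprocs, long k, size_t stripe_size)
{
    return ((off_t)k * nprocs + rank) * (off_t)stripe_size;
}

struct stripe_buf {
    int    fd, rank, nprocs;
    size_t stripe_size;
    size_t used;            /* bytes currently buffered */
    long   next_k;          /* index of the next owned block to write */
    char  *data;            /* stripe_size bytes */
};

int sb_init(struct stripe_buf *sb, int fd, int rank, int nprocs,
            size_t stripe_size)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs && stripe_size > 0);
    sb->fd = fd;
    sb->rank = rank;
    sb->nprocs = nprocs;
    sb->stripe_size = stripe_size;
    sb->used = 0;
    sb->next_k = 0;
    sb->data = malloc(stripe_size);
    return sb->data ? 0 : -1;
}

/* Write the buffered bytes as one request to the next owned block. */
int sb_flush(struct stripe_buf *sb)
{
    if (sb->used == 0)
        return 0;
    off_t off = owned_block_offset(sb->rank, sb->nprocs, sb->next_k,
                                   sb->stripe_size);
    if (pwrite(sb->fd, sb->data, sb->used, off) != (ssize_t)sb->used)
        return -1;
    sb->next_k++;
    sb->used = 0;
    return 0;
}

/* Buffer small writes; issue I/O only when a full stripe block is ready. */
int sb_write(struct stripe_buf *sb, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        size_t room = sb->stripe_size - sb->used;
        size_t n = len < room ? len : room;
        memcpy(sb->data + sb->used, p, n);
        sb->used += n;
        p += n;
        len -= n;
        if (sb->used == sb->stripe_size && sb_flush(sb) != 0)
            return -1;
    }
    return 0;
}

/* Flush any remainder (a partially filled block) and release the buffer. */
int sb_close(struct stripe_buf *sb)
{
    int rc = sb_flush(sb);
    free(sb->data);
    sb->data = NULL;
    return rc;
}
```

Under this assumed layout, many small application writes turn into a few stripe-sized requests, and processes do not contend for the same stripe block. A real implementation would also need an index that maps logical file offsets to the blocks written here.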

Preliminary Evaluation

  • Elapsed time for each process to create one file on FEFS (the Lustre-based file system) of the K computer.

Summary

File Composition Library is introduced.

  • FCLib can mitigate the metadata bottleneck caused by file access patterns in which each process creates its own files.
  • FCLib is very flexible for application programmers.
    • The API follows POSIX.
  • The preliminary evaluation showed better performance than the original POSIX I/O.