Partitioned Virtual Address Space (PVAS)

Background

  • Many-core processors, which integrate a large number of small cores, are attracting attention.
    • Intra-node parallelization is important for systems containing many-core processors.
    • Programmers implement intra-node parallelization by using task models provided by the OS.
  • Issues of the conventional task models
    • They were originally designed for time-shared multitasking, so in several respects they are ill-suited to parallel processing on many-core systems.

The semantic view of the PVAS process model


Partitioned Virtual Address Space (PVAS)

  • A new task model that improves the performance of applications running on many-core systems
    • Combines the advantages of the multi-process and multi-thread models
  • Address space design of PVAS
    • PVAS partitions a virtual address space and assigns one partition to one task.
  • Intra-node communication of PVAS
    • Tasks can exchange data for parallel computation directly with each other, because no address space boundary separates them.
    • This enables high-performance intra-node communication.
  • VM management of PVAS
    • PVAS manages each partition separately, and a task normally uses only its own partition.
    • Serialization of VM operations can therefore be avoided.
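
The address space design described above can be illustrated with a small user-space sketch. It is not the PVAS kernel interface, which this poster does not show. It assumes a single mmap'd region standing in for the shared virtual address space, and the names NUM_TASKS, PARTITION_SIZE, space_create, partition_of, owner_of and direct_read are hypothetical. The sketch shows one partition per task, and a transfer between two partitions that needs only one copy because both lie in the same address space.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define NUM_TASKS      4            /* hypothetical number of tasks */
#define PARTITION_SIZE (1UL << 20)  /* hypothetical partition size: 1 MiB */

/* Reserve one contiguous region that stands in for the whole address space. */
void *space_create(void)
{
    void *base = mmap(NULL, NUM_TASKS * PARTITION_SIZE,
                      PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

void space_destroy(void *base)
{
    munmap(base, NUM_TASKS * PARTITION_SIZE);
}

/* Start address of the partition assigned to task `id`. */
char *partition_of(void *base, int id)
{
    assert(id >= 0 && id < NUM_TASKS);
    return (char *)base + (size_t)id * PARTITION_SIZE;
}

/* Task that owns `addr`, or -1 if the address lies outside the region. */
int owner_of(void *base, const void *addr)
{
    uintptr_t b = (uintptr_t)base;
    uintptr_t a = (uintptr_t)addr;
    if (a < b || a - b >= NUM_TASKS * PARTITION_SIZE)
        return -1;
    return (int)((a - b) / PARTITION_SIZE);
}

/* Copy `len` bytes from the start of task `src`'s partition to the start of
 * task `dst`'s partition: a single memcpy, with no intermediate buffer. */
void direct_read(void *base, int src, int dst, size_t len)
{
    assert(len <= PARTITION_SIZE);
    memcpy(partition_of(base, dst), partition_of(base, src), len);
}
```

A caller would create the region once, let each task write into partition_of(base, id), and pass plain pointers between tasks. In real PVAS the partitions belong to separate tasks managed by the kernel, which this single-process sketch does not reproduce.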

Preliminary Evaluation

  • Evaluation environment
    • PVAS was implemented on top of the Linux kernel.
    • Intel Xeon 2.93 GHz, 6 cores x 2 sockets
  • Intra-node communication
    • Nemesis, the low-level communication layer of MPICH2, was modified to leverage PVAS intra-node communication.
    • The original Nemesis uses shared memory for intra-node communication, which results in two memory copies.
    • The modified Nemesis copies the data directly from the sender buffer to the receiver buffer.
    • Ping-pong communication using the PVAS implementation shows the best performance when the message size is larger than 4 KB (Fig. 3).
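
The difference between the two copy paths can be sketched as follows. This is not Nemesis code. It is a minimal model, assuming a fixed-size staging buffer that plays the role of the shared-memory segment, and the names SHM_SLOT, send_via_shared and send_direct are hypothetical. Each function returns the number of bytes it moved, so the two-copy and one-copy costs can be compared directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHM_SLOT 4096  /* hypothetical size of the shared staging buffer */

/* Shared-memory style transfer: the sender copies into a staging buffer and
 * the receiver copies out of it. Returns the total number of bytes copied. */
size_t send_via_shared(char *staging, const char *src, char *dst, size_t len)
{
    assert(staging != NULL);
    size_t copied = 0;
    for (size_t off = 0; off < len; off += SHM_SLOT) {
        size_t n = (len - off < SHM_SLOT) ? len - off : SHM_SLOT;
        memcpy(staging, src + off, n);  /* copy 1: sender to shared buffer */
        memcpy(dst + off, staging, n);  /* copy 2: shared buffer to receiver */
        copied += 2 * n;
    }
    return copied;
}

/* Direct transfer: possible when the receiver buffer is addressable by the
 * copying task. Returns the total number of bytes copied. */
size_t send_direct(const char *src, char *dst, size_t len)
{
    memcpy(dst, src, len);
    return len;
}
```

For a message of len bytes the first function moves 2 × len bytes and the second moves len. This matches the two-copy versus one-copy distinction stated above, and it suggests why the gap would matter more as messages grow.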
  • VM operation
    • VM operation performance was evaluated with a hand-written microbenchmark.
    • In this benchmark, multiple tasks execute mmap, memset, and munmap operations 1000 times at different degrees of parallelism.
    • The PVAS and multi-process implementations maintain good performance even as the number of tasks increases (Fig. 4).
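
A workload of this kind might look like the sketch below. It is not the authors' benchmark. It covers only the multi-process case, using fork, and the function names vm_op_loop and run_parallel and the mapping size passed by the caller are assumptions, since the poster does not give them. Each task repeatedly maps an anonymous region, touches it with memset, and unmaps it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One task's workload: map, touch and unmap a region `iterations` times.
 * Returns the number of iterations that completed successfully. */
int vm_op_loop(int iterations, size_t size)
{
    assert(size > 0);
    int done = 0;
    for (int i = 0; i < iterations; i++) {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        memset(p, 0xA5, size);  /* write to every page so it is really backed */
        if (munmap(p, size) != 0)
            break;
        done++;
    }
    return done;
}

/* Run the workload in `ntasks` child processes (the parallel degree).
 * Returns how many children finished all of their iterations. */
int run_parallel(int ntasks, int iterations, size_t size)
{
    int started = 0;
    int ok = 0;
    for (int t = 0; t < ntasks; t++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(vm_op_loop(iterations, size) == iterations ? 0 : 1);
        if (pid > 0)
            started++;
    }
    for (int t = 0; t < started; t++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

To obtain timings, a caller could wrap run_parallel(n, 1000, size) with clock_gettime for each parallel degree n. A multi-thread variant would run vm_op_loop in threads that share one address space, which is where the VM operations are expected to serialize.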