Tutorial description

GPU-accelerated POWER nodes are one of the key architectures in the next generation of pre-exascale supercomputers. This tutorial focuses on programming applications for, and obtaining performance on, the next generation of hybrid accelerator-plus-POWER architectures. The morning lectures will give an overview of the hardware and software and compare them with other typical HPC installations. Since this architecture is the first of its kind, it is important to explore in depth the architectural details that make up this system.
While many of the components are part of an evolutionary technology roadmap, their integration using NVLink is new. The tight coupling between the CPU and multiple GPUs via NVLink is novel in the current ecosystem and requires new ways of programming. By the time of SC16, more details on the technology, as well as initial experience in leveraging it for numerical applications, will have emerged and can be shared with attendees.
These nodes will include multiple GPUs, robust POWER-based CPUs, and a smart network connector, all linked together by a wide hardware memory bus. We will present lessons learned from recent porting efforts at several major centers. Lastly, we will have hands-on exercises covering aspects of the programming environment such as native and open-source compilers, profilers, and debuggers. These will allow attendees to learn optimization techniques and general tips on developing for accelerated POWER-based architectures.
The tools used in the hands-on portion of the tutorial include:

  • XL / LLVM / GCC compilers: A selection of compilers, all of which are available on POWER8/POWER8’ systems.
  • perf / PAPI / Score-P: perf and PAPI are tools available in most Linux distributions. Score-P is a widely used performance measurement infrastructure, developed jointly by multiple academic sites, that is the basis for performance tools such as Scalasca and Vampir.
  • nvprof: The GPU performance profiling tool from NVIDIA.

Audience prerequisites

Participants should have experience in programming numerical applications and in programming with OpenMP and MPI. They are also expected to have basic knowledge of GPU programming using OpenACC. Participants should bring a laptop with an ssh client pre-installed so that they can log in to the provided GPU-accelerated POWER8 nodes.