Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations

J Gong, S Markidis, E Laure, M Otten, P Fischer, M Min

The Journal of Supercomputing, 2016 • Springer

Abstract

We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier–Stokes solver Nek5000. The implementation is based on OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix–matrix multiplication part, which significantly minimizes the modification of the existing CPU code while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather–scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
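
To put the headline figure in per-device terms, the short sketch below divides the aggregate rate quoted in the abstract by the GPU count. The function name `per_gpu_gflops` is hypothetical, and the even split across GPUs is an assumption made only for illustration. Because the abstract reports "up to" 552 Tflops, the result is best read as an upper-bound average and not as a measured per-GPU value.

```python
# Aggregate figures quoted in the abstract.
TOTAL_TFLOPS = 552.0   # peak aggregate rate reported ("up to")
NUM_GPUS = 16_384      # GPUs used on the OLCF Cray XK7 Titan


def per_gpu_gflops(total_tflops: float, num_gpus: int) -> float:
    """Average rate per GPU in Gflops, assuming work is split evenly (an assumption)."""
    return total_tflops * 1_000.0 / num_gpus


if __name__ == "__main__":
    print(f"{per_gpu_gflops(TOTAL_TFLOPS, NUM_GPUS):.1f} Gflops per GPU")
```

Under these assumptions the average works out to roughly 33.7 Gflops per GPU.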
