Title Efficient 3D stencil computations and geometric multigrid on modern GPUs
Author(s) Marcin Krotkiewski, Marcin Dabrowski
Conference EGU General Assembly 2011
Media type Article
Language English
Digital document PDF
Published In: GRA - Volume 13 (2011)
Record number 250051649
 
Abstract
Graphics Processing Units (GPUs) are well suited for solving Partial Differential Equations on structured Cartesian grids. In many applications the discretized differential operator can be computed at every grid point as a stencil. We present efficient implementations of the three-dimensional 7-point and 27-point stencils on modern high-end Nvidia GPUs using the CUDA environment. A new method of reading the data required to compute the stencil from the global GPU memory into the threads' local memory is presented. The implementation effectively uses two levels of cache available on the tested GPUs: the shared memory and the texture memory. Moreover, in the presented approach the execution path is the same for all threads, which removes the need for conditional statements and significantly improves performance. We demonstrate that the stencil computation is memory bound on the GPUs, i.e. the speed of the GPU memory, and not the floating-point (FLOPs) performance, is expected to be the bottleneck. Consequently, our performance analysis concentrates on the memory bandwidth. We show that for the 7-point stencil the presented implementation is optimal on Tesla 1060, i.e. it utilizes close to 100% of the available memory bandwidth. The general 27-point stencil utilizes 65% of the memory bandwidth on a Tesla 2050, yielding a performance of 5.7 billion points per second and 300 GFLOPs. We use the optimized stencil implementation in a Jacobi relaxation scheme and as a building block of an efficient Multigrid solver for the Poisson problem on structured, regular grids. Since the implementation is memory-bandwidth bound, the time required to apply the 7-point and 27-point stencils is almost the same: although the number of arithmetic operations differs, the amount of data that needs to be processed is the same in both cases. Thus, without a loss of computational speed we can use the Finite Element discretization of the Poisson equation.
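To make the stencil and operation-count statements concrete, here is a minimal CPU sketch in plain Python; it is not the authors' CUDA implementation. The names `apply_stencil`, `flops_per_point` and `LAPLACE_7` are hypothetical. The count of 2n - 1 operations per point assumes a general n-point stencil with one multiplication per coefficient, and the 7-point coefficients are an illustrative Laplacian for unit grid spacing.

```python
def apply_stencil(u, stencil):
    """Apply a stencil {(di, dj, dk): coefficient} at every interior point of u.

    u is a nested list indexed as u[i][j][k]; boundary points are left at zero.
    """
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                out[i][j][k] = sum(
                    c * u[i + di][j + dj][k + dk]
                    for (di, dj, dk), c in stencil.items()
                )
    return out


def flops_per_point(stencil_size):
    """Assumed cost of a general stencil: n multiplications plus n - 1 additions."""
    return 2 * stencil_size - 1


# Illustrative 7-point Laplacian for unit grid spacing (an assumption, not from the abstract).
LAPLACE_7 = {
    (0, 0, 0): -6.0,
    (1, 0, 0): 1.0, (-1, 0, 0): 1.0,
    (0, 1, 0): 1.0, (0, -1, 0): 1.0,
    (0, 0, 1): 1.0, (0, 0, -1): 1.0,
}

n = 5
# Sample u(x, y, z) = x^2 + y^2 + z^2, whose Laplacian is 6 everywhere.
u = [[[float(i * i + j * j + k * k) for k in range(n)]
      for j in range(n)] for i in range(n)]
lap = apply_stencil(u, LAPLACE_7)
print("Laplacian at the grid centre:", lap[2][2][2])

# Operation counts under the assumption above, and the rate implied by
# the 5.7 billion points per second quoted for the 27-point stencil.
print("flops per point (7-point, 27-point):", flops_per_point(7), flops_per_point(27))
print("implied GFLOPs for the 27-point stencil:", 5.7e9 * flops_per_point(27) / 1e9)
```

Under this assumption the 27-point stencil costs 53 operations per point, so 5.7 billion points per second corresponds to roughly 300 GFLOPs, in line with the figure quoted in the abstract. Both stencils read and write one value per grid point when neighbouring values are served from cache, which is why their run times are expected to be similar on a memory-bound device.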
With the Finite Element discretization, for 257^3 grid points and single precision floating point numbers, our implementation computes 40 V-cycles per second on a Tesla 2050. The developed Multigrid solver is used as the pressure solver and the temperature diffusion solver in simulations of natural convection in a homogeneous porous medium saturated with an incompressible single-phase fluid. One time step of the simulation on a Tesla 2050 using a 513x513x128 grid takes 0.7 seconds.
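The structure of a geometric multigrid V-cycle with Jacobi relaxation can be sketched as follows. This is a deliberately reduced illustration and not the authors' solver: it is one-dimensional rather than three-dimensional, runs on the CPU in plain Python, and uses weighted Jacobi smoothing, full-weighting restriction and linear interpolation, all of which are assumptions chosen for brevity. The function names are hypothetical. The grid size 2^k + 1 follows the same pattern as the 257 and 513 point grids mentioned in the abstract.

```python
def residual(u, f, h):
    """r = f - A u for the 1D Poisson operator A u = (2 u_i - u_{i-1} - u_{i+1}) / h^2."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi relaxation; boundary values are kept fixed."""
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            jac = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jac
        u = new
    return u


def restrict(r):
    """Full-weighting restriction from a grid with 2m + 1 points to m + 1 points."""
    nc = (len(r) - 1) // 2 + 1
    rc = [0.0] * nc
    for i in range(1, nc - 1):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec):
    """Linear interpolation from the coarse grid to the next finer grid."""
    ef = [0.0] * (2 * (len(ec) - 1) + 1)
    for i in range(len(ec)):
        ef[2 * i] = ec[i]
    for i in range(len(ec) - 1):
        ef[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return ef


def v_cycle(u, f, h):
    """One V-cycle with two pre-smoothing and two post-smoothing sweeps."""
    if len(u) <= 3:
        # Coarsest grid: a single interior unknown, solved exactly.
        u = u[:]
        u[1] = 0.5 * (h * h * f[1] + u[0] + u[2])
        return u
    u = jacobi(u, f, h, 2)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(ec))]
    return jacobi(u, f, h, 2)


# Solve -u'' = 1 on [0, 1] with u(0) = u(1) = 0; the exact solution is x (1 - x) / 2.
n = 65
h = 1.0 / (n - 1)
f = [1.0] * n
u = [0.0] * n
for _ in range(15):
    u = v_cycle(u, f, h)

print("max residual:", max(abs(r) for r in residual(u, f, h)))
print("u at x = 0.5:", u[n // 2])
```

In a GPU implementation along the lines described in the abstract, the smoother and the residual evaluation would each be a stencil application, so the per-cycle cost would be dominated by the memory traffic of those sweeps.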