ARTICLE
TITLE

Flattening of Data-Dependent Nested Loops for Compile-Time Optimization of GPU Programs

Vadim Bulavintsev    

Abstract

Modern Graphics Processing Units (GPUs) belong to the "Single Instruction, Multiple Data" (SIMD) computational architecture class. Because SIMD devices execute divergent branches inefficiently, they can lose performance on nested loops with data-dependent exit conditions. A specialized compile-time Control Flow Graph (CFG) transformation routine can solve this problem. The routine reduces the loop nesting depth by merging the inner loop with the outer loop. The transformed program remains logically equivalent to the original one, while its branching pattern becomes better suited for execution on a SIMD device. The routine is implemented as a Low-Level Virtual Machine (LLVM) Transformation Pass. Depending on the dataset and the nested-loop parameters, the transformation reduces the worst-case running time of a specialized GPU benchmarking application by a factor of up to 24.
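
The idea of merging an inner loop into its outer loop can be illustrated at the source level. The following is a minimal hand-written sketch in plain C++, not the LLVM pass described above: the function names `nested_sum` and `flattened_sum`, the `len` array of per-row trip counts, and the loop body are all hypothetical and chosen only for illustration. In the nested form, the inner trip count depends on the data. In the flattened form, a single loop either performs one inner-loop step or advances the outer counter, so lanes processing different rows would share one loop back edge instead of waiting at the exit of a divergent inner loop.

```cpp
#include <cstddef>
#include <vector>

// Nested form: the inner loop's trip count is data-dependent (len[i]).
long nested_sum(const std::vector<int>& len) {
    long acc = 0;
    for (std::size_t i = 0; i < len.size(); ++i) {
        for (int j = 0; j < len[i]; ++j) {
            acc += static_cast<long>(i) * 31 + j;  // illustrative loop body
        }
    }
    return acc;
}

// Flattened form: one loop with explicit (i, j) state.
// Each iteration either runs one inner-loop step or moves to the next row.
long flattened_sum(const std::vector<int>& len) {
    long acc = 0;
    std::size_t i = 0;
    int j = 0;
    while (i < len.size()) {
        if (j < len[i]) {
            acc += static_cast<long>(i) * 31 + j;  // same body as above
            ++j;
        } else {
            ++i;    // inner loop finished: advance the outer counter
            j = 0;  // and reset the inner counter
        }
    }
    return acc;
}
```

Both functions visit the same (i, j) pairs in the same order, so they return the same value for any input. Whether the flattened form is actually faster depends on the device and on how the trip counts are distributed; this sketch only shows the logical equivalence.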
