CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications
