Profile-Guided Loop Parallelization and Co-Scheduling on GPU-Based Heterogeneous Many-Core Architectures

By:

Guodong Han

Format:

Hardback


Unavailable

Sorry, this product is not currently available to order

Description

This dissertation, "Profile-guided Loop Parallelization and Co-scheduling on GPU-based Heterogeneous Many-core Architectures" by Guodong Han (韩国栋), was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to the Creative Commons Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting to make the dissertation easier to print and read. All rights not granted by the above license are retained by the author.

Abstract: GPU-based heterogeneous architectures (e.g., Tianhe-1A, Nebulae), which combine multi-core CPUs with GPUs, are seeing increasing adoption and becoming the norm in supercomputing because they are cost-effective and power-efficient. However, programming such heterogeneous architectures still requires significant effort from application developers, who must use sophisticated GPU programming languages such as CUDA and OpenCL. Although some automatic parallelization tools based on static analysis can ease the programming effort, the imprecision of static analysis limits them to loops that are 100% free of inter-iteration dependencies (i.e., determined DO-ALL loops). To exploit the abundant runtime parallelism and take full advantage of the computing resources in both the CPU and the GPU, this work proposes a new user-friendly compiler framework and runtime system that helps Java applications harness the full power of a heterogeneous system. It presents an all-round system design that unifies the programming style and language for transparent use of both CPUs and GPUs, automatically parallelizes all kinds of loops, and schedules workloads efficiently across CPU and GPU resources while ensuring data coherence during highly threaded execution.
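The distinction the abstract draws between determined DO-ALL loops and loops that static analysis cannot clear can be illustrated in plain Java. The sketch below uses only the standard `java.util.stream` API; it is not the dissertation's annotation-based framework, whose annotation syntax is not described in this listing, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class LoopDependencyDemo {

    // DO-ALL loop: iteration i reads a[i] and b[i] and writes only c[i], so no
    // iteration touches data written by another; the iterations can run in parallel.
    static double[] vectorAdd(double[] a, double[] b) {
        double[] c = new double[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> c[i] = a[i] + b[i]);
        return c;
    }

    // Loop-carried (true) dependency: iteration i reads out[i - 1], which iteration
    // i - 1 has just written. Running the iterations concurrently would race.
    static long[] prefixSum(long[] x) {
        long[] out = x.clone();
        for (int i = 1; i < out.length; i++) {
            out[i] += out[i - 1];
        }
        return out;
    }

    // Indirect indexing: whether iterations conflict depends on the runtime contents
    // of idx (a repeated index means two iterations update the same element). Static
    // analysis usually cannot prove independence here, so such a loop stays sequential
    // unless runtime information is available.
    static void scatterAdd(double[] a, int[] idx, double[] b) {
        for (int i = 0; i < idx.length; i++) {
            a[idx[i]] += b[i];
        }
    }

    public static void main(String[] args) {
        double[] c = vectorAdd(new double[] {1, 2, 3}, new double[] {10, 20, 30});
        long[] s = prefixSum(new long[] {1, 2, 3, 4});
        double[] a = new double[3];
        scatterAdd(a, new int[] {0, 2, 0}, new double[] {1, 2, 3});
        System.out.println(Arrays.toString(c)); // [11.0, 22.0, 33.0]
        System.out.println(Arrays.toString(s)); // [1, 3, 6, 10]
        System.out.println(Arrays.toString(a)); // [4.0, 0.0, 2.0]
    }
}
```

Only `vectorAdd` is parallelized here because it is the only loop whose independence is evident without running it; `scatterAdd` is the kind of loop the abstract's runtime profiling is aimed at.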
By means of simple user annotations, sequential Java source code is analyzed, translated, and compiled into a dual executable consisting of CUDA kernels running on the GPU and multiple Java threads running on the CPU cores. Annotated loops are automatically split into loop chunks (or tasks) that are scheduled to execute on all available GPU and CPU cores. To guide runtime task scheduling, we develop a novel dynamic loop profiler that generates the program dependency graph (PDG) and computes the density of dependencies across iterations through a hybrid checking scheme combining intra-warp and inter-warp analyses. By implementing a GPU-tailored thread-level speculation (TLS) model, our system supports, on the GPU side, speculative execution of loops with moderate dependency densities and privatization of loops that have only false dependencies. Our scheduler also supports task-stealing and task-sharing algorithms that allow swift load redistribution across the GPU and CPU. We have carried out several experiments to evaluate the profiling overhead, and used up to 11 real-life applications to evaluate system performance. The results show that the overhead is moderate compared with sequential execution, and that almost all of the applications can benefit from our system.

DOI: 10.5353/th_b5053425

Subjects: Graphics processing units; Parallel processing (Electronic computers); Computer architecture
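The abstract does not define "dependency density" precisely. One plausible reading, assumed here, is the fraction of iterations that read a location written by an earlier iteration. The sketch below computes that from a recorded per-iteration access trace and then picks among the options the abstract mentions (parallel DO-ALL execution, privatization when only false dependencies exist, speculation for moderate densities). It is a sequential, CPU-side illustration, not the dissertation's GPU profiler with its intra-warp and inter-warp checks; `DependencyProfiler`, `Access`, `Strategy`, the threshold parameter, and the `SEQUENTIAL` fallback are all hypothetical. It uses a Java 16+ `record`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DependencyProfiler {

    // Hypothetical trace record: the array indices one loop iteration read and wrote.
    // Within an iteration, reads are assumed to happen before writes.
    record Access(Set<Integer> reads, Set<Integer> writes) {}

    // Hypothetical execution choices; SEQUENTIAL is an assumed fallback.
    enum Strategy { DOALL, PRIVATIZE, SPECULATE, SEQUENTIAL }

    // Assumed definition: the fraction of iterations that read a location written
    // by an earlier iteration (a true, read-after-write, cross-iteration dependency).
    static double trueDependencyDensity(List<Access> trace) {
        Map<Integer, Integer> lastWriter = new HashMap<>();
        int dependent = 0;
        for (int i = 0; i < trace.size(); i++) {
            Access access = trace.get(i);
            for (int addr : access.reads()) {
                if (lastWriter.containsKey(addr)) { // written by an earlier iteration
                    dependent++;
                    break;
                }
            }
            for (int addr : access.writes()) {
                lastWriter.put(addr, i);
            }
        }
        return trace.isEmpty() ? 0.0 : (double) dependent / trace.size();
    }

    // False dependencies: a later iteration overwrites a location that an earlier
    // iteration read (write-after-read) or wrote (write-after-write).
    static boolean hasFalseDependency(List<Access> trace) {
        Set<Integer> readBefore = new HashSet<>();
        Set<Integer> writtenBefore = new HashSet<>();
        for (Access access : trace) {
            for (int addr : access.writes()) {
                if (readBefore.contains(addr) || writtenBefore.contains(addr)) {
                    return true;
                }
            }
            readBefore.addAll(access.reads());
            writtenBefore.addAll(access.writes());
        }
        return false;
    }

    static Strategy choose(List<Access> trace, double speculationThreshold) {
        double density = trueDependencyDensity(trace);
        if (density == 0.0) {
            return hasFalseDependency(trace) ? Strategy.PRIVATIZE : Strategy.DOALL;
        }
        return density <= speculationThreshold ? Strategy.SPECULATE : Strategy.SEQUENTIAL;
    }

    public static void main(String[] args) {
        List<Access> independent = new ArrayList<>(); // b[i] = a[i]
        List<Access> sharedTemp = new ArrayList<>();  // t = a[i]; b[i] = t; (t shared)
        List<Access> chain = new ArrayList<>();       // a[i + 1] = a[i] + 1
        for (int i = 0; i < 4; i++) {
            independent.add(new Access(Set.of(i), Set.of(100 + i)));
            // Index -1 stands for t; its read is omitted because it only sees this
            // iteration's own write.
            sharedTemp.add(new Access(Set.of(i), Set.of(-1, 100 + i)));
            chain.add(new Access(Set.of(i), Set.of(i + 1)));
        }
        System.out.println(choose(independent, 0.1));     // DOALL
        System.out.println(choose(sharedTemp, 0.1));      // PRIVATIZE
        System.out.println(trueDependencyDensity(chain)); // 0.75
        System.out.println(choose(chain, 0.1));           // SEQUENTIAL
    }
}
```

Under these assumptions, a loop whose only conflicts are on a shared scratch variable is classified for privatization, while the recurrence has a density of 0.75 and, with a threshold of 0.1, is left sequential. The threshold value is arbitrary; the abstract only says that speculation targets loops with moderate densities.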

Release date NZ

January 26th, 2017

Author

Guodong Han

Edition

annotated edition

Audience

General (US: Trade)

Illustrations

colour illustrations

Publisher

Open Dissertation Press

Country of Publication

United States

Imprint

Open Dissertation Press

Dimensions

216 x 279 x 8 mm

ISBN-13

9781361320440

Product ID

26644550
