Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)

Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, Wen-mei W. Hwu
ISBN: 9781608459544 | PDF ISBN: 9781608459551
Copyright © 2012 | 96 Pages | Publication Date: 01/01/2012

DOI: 10.2200/S00451ED1V01Y201209CAC020

General-purpose graphics processing units (GPGPU) have emerged as an important class of shared memory parallel processing architectures, with widespread deployment in every computer class from high-end supercomputers to embedded mobile platforms. Relative to more traditional multicore systems of today, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread contexts vs. tens), a return to wide vector units (several tens vs. 1-10), memory architectures that deliver higher peak memory bandwidth (hundreds of gigabytes per second vs. tens), and smaller caches/scratchpad memories (less than 1 megabyte vs. 1-10 megabytes).

In this book, we provide a high-level overview of current GPGPU architectures and programming models. We review the principles used in previous shared memory parallel platforms, focusing on recent results in both the theory and practice of parallel algorithms, and suggest a connection to GPGPU platforms. We aim to give architects hints for understanding the algorithmic aspects of GPGPU computing. We also provide detailed performance analysis and guide optimizations from high-level algorithms down to low-level, instruction-level optimizations. As a case study, we use the fast multipole method (FMM), an algorithm for n-body particle simulations. We also briefly survey the state of the art in GPU performance analysis tools and techniques.

Table of Contents

GPU Design, Programming, and Trends
Performance Principles
From Principles to Practice: Analysis and Tuning
Using Detailed Performance Analysis to Guide Optimization

About the Author(s)

Hyesoon Kim, Georgia Institute of Technology
Hyesoon Kim is an assistant professor in the School of Computer Science at the Georgia Institute of Technology. Her research interests include high-performance, energy-efficient heterogeneous architectures, programmer-compiler-microarchitecture interaction, and developing tools to help parallel programming. She received a B.A. in mechanical engineering from Korea Advanced Institute of Science and Technology (KAIST), an M.S. in mechanical engineering from Seoul National University, and an M.S. and a Ph.D. in computer engineering from The University of Texas at Austin. She received the NSF CAREER Award in 2011.

Richard Vuduc, Georgia Institute of Technology
Richard (Rich) Vuduc is an assistant professor in the School of Computational Science and Engineering at the Georgia Institute of Technology. His research lab, The HPC Garage, is interested in high-performance computing, with an emphasis on parallel algorithms, performance analysis, and performance tuning. His lab's work has been recognized by numerous best paper awards and his lab was part of the team that won the 2010 Gordon Bell Prize, supercomputing's highest performance achievement award. He is a recipient of the National Science Foundation's CAREER Award (2010) and has served as a member of the Defense Advanced Research Projects Agency's Computer Science Study Group (2009). Rich received his Ph.D. from the University of California, Berkeley, and was a postdoctoral scholar at Lawrence Livermore National Laboratory.

Sara Baghsorkhi, Intel Corporation
Sara S. Baghsorkhi is a research scientist in the Programming System Lab at Intel, Santa Clara. She received her degree in Computer Science from the University of Illinois at Urbana-Champaign. Her primary areas of research include auto-tuning and code generation for high-performance computer architectures, with a focus on wide vector SIMD designs. She has published 10 research papers and has 7 patents.

Jee Choi, Georgia Institute of Technology
Jee W. Choi is a fifth-year Ph.D. student in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. His research interests include performance and power modeling for multi-core, accelerator, and heterogeneous systems. Jee received his B.S. and M.S. from the Georgia Institute of Technology.

Wen-mei W. Hwu, University of Illinois at Urbana-Champaign
Wen-mei W. Hwu is the Sanders-AMD Endowed Chair Professor in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interest is in the area of architecture, compilation, and programming techniques for high-performance, energy-efficient computer systems. He is well known for the development of the IMPACT compiler technology for predicated execution and speculative execution, which is widely used in DSP and GPU cores and compilers today. He is the chief scientist of the Parallel Computing Institute and a Co-PI of the $208M NSF Blue Waters Supercomputer Project. He is a co-founder and CTO of MulticoreWare. For his contributions to compiler optimization and computer architecture, he received the 1993 Eta Kappa Nu Outstanding Young Electrical Engineer Award, the 1994 University Scholar Award of the University of Illinois, the 1998 ACM SIGARCH Maurice Wilkes Award, the 1999 ACM Grace Murray Hopper Award, the ISCA Influential Paper Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. Dr. Hwu has also been at the forefront of computer engineering education. He and David Kirk jointly created an undergraduate heterogeneous parallel programming course at the University of Illinois (ECE498AL - Programming Parallel Processors), which has become ECE408/CS483 - Applied Parallel Programming. The course has been adopted by many universities. Hwu and Kirk have been offering summer school versions of this course worldwide. In 2010, Kirk and Hwu published the textbook for the course, “Programming Massively Parallel Processors - A Hands-on Approach,” with Elsevier. As of 2012, more than 12,000 copies have been sold. For his teaching and contribution to education, he has received the 1997 Eta Kappa Nu Holmes MacDonald Outstanding Teaching Award and the 2002 ECE Teaching Excellence Award. He is a fellow of IEEE and ACM. Dr. Hwu received his Ph.D. in Computer Science from the University of California, Berkeley.

