Detailed Information

Practical Optimizations for Conjugate Gradient Method Acceleration using CUDA : 쿠다를 이용한 실용적인 켤레기울기법 가속에 관한 연구

DC Field Value Language
dc.contributor.advisor고형석-
dc.contributor.author유동한-
dc.date.accessioned2017-07-19T08:42:45Z-
dc.date.available2017-07-19T08:42:45Z-
dc.date.issued2016-08-
dc.identifier.other000000136008-
dc.identifier.urihttps://hdl.handle.net/10371/131255-
dc.descriptionThesis (Master's) -- 서울대학교 대학원 : 협동과정 계산과학전공, 2016. 8. 고형석.-
dc.description.abstractThis dissertation presents a series of optimizations for the preconditioned and non-preconditioned Conjugate Gradient (henceforth, CG) method using CUDA.
Each step of the CG algorithm has a data dependency on adjacent steps, but each step in itself is a parallelizable operation such as matrix-vector multiplication, dot product, or axpy. Because every step is a well-known parallelizable operation, the CG algorithm as a whole can be accelerated by GPUs, and a meaningful speedup can be achieved with the optimization methods presented in this dissertation.
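The structure described above can be sketched on the CPU from just the three primitives named here (matrix-vector multiply, dot product, and axpy). This is a minimal illustrative sketch, not the thesis's CUDA implementation; the function names and the small test system are assumptions for the example.

```python
# Minimal CG sketch built from the three parallelizable primitives the
# abstract names. Each primitive maps one-to-one onto a cuBLAS-style call
# in a GPU implementation; here they are plain-Python stand-ins.

def matvec(A, x):
    # Matrix-vector product A @ x
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(x, y):
    # Inner product x . y
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def axpy(a, x, y):
    # Returns a*x + y (the BLAS "axpy" operation)
    return [a * x_i + y_i for x_i, y_i in zip(x, y)]

def cg(A, b, tol=1e-12, max_iter=100):
    """Solve A x = b for symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = list(b)          # residual r = b - A*x with x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = axpy(alpha, p, x)          # x += alpha * p
        r = axpy(-alpha, Ap, r)        # r -= alpha * A p
        rs_new = dot(r, r)             # reading this on the host is a sync point
        if rs_new < tol:
            break
        p = axpy(rs_new / rs_old, p, r)  # p = r + beta * p
        rs_old = rs_new
    return x
```

Note how every iteration reads `rs_new` for the convergence test; in a GPU version that read is exactly the device-host synchronization the rest of the abstract is concerned with.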
First, we describe performance issues in a naïve version of CUDA-based CG implemented using a widely adopted CUDA library: cuBLAS. This library provides generic low-level algorithms that make it possible to implement high-level algorithms without writing performant CUDA kernels by hand. However, if implemented without care, the device-host synchronizations forced by the data dependencies between CG steps limit the performance gains from CUDA acceleration: the GPU can be severely under-utilized between steps and cannot run at full speed. We propose a simple but practical optimization technique to avoid device-host synchronizations: lazy residual evaluation.
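The cost being avoided is the host-side read of the residual norm each iteration. A minimal CPU-only sketch of the general idea, deferring the host convergence check so it happens less often, is shown below; this illustrates the synchronization trade-off only, and the function and parameter names (`iterate_with_deferred_check`, `check_every`) are illustrative assumptions, not the thesis's actual scheme or API.

```python
# Schematic illustration of deferring the convergence check. Walking a
# precomputed sequence of residual norms, we count how many times the
# host would need to read (synchronize on) the residual. Checking only
# every k-th iteration trades a few extra iterations for far fewer syncs.

def iterate_with_deferred_check(residuals, tol, check_every=1):
    """Return (iterations_run, host_syncs) for a given check interval."""
    host_syncs = 0
    for i, r in enumerate(residuals, start=1):
        if i % check_every == 0 or i == len(residuals):
            host_syncs += 1      # device-host sync to read the residual norm
            if r < tol:
                return i, host_syncs
    return len(residuals), host_syncs
```

With `check_every=1` (the naïve scheme) every iteration pays a sync; with a larger interval the solver may overshoot convergence by a few iterations but pays only a fraction of the synchronizations, which is the trade-off the abstract evaluates.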
In this thesis, the overall runtime performance gained by eliminating device-host synchronizations is explained step by step as the number of synchronizations per iteration is reduced, and the corresponding changes in the CPU and GPU pipelines are illustrated. Then the performance gain from the proposed method, lazy residual evaluation, together with its advantages and disadvantages, is compared against other backend implementations with different levels of device-host synchronization.
Finally, the importance of minimizing device-host synchronization when accelerating CG-like iterative algorithms on GPUs is discussed in detail.
-
dc.description.tableofcontentsChapter 1. Introduction 1
1.1 Iterative Solvers 1
1.2 The Conjugate Gradient Method 2
1.3 CUDA 4
1.4 Performance Issues with the Naïve CUDA/cuBLAS CG 5
1.5 Optimization Methods for CUDA/cuBLAS CG 6

Chapter 2. Previous Work 8

Chapter 3. Background 11
3.1 cuBLAS Device Pointer Mode 13
3.2 CUDA Dynamic Parallelism 15

Chapter 4. Fully Asynchronous CG based on the Dynamic Parallelism 17

Chapter 5. Lazy Residual Evaluation 19
5.1 The Modified Secant Method 21

Chapter 6. Experiment Results 24
6.1 Testing Setup 24
6.2 Experiment Results of onepiece 256k 29
6.3 Experiment Results of onepiece 118k 32
6.4 Experiment Results of onepiece 42k 35
6.5 Experiment Results of onepiece 10k 38

Chapter 7. Conclusion 41

Bibliography 44

초 록 46
-
dc.formatapplication/pdf-
dc.format.extent4079770 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisher서울대학교 대학원-
dc.subjectlazy residual evaluation-
dc.subjectconjugate gradient method-
dc.subjectCUDA-
dc.subject.ddc004-
dc.titlePractical Optimizations for Conjugate Gradient Method Acceleration using CUDA-
dc.title.alternative쿠다를 이용한 실용적인 켤레기울기법 가속에 관한 연구-
dc.typeThesis-
dc.description.degreeMaster-
dc.citation.pages47-
dc.contributor.affiliation자연과학대학 협동과정 계산과학전공-
dc.date.awarded2016-08-