Performance Modeling, Performance Tuning and Quantization for GPU Programs

DC Field | Value | Language
dc.contributor.advisor | 이재진 | -
dc.contributor.author | 다오툰탄 | -
dc.date.accessioned | 2022-04-20T02:48:51Z | -
dc.date.available | 2022-04-20T02:48:51Z | -
dc.date.issued | 2021 | -
dc.identifier.other | 000000168084 | -
dc.identifier.uri | https://hdl.handle.net/10371/178211 | -
dc.identifier.uri | https://dcollection.snu.ac.kr/common/orgView/000000168084 | ko_KR
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2021.8. 이재진. | -
dc.description.abstract | GPUs have played an important role in solving many scientific problems that range across different domains. Writing GPU programs might be easy, but writing them efficiently is much more difficult. To achieve the best performance, it is necessary that the compiler and runtime have advanced techniques to compile and run the program efficiently. These techniques should be transparent to the programmers and help them avoid the burden of having to know many details of the underlying architecture. Among the most important aspects that help improve the performance of a GPU program, we focus on the problems of performance modeling, performance tuning and quantization. Performance modeling estimates the execution time of the program and can be useful in analyzing the program characteristics or partitioning the workload in a heterogeneous system. Performance tuning finds the optimal solution from an optimization space in a reasonable time. Quantization reduces the precision needed to execute the program without losing significant output accuracy. The proposed techniques can be integrated into GPU compilers and runtimes to help them be more efficient. | -
dc.description.tableofcontents |
1 Introduction 1
1.1 Introduction 1
2 Performance Modeling 4
2.1 Introduction 4
2.2 Related Work 8
2.3 Background 10
2.3.1 OpenCL Framework 10
2.3.2 GPU Architecture 11
2.3.3 Support Vector Regression 14
2.4 Prerequisites to efficient profiling: An insight to warp scheduling 16
2.5 Performance Estimation 23
2.5.1 Linear Model 24
2.5.2 Model based on Machine Learning 25
2.6 Evaluation 29
2.6.1 Evaluation Setup 29
2.6.2 Performance estimation results 30
2.6.3 The ML-based model on different classes of kernels 37
2.6.4 The performance at different saturation points 37
2.7 Conclusions 39
3 Performance Auto-tuning 41
3.1 Introduction 42
3.2 Related Work 45
3.3 OpenCL and GPU Architectures 47
3.4 Effects of the Work-group Size 49
3.4.1 Occupancy 50
3.4.2 Global Memory Coalescing 51
3.4.3 Cache Contention 56
3.4.4 Amount of Work 57
3.4.5 Work-group Scheduling and Barriers 58
3.4.6 Benchmark Applications 59
3.5 Auto-tuning Work-group Size 61
3.5.1 Workload Tuner 62
3.5.2 Non-coalescing Factor Tuner 64
3.5.3 Concurrency Tuner 66
3.5.4 Exhaustive-search Tuner 70
3.6 Evaluation 70
3.6.1 Overall Tuning Quality 70
3.6.2 Overall Tuning Cost 75
3.6.3 Effect of the Workload Tuner 76
3.6.4 Effect of the Non-coalescing Factor Tuner 77
3.6.5 Effect of the Concurrency Tuner 77
3.7 Conclusions 79
4 Quantization for Deep Learning Programs 80
4.1 Introduction 81
4.2 Related Work 83
4.3 Background 85
4.3.1 Integer Quantization 85
4.3.2 Standard Techniques Used 87
4.4 Quantization Framework 88
4.4.1 Inference Phase 88
4.4.2 Training Phase 89
4.4.3 Adding Noise to the Scale 89
4.4.4 Adaptively Adjusting Precisions 93
4.4.5 Computation of Histogram 97
4.5 Experiments 97
4.5.1 Image Classification Tasks 100
4.5.2 Natural Language Processing 105
4.6 Conclusions 106
5 Conclusion 107
Acknowledgements 123
| -
dc.format.extent | ix, 133 | -
dc.language.iso | eng | -
dc.publisher | Seoul National University Graduate School | -
dc.subject | Performance Modeling | -
dc.subject | Performance Tuning | -
dc.subject | Performance Analysis | -
dc.subject | GPU | -
dc.subject | Deep Learning | -
dc.subject | Quantization | -
dc.subject.ddc | 621.3 | -
dc.title | Performance Modeling, Performance Tuning and Quantization for GPU Programs | -
dc.title.alternative | GPU 프로그램을위한 성능 모델링, 성능 튜닝 및 양자화 | -
dc.type | Thesis | -
dc.type | Dissertation | -
dc.contributor.AlternativeAuthor | Thanh Tuan Dao | -
dc.contributor.department | College of Engineering, Department of Electrical and Computer Engineering | -
dc.description.degree | Ph.D. | -
dc.date.awarded | 2021-08 | -
dc.identifier.uci | I804:11032-000000168084 | -
dc.identifier.holdings | 000000000046▲000000000053▲000000168084▲ | -
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share