I/O Performance Optimization Schemes for Manycore HPC Systems

DC Field / Value
dc.contributor.advisor: 엄현상
dc.contributor.author: Bang Jiwoo
dc.date.accessioned: 2023-11-20T13:25:37Z
dc.date.available: 2023-11-20T13:25:37Z
dc.date.issued: 2023-08
dc.identifier.other: 000000178050
dc.identifier.uri: https://dcollection.snu.ac.kr/common/orgView/000000178050
dc.identifier.uri: https://hdl.handle.net/10371/196498
dc.description.abstract:
High-performance computing (HPC) systems are composed of thousands of compute nodes, storage systems, and high-speed networks, which together form a highly complex, multi-layered I/O stack. To meet the increasing demand for data-access performance from applications run on HPC systems, efficient design of the HPC memory management system and storage file system is becoming more important. Moreover, HPC users need proper guidance toward optimal system configuration settings to avoid significant performance fluctuations.

In this dissertation, our first focus is on reducing lock contention in the memory management system of an HPC manycore architecture. One of the critical sections that cause severe lock contention in the I/O path is the page management system, which uses multiple Least Recently Used (LRU) lists protected by a single lock instance. To solve this problem, we propose the Finer-LRU scheme, which optimizes the page reclamation process by splitting the LRU lists into multiple sub-lists, each with its own lock instance. Our evaluation results show that the Finer-LRU scheme can improve sequential write throughput by 57.03% and reduce latency by 98.94% compared to the baseline Linux kernel version 5.2.8 on the Intel Knights Landing (KNL) architecture.
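A minimal, self-contained sketch of the fine-grained locking idea described above (illustrative only, not the dissertation's kernel code; the class name ShardedLRU, the sub-list count, and the hashing scheme are assumptions made for the example):

```python
# Illustrative sketch: shard one LRU structure into several sub-lists, each
# protected by its own lock, so concurrent page additions and reclaims contend
# far less than they would with a single global lock.
import threading
from collections import OrderedDict

class ShardedLRU:
    def __init__(self, num_sublists=8):
        # One OrderedDict (kept in LRU order) and one lock per sub-list.
        self.sublists = [OrderedDict() for _ in range(num_sublists)]
        self.locks = [threading.Lock() for _ in range(num_sublists)]

    def _index(self, page_id):
        # Map a page to a sub-list, analogous to computing an LRU list index.
        return hash(page_id) % len(self.sublists)

    def touch(self, page_id, data=None):
        i = self._index(page_id)
        with self.locks[i]:                      # only this sub-list is locked
            self.sublists[i].pop(page_id, None)
            self.sublists[i][page_id] = data     # most recently used at the end

    def reclaim(self, i):
        # Reclaim the least recently used page of one sub-list.
        with self.locks[i]:
            if self.sublists[i]:
                return self.sublists[i].popitem(last=False)
        return None
```

Because each page maps to exactly one sub-list, threads working on pages that hash to different sub-lists never compete for the same lock, which is the contention reduction the Finer-LRU scheme targets during page reclamation.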

We also analyze the root cause of low I/O performance on a ZFS-based Lustre file system and propose a novel ZFS scheme, dynamic-ZFS, which combines two optimization approaches: a parallel checksum calculation pipeline and a dynamic thread control scheme. The experimental results show that our approach improves sequential I/O performance by an average of 37%. We demonstrate that dynamic-ZFS can deliver I/O performance comparable to that of ldiskfs-based Lustre while still providing a multitude of beneficial features.
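The two optimizations named above correspond to Sections 4.2.1 and 4.2.2 in the table of contents. The sketch below only illustrates the general pattern, checksumming record-sized blocks in parallel with a worker count chosen at run time, under assumed parameters (record size, checksum function, worker-count policy); it is not ZFS or dynamic-ZFS code:

```python
# Conceptual sketch only: checksum fixed-size records in parallel, with the
# worker count chosen at run time rather than fixed in advance.
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

RECORD_SIZE = 1 << 20  # assume 1 MiB records, loosely analogous to a ZFS recordsize

def checksum_record(record):
    # Stand-in checksum; ZFS itself uses algorithms such as fletcher4 or sha256.
    return zlib.crc32(record)

def checksum_file(path, max_workers=None):
    # "Dynamic" worker count: default to the number of CPUs visible to the process.
    workers = max_workers or os.cpu_count() or 1
    with open(path, "rb") as f:
        records = iter(lambda: f.read(RECORD_SIZE), b"")
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Records are read sequentially but checksummed concurrently.
            return list(pool.map(checksum_record, records))
```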

Finally, we employ multiple machine learning approaches to perform an in-depth analysis of I/O behaviors in HPC applications and to search for optimal configuration settings for jobs sharing similar I/O characteristics. Our overall results show that, using the proposed machine learning-based prediction models, whose R-squared scores improve by up to 0.07, jobs run on HPC systems can obtain highly accurate predictions of I/O performance under different configuration parameters.
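As a hedged illustration of the workflow suggested by the abstract and the Chapter 5 headings (dataset preprocessing, clustering, prediction models), the sketch below clusters jobs by I/O characteristics, fits one regression model per cluster, and reports the R-squared score; the feature names, model choices, and synthetic data are placeholders, not the dissertation's:

```python
# Illustrative workflow: cluster jobs by I/O characteristics, then fit one
# regression model per cluster and evaluate it with the R-squared score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 4))   # hypothetical features, e.g. stripe count, transfer size, node count, file count
y = X @ np.array([2.0, 1.0, 0.5, 3.0]) + rng.normal(0, 0.1, 1000)  # synthetic throughput target

# Group jobs with similar I/O characteristics.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Train and evaluate one prediction model per cluster.
for c in np.unique(clusters):
    Xc, yc = X[clusters == c], y[clusters == c]
    Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr)
    print(f"cluster {c}: R^2 = {r2_score(yte, model.predict(Xte)):.3f}")
```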
dc.description.tableofcontents:
Chapter 1 Introduction 1
1.1 Motivation 1
1.1.1 High Performance Computing Systems 1
1.1.2 Problems 2
1.2 Contributions 5
1.3 Outline 7
Chapter 2 Background 8
2.1 Manycore System 8
2.2 Lustre File System 9
2.3 Cori Supercomputer 10
2.4 Related Work 11
Chapter 3 HPC Scalable Memory System Optimization 17
3.1 Motivation 17
3.1.1 I/O Scalability of Existing Manycore Architecture 17
3.1.2 Page Frame Reclamation Process 19
3.1.3 Problem Analysis 21
3.2 Design and Implementation 23
3.2.1 Design of Scalable Locking Mechanism 23
3.2.2 Data Structures 25
3.2.3 Calculation of the LRU List Index 26
3.2.4 Customized Callback Functions 27
3.3 Evaluation 31
3.3.1 Experimental Setup 31
3.3.2 I/O Path Latency Evaluation 31
3.3.3 I/O Evaluation with IOR 33
3.3.4 I/O Evaluation with HACC-IO 39
3.3.5 Memory Consumption 40
3.3.6 Optimized Finer-LRU Scheme 42
3.4 Discussion 43
3.5 Summary 45
Chapter 4 HPC Storage I/O Stack Optimization 46
4.1 Motivation 46
4.1.1 Lustre Backend File Systems: ldiskfs and ZFS 46
4.1.2 I/O Stack of ZFS-based Lustre 49
4.1.3 ZFS I/O Pipeline 50
4.1.4 Problem Analysis 52
4.2 Design and Implementation 56
4.2.1 Parallel Checksum Calculation Pipeline 56
4.2.2 Dynamic Thread Control Scheme 59
4.3 Evaluation 64
4.3.1 Experimental Setup 64
4.3.2 ZFS I/O Pipeline Latency 65
4.3.3 CPU Utilization 66
4.3.4 Dynamic Thread Control 68
4.3.5 Sequential I/O Performance 70
4.3.6 Scalability 72
4.4 Summary 74
Chapter 5 HPC System Configuration Optimization 76
5.1 Motivation 76
5.2 Design and Implementation 79
5.2.1 Dataset Preprocessing 79
5.2.2 Feature Selection and Clustering Models 81
5.2.3 Clustered Datasets 84
5.2.4 Prediction Models 85
5.3 Evaluation 87
5.4 Summary 90
Chapter 6 Conclusion 91
요약 (Abstract in Korean) 111
dc.format.extent: x, 112
dc.language.iso: eng
dc.publisher: Seoul National University
dc.subject: High Performance Computing
dc.subject: Manycore Architecture
dc.subject: Fine-grained Lock
dc.subject: Lustre File System
dc.subject: ZFS
dc.subject: Unsupervised Learning
dc.subject: Prediction Model
dc.subject.ddc: 621.39
dc.title: I/O Performance Optimization Schemes for Manycore HPC Systems
dc.type: Dissertation
dc.contributor.department: College of Engineering, Department of Computer Science and Engineering
dc.description.degree: Doctor
dc.date.awarded: 2023-08
dc.identifier.uci: I804:11032-000000178050
dc.identifier.holdings: 000000000050▲000000000058▲000000178050▲