I/O Performance Optimization Schemes for Manycore HPC Systems

DC Field / Value
dc.contributor.advisor: 엄현상
dc.contributor.author: Bang Jiwoo
dc.date.accessioned: 2023-11-20T13:25:37Z
dc.date.available: 2023-11-20T13:25:37Z
dc.date.issued: 2023-08
dc.identifier.other: 000000178050
dc.identifier.uri: https://dcollection.snu.ac.kr/common/orgView/000000178050
dc.identifier.uri: https://hdl.handle.net/10371/196498
dc.description.abstract:
High-performance computing (HPC) systems are composed of thousands of compute nodes, storage systems, and high-speed networks, which together form a highly complex, multi-layered I/O stack. To meet the increasing demand for data-access performance from applications run on HPC systems, efficient design of the HPC memory management system and storage file system is becoming more important. Moreover, HPC users need proper guidance toward optimal system configuration settings to avoid significant performance fluctuations.

In this dissertation, our first focus is on reducing lock contention in the memory management system of an HPC manycore architecture. One of the critical sections that cause severe lock contention in the I/O path is the page management system, which uses multiple Least Recently Used (LRU) lists protected by a single lock instance. To solve this problem, we propose the Finer-LRU scheme, which optimizes the page reclamation process by splitting the LRU lists into multiple sub-lists, each with its own lock instance. Our evaluation results show that the Finer-LRU scheme can improve sequential write throughput by 57.03% and reduce latency by 98.94% compared to the baseline Linux kernel version 5.2.8 on the Intel Knights Landing (KNL) architecture.
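A minimal, self-contained sketch of the fine-grained locking idea described above (illustrative only, not the dissertation's kernel code; the class name ShardedLRU, the sub-list count, and the hashing scheme are assumptions made for the example):

```python
# Illustrative sketch: shard one LRU structure into several sub-lists, each
# protected by its own lock, so concurrent page additions and reclaims contend
# far less than they would with a single global lock.
import threading
from collections import OrderedDict

class ShardedLRU:
    def __init__(self, num_sublists=8):
        # One OrderedDict (kept in LRU order) and one lock per sub-list.
        self.sublists = [OrderedDict() for _ in range(num_sublists)]
        self.locks = [threading.Lock() for _ in range(num_sublists)]

    def _index(self, page_id):
        # Map a page to a sub-list, analogous to computing an LRU list index.
        return hash(page_id) % len(self.sublists)

    def touch(self, page_id, data=None):
        i = self._index(page_id)
        with self.locks[i]:                      # only this sub-list is locked
            self.sublists[i].pop(page_id, None)
            self.sublists[i][page_id] = data     # most recently used at the end

    def reclaim(self, i):
        # Reclaim the least recently used page of one sub-list.
        with self.locks[i]:
            if self.sublists[i]:
                return self.sublists[i].popitem(last=False)
        return None
```

Because each page maps to exactly one sub-list, threads working on pages that hash to different sub-lists never compete for the same lock, which is the contention reduction the Finer-LRU scheme targets during page reclamation.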

We also analyze the root cause of low I/O performance on a ZFS-based Lustre file system and propose a novel ZFS scheme, dynamic-ZFS, which combines two optimization approaches: a parallel checksum calculation pipeline and a dynamic thread control scheme. The experimental results show that our approach improves sequential I/O performance by an average of 37%. We demonstrate that dynamic-ZFS can deliver I/O performance comparable to that of ldiskfs-based Lustre while still providing a multitude of beneficial features.
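The two optimizations named above correspond to Sections 4.2.1 and 4.2.2 in the table of contents. The sketch below only illustrates the general pattern, checksumming record-sized blocks in parallel with a worker count chosen at run time, under assumed parameters (record size, checksum function, worker-count policy); it is not ZFS or dynamic-ZFS code:

```python
# Conceptual sketch only: checksum fixed-size records in parallel, with the
# worker count chosen at run time rather than fixed in advance.
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

RECORD_SIZE = 1 << 20  # assume 1 MiB records, loosely analogous to a ZFS recordsize

def checksum_record(record):
    # Stand-in checksum; ZFS itself uses algorithms such as fletcher4 or sha256.
    return zlib.crc32(record)

def checksum_file(path, max_workers=None):
    # "Dynamic" worker count: default to the number of CPUs visible to the process.
    workers = max_workers or os.cpu_count() or 1
    with open(path, "rb") as f:
        records = iter(lambda: f.read(RECORD_SIZE), b"")
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Records are read sequentially but checksummed concurrently.
            return list(pool.map(checksum_record, records))
```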

Finally, we employ multiple machine learning approaches to perform an in-depth analysis of I/O behaviors in HPC applications and to search for optimal configuration settings for jobs sharing similar I/O characteristics. Our overall results show that, using the proposed machine learning-based prediction models, whose R-squared scores improve by up to 0.07, jobs run on HPC systems can obtain highly accurate predictions of I/O performance under different configuration parameters.
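As a hedged illustration of the workflow suggested by the abstract and the Chapter 5 headings (dataset preprocessing, clustering, prediction models), the sketch below clusters jobs by I/O characteristics, fits one regression model per cluster, and reports the R-squared score; the feature names, model choices, and synthetic data are placeholders, not the dissertation's:

```python
# Illustrative workflow: cluster jobs by I/O characteristics, then fit one
# regression model per cluster and evaluate it with the R-squared score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 4))   # hypothetical features, e.g. stripe count, transfer size, node count, file count
y = X @ np.array([2.0, 1.0, 0.5, 3.0]) + rng.normal(0, 0.1, 1000)  # synthetic throughput target

# Group jobs with similar I/O characteristics.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Train and evaluate one prediction model per cluster.
for c in np.unique(clusters):
    Xc, yc = X[clusters == c], y[clusters == c]
    Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.2, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr)
    print(f"cluster {c}: R^2 = {r2_score(yte, model.predict(Xte)):.3f}")
```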
dc.description.tableofcontents:
Chapter 1 Introduction 1
1.1 Motivation 1
1.1.1 High Performance Computing Systems 1
1.1.2 Problems 2
1.2 Contributions 5
1.3 Outline 7
Chapter 2 Background 8
2.1 Manycore System 8
2.2 Lustre File System 9
2.3 Cori Supercomputer 10
2.4 Related Work 11
Chapter 3 HPC Scalable Memory System Optimization 17
3.1 Motivation 17
3.1.1 I/O Scalability of Existing Manycore Architecture 17
3.1.2 Page Frame Reclamation Process 19
3.1.3 Problem Analysis 21
3.2 Design and Implementation 23
3.2.1 Design of Scalable Locking Mechanism 23
3.2.2 Data Structures 25
3.2.3 Calculation of the LRU List Index 26
3.2.4 Customized Callback Functions 27
3.3 Evaluation 31
3.3.1 Experimental Setup 31
3.3.2 I/O Path Latency Evaluation 31
3.3.3 I/O Evaluation with IOR 33
3.3.4 I/O Evaluation with HACC-IO 39
3.3.5 Memory Consumption 40
3.3.6 Optimized Finer-LRU Scheme 42
3.4 Discussion 43
3.5 Summary 45
Chapter 4 HPC Storage I/O Stack Optimization 46
4.1 Motivation 46
4.1.1 Lustre Backend File Systems: ldiskfs and ZFS 46
4.1.2 I/O Stack of ZFS-based Lustre 49
4.1.3 ZFS I/O Pipeline 50
4.1.4 Problem Analysis 52
4.2 Design and Implementation 56
4.2.1 Parallel Checksum Calculation Pipeline 56
4.2.2 Dynamic Thread Control Scheme 59
4.3 Evaluation 64
4.3.1 Experimental Setup 64
4.3.2 ZFS I/O Pipeline Latency 65
4.3.3 CPU Utilization 66
4.3.4 Dynamic Thread Control 68
4.3.5 Sequential I/O Performance 70
4.3.6 Scalability 72
4.4 Summary 74
Chapter 5 HPC System Configuration Optimization 76
5.1 Motivation 76
5.2 Design and Implementation 79
5.2.1 Dataset Preprocessing 79
5.2.2 Feature Selection and Clustering Models 81
5.2.3 Clustered Datasets 84
5.2.4 Prediction Models 85
5.3 Evaluation 87
5.4 Summary 90
Chapter 6 Conclusion 91
요약 (Abstract in Korean) 111
dc.format.extent: x, 112
dc.language.iso: eng
dc.publisher: Seoul National University
dc.subject: High Performance Computing
dc.subject: Manycore Architecture
dc.subject: Fine-grained Lock
dc.subject: Lustre File System
dc.subject: ZFS
dc.subject: Unsupervised Learning
dc.subject: Prediction Model
dc.subject.ddc: 621.39
dc.title: I/O Performance Optimization Schemes for Manycore HPC Systems
dc.type: Dissertation
dc.contributor.department: College of Engineering, Department of Computer Science and Engineering
dc.description.degree: Doctor
dc.date.awarded: 2023-08
dc.identifier.uci: I804:11032-000000178050
dc.identifier.holdings: 000000000050▲000000000058▲000000178050▲