Design of Efficient and Fast Distributed Storage System for Low Latency Storage Devices : Design of an Efficient and Optimized Distributed Storage System Considering High-Performance Storage

DC Field | Value | Language
dc.contributor.advisor | 염헌영 | -
dc.contributor.author | 오명원 | -
dc.date.accessioned | 2020-10-13T02:57:42Z | -
dc.date.available | 2020-10-13T02:57:42Z | -
dc.date.issued | 2020 | -
dc.identifier.other | 000000162093 | -
dc.identifier.uri | https://hdl.handle.net/10371/169346 | -
dc.identifier.uri | http://dcollection.snu.ac.kr/common/orgView/000000162093 | ko_KR
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, August 2020. Advisor: 염헌영. | -
dc.description.abstract | Distributed storage systems can scale capacity and performance in a balanced way, on demand, as data grows. However, existing distributed storage systems were designed for hard disk drives, and this design causes significant performance degradation when they are used with NVMe devices. In addition, the explosive growth of data makes it challenging to store and manage the large volumes of content being generated.
In this dissertation, we propose an efficient distributed storage system design that delivers high performance under different consistency models and reduces the amount of stored data through a tiering-based scheme.
First, we introduce a CPU-efficient I/O processing design that:
1) decouples latency-critical jobs from best-effort jobs, and
2) partitions work to avoid contention and maximize parallelism.
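The two ideas in the first contribution can be illustrated with a small, single-threaded sketch. It assumes (this is not taken from the dissertation) that jobs are hashed to partitions by object ID, that each partition is owned by exactly one worker so partitions never share locks, and that within a partition latency-critical jobs are served with strict priority over best-effort jobs such as background flushes. The names `Partition`, `PartitionedScheduler`, `submit`, and `drain` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Partition:
    """One shard of I/O work, assumed to be owned by a single worker (no shared locks)."""
    latency_critical: deque = field(default_factory=deque)
    best_effort: deque = field(default_factory=deque)

    def next_job(self):
        # Strict priority (an assumption of this sketch): latency-critical jobs,
        # e.g. client reads/writes, always run before best-effort jobs,
        # e.g. background journal flushes.
        if self.latency_critical:
            return self.latency_critical.popleft()
        if self.best_effort:
            return self.best_effort.popleft()
        return None


class PartitionedScheduler:
    def __init__(self, num_partitions: int):
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, object_id: str) -> int:
        # A stable hash keeps every job for one object on the same partition,
        # so jobs on different partitions never contend with each other.
        return crc32(object_id.encode()) % len(self.partitions)

    def submit(self, object_id: str, job: str, critical: bool) -> int:
        idx = self.partition_for(object_id)
        p = self.partitions[idx]
        (p.latency_critical if critical else p.best_effort).append(job)
        return idx

    def drain(self, idx: int) -> list:
        """Simulate one partition's worker running until its queues are empty."""
        done = []
        p = self.partitions[idx]
        while (job := p.next_job()) is not None:
            done.append(job)
        return done


sched = PartitionedScheduler(num_partitions=4)
idx = sched.submit("obj-1", "flush-journal", critical=False)
sched.submit("obj-1", "write-4k", critical=True)
print(sched.drain(idx))  # ['write-4k', 'flush-journal']
```

Even though the best-effort job was submitted first, the latency-critical write is served first. A real system would run one worker thread per partition; this sketch only shows the scheduling policy.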
Second, we present a new tiering-based deduplication method that removes redundant data across many nodes; it is highly scalable and compatible with existing scale-out distributed storage systems.
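A minimal sketch of the general idea behind tiering-based deduplication, under stated assumptions: an upper (metadata) tier stores each object as an ordered list of chunk fingerprints, and a lower (chunk) tier stores each unique chunk once, keyed by its fingerprint and reference-counted. Fixed-size chunking and SHA-256 fingerprints are assumptions here, and the tiny `CHUNK_SIZE` is for illustration only. The sketch deduplicates inline for simplicity; the deduplication engine and rate control listed in Chapter 4 are not modeled. In a scale-out deployment the chunk tier would itself be spread across nodes (for example by fingerprint), which is what makes deduplication global. `ChunkTier` and `MetadataTier` are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 4  # illustrative only; real systems use chunks of several KiB


class ChunkTier:
    """Lower tier: each unique chunk is stored once, keyed by its fingerprint."""

    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes
        self.refcount = {}  # fingerprint -> number of object references

    def put(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.chunks:
            self.chunks[fp] = data  # first copy: actually store it
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        return fp

    def release(self, fp: str) -> None:
        # Drop one reference; reclaim the chunk when nothing points at it.
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:
            del self.refcount[fp]
            del self.chunks[fp]


class MetadataTier:
    """Upper tier: each object is an ordered map of chunk fingerprints."""

    def __init__(self, chunk_tier: ChunkTier):
        self.chunk_tier = chunk_tier
        self.objects = {}  # object name -> list of fingerprints

    def write(self, name: str, data: bytes) -> None:
        if name in self.objects:
            self.delete(name)  # overwrite: release the old chunks first
        self.objects[name] = [
            self.chunk_tier.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)
        ]

    def read(self, name: str) -> bytes:
        return b"".join(self.chunk_tier.chunks[fp] for fp in self.objects[name])

    def delete(self, name: str) -> None:
        for fp in self.objects.pop(name):
            self.chunk_tier.release(fp)


tier = ChunkTier()
pool = MetadataTier(tier)
pool.write("vm-image-1", b"ABCDABCDWXYZ")
pool.write("vm-image-2", b"ABCDQRST")
print(len(tier.chunks))  # 3 unique chunks back 5 logical chunks
```

Because the metadata tier only holds fingerprints, redundant chunks written by different objects collapse into a single stored copy in the chunk tier.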

To show the effectiveness of our approach, we implement and evaluate our schemes on real distributed storage systems. The experimental results demonstrate that our cooperative approach can deliver higher performance than existing approaches while saving a considerable amount of storage space.
-
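dc.description.abstract | Distributed storage systems have recently attracted attention because of rapid data growth and the demand for high I/O performance. However, because existing distributed storage systems use an I/O processing structure built around hard disk I/O, they cannot deliver the full performance of fast devices such as NVMe storage. In addition, the huge volume of data produced by the data explosion creates another problem for the distributed storage systems that must store it.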

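This dissertation proposes a tiered distributed storage system architecture that combines performance optimizations for flash-based devices, tailored to the architecture of the distributed storage system, with techniques that minimize the amount of stored data. First, we introduce a CPU-efficient I/O processing method that divides I/O work into latency-sensitive and latency-insensitive jobs and handles each kind differently. Second, we introduce a tiering-based deduplication technique. By proposing a scalable architecture that can be applied to existing distributed storage systems, we show how to effectively remove data duplicated across data nodes.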

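To verify the effectiveness of the proposed methods, we implemented them on real distributed storage systems and evaluated them. The experimental results show that the proposed techniques improve performance over existing methods while substantially reducing the amount of stored data.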
-
dc.description.tableofcontents | Chapter 1 Introduction 1
1.1 Motivation 1
1.1.1 Problem and key idea 4
1.2 Contributions 6
1.3 Outline 7
Chapter 2 Background 9
2.1 IO flow on Scale-out Storage System 9
2.2 Shared-nothing Architecture 10
2.3 Block Storage Service 11
2.4 Replication Strategy 11
2.5 Target Distributed Scale-out Storage System 12
2.6 Deduplication Range 14
Chapter 3 CPU-efficient I/O processing for Distributed Storage System 16
3.1 Motivation 16
3.1.1 Performance Analysis on Eventual Consistency System 16
3.1.1.1 Observations 18
3.1.2 Performance Analysis on Strong Consistency System 21
3.1.2.1 Long Write Path to Commit a Write 22
3.1.2.2 Inefficient Threading Architecture 23
3.1.2.3 High Overhead of Backend Data Store 25
3.2 Related Work 26
3.2.1 Discussion 28
3.3 Design and Implementation 29
3.3.1 Design Principles 29
3.3.2 Implementation on Eventual Consistency Service 30
3.3.2.1 Locality aware thread control 30
3.3.2.2 LCA (Lock Contention Avoidance) 32
3.3.2.3 Putting it all together 35
3.3.2.4 Implementation Notes on GlusterFS 37
3.3.3 Implementation on Strong Consistency Service 38
3.3.3.1 Pipelined Replication Processing 38
3.3.3.2 Prioritized Thread Control 39
3.3.3.3 Application-managed Data Store 41
3.3.3.4 Implementation Details 42
3.4 Evaluation 47
3.4.1 Environmental setup for Eventual Consistency Service 47
3.4.2 Random I/O performance & Context switching 48
3.4.3 CPU Usage & Cache Miss 49
3.4.4 Performance impact of Proposed solutions 50
3.4.5 Performance Scalability 51
3.4.6 Sequential I/O performance 51
3.4.7 Performance Comparison between RTC and LALCA 53
3.4.8 Macro workload 54
3.4.8.1 GlusterFS metadata operation 54
3.4.8.2 Filebench result 55
3.4.8.3 SPEC SFS 2014 result 55
3.4.9 Environmental setup for Strong Consistency Service 57
3.4.10 Small Random Performance 57
3.4.11 Sequential I/O Performance 59
3.4.12 Performance Scalability 60
3.5 Summary 62
Chapter 4 Design of Global Deduplication based on Tiering 63
4.1 Motivation 63
4.1.1 Problem Definition 63
4.2 Related Work 66
4.3 Design and Implementation 68
4.3.1 Key Idea 68
4.4 Design 72
4.4.1 Object 72
4.4.2 Pool-based Object Management 73
4.4.3 Cache Manager 73
4.4.4 Data Deduplication 74
4.4.4.1 Deduplication engine 74
4.4.4.2 Deduplication rate control 75
4.4.5 I/O Path 76
4.4.6 Consistency Model 77
4.5 Implementation Notes on Ceph 78
4.6 Evaluation 79
4.6.1 Environment setup 79
4.6.2 Performance Comparison 80
4.6.2.1 Small Random Performance 80
4.6.2.2 Sequential Performance 82
4.6.3 Space Saving 83
4.6.4 Deduplication Synergy Effect with Storage Features 84
4.6.4.1 High Availability with Deduplication 84
4.6.4.2 Data Recovery Acceleration with Deduplication 85
4.6.4.3 Combination with Data Compression For Maximized Capacity Saving 86
4.6.5 Deduplication Rate Control 87
4.7 Summary 87
Chapter 5 Conclusion 89
Abstract (in Korean) 103
-
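dc.language.iso | eng | -
dc.publisher | Seoul National University Graduate School | -
dc.subject | Distributed Storage System | -
dc.subject | Tiering | -
dc.subject | Deduplication | -
dc.subject | Performance | -
dc.subject | Distributed Storage System (Korean-language keyword) | -
dc.subject | Tiering (Korean-language keyword) | -
dc.subject | Deduplication (Korean-language keyword) | -
dc.subject | Performance (Korean-language keyword) | -
dc.subject.ddc | 621.39 | -
dc.title | Design of Efficient and Fast Distributed Storage System for Low Latency Storage Devices | -
dc.title.alternative | Design of an Efficient and Optimized Distributed Storage System Considering High-Performance Storage | -
dc.type | Thesis | -
dc.type | Dissertation | -
dc.contributor.department | College of Engineering, Department of Computer Science and Engineering | -
dc.description.degree | Doctor | -
dc.date.awarded | 2020-08 | -
dc.contributor.major | Distributed Systems | -
dc.identifier.uci | I804:11032-000000162093 | -
dc.identifier.holdings | 000000000043▲000000000048▲000000162093▲ | -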

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
