A Machine Learning-based Methodology to Detect I/O Performance Bottlenecks for Hadoop Systems

DC Field | Value | Language
dc.contributor.advisor | 염헌영 | -
dc.contributor.author | 성민영 | -
dc.date.accessioned | 2017-07-14T02:54:11Z | -
dc.date.available | 2017-07-14T02:54:11Z | -
dc.date.issued | 2014-02 | -
dc.identifier.other | 000000016930 | -
dc.identifier.uri | https://hdl.handle.net/10371/123033 | -
dc.description | Thesis (Master's) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, February 2014. Advisor: 염헌영. | -
dc.description.abstract | As distributed systems for processing big data, such as clusters and clouds, grow in scale, detecting I/O performance bottlenecks has become one of the biggest challenges in achieving high performance. A set of extremely slow straggler tasks may directly cause such bottlenecks and degrade the overall performance of a Hadoop system. Moreover, depending on the kind of bottleneck, the efficiency of the related resource usage and energy consumption may decrease. In most cases, users have little insight into which tasks are degraded and why. To address this problem, we have developed an I/O performance bottleneck detection methodology for Hadoop systems. The methodology has two key aspects. First, I/O profiling is performed for each Hadoop task to extract feature values that may be related to performance degradation, and the feature value sets of all Hadoop tasks are then analyzed with a machine learning technique. As a result, the most relevant features can be selected from among all features, and low-performance Hadoop tasks are identified as performance bottlenecks. Second, based on the results of the methodology, it is possible to provide performance improvement guidelines, such as the use of alternative resource scheduling. We found that, by identifying the performance bottlenecks, our methodology can yield performance improvements of up to about 37% in a scalable environment. | -
dc.description.tableofcontents | Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background
  2.1 Analysis of I/O for detection of performance bottlenecks
    2.1.1 Map phase
    2.1.2 Shuffle phase
    2.1.3 Reduce phase
  2.2 Performance comparison
  2.3 Problem definition
Chapter 3 Bottleneck Detection Methodology
  3.1 Overall design
  3.2 Node level feature selection
  3.3 Hadoop level feature selection
  3.4 Block level feature selection
  3.5 Machine Learning technique
    3.5.1 Algorithm for ML technique
    3.5.2 Feature selection
Chapter 4 Evaluation
  4.1 Experiment setup
  4.2 Feature selection analysis
  4.3 Experiment with a variety of workload
  4.4 Scale-out evaluation
Chapter 5 Improvement Guidelines
  5.1 I/O scheduler alternatives
  5.2 SATA NCQ alternatives
Chapter 6 Related Work
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)
Acknowledgements
-
dc.format | application/pdf | -
dc.format.extent | 2678730 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | Graduate School, Seoul National University | -
dc.subject | MapReduce | -
dc.subject | Hadoop | -
dc.subject | I/O Performance Bottleneck Detection | -
dc.subject | Monitoring | -
dc.subject | Machine Learning | -
dc.subject.ddc | 621 | -
dc.title | A Machine Learning-based Methodology to Detect I/O Performance Bottlenecks for Hadoop Systems | -
dc.type | Thesis | -
dc.description.degree | Master | -
dc.citation.pages | vi, 44 | -
dc.contributor.affiliation | College of Engineering, Department of Electrical and Computer Engineering | -
dc.date.awarded | 2014-02 | -
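
The abstract above describes two steps: per-task I/O profiling to extract feature values, then a machine learning analysis that selects the most relevant features and flags low-performance (straggler) tasks. The abstract does not name the algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the thesis's method. It labels a task as a straggler when its runtime exceeds 1.5 times the median runtime (a hypothetical threshold) and ranks features by absolute Pearson correlation with that label, a simple filter-style stand-in for the thesis's ML-based feature selection. The feature names (`disk_await_ms`, `bytes_read_mb`, `cpu_util`), the helper names, and all sample values are hypothetical illustrations, not measurements from the thesis.

```python
# Hypothetical sketch, not the thesis's method: label straggler tasks and
# rank per-task I/O features by |Pearson correlation| with that label.
# Assumed: the 1.5x-median threshold, feature names, and all sample values.
from statistics import mean, median, pstdev

STRAGGLER_FACTOR = 1.5  # assumed cutoff: runtime > 1.5 x median runtime


def label_stragglers(runtimes):
    """Return 1 for tasks slower than STRAGGLER_FACTOR x median, else 0."""
    cutoff = STRAGGLER_FACTOR * median(runtimes)
    return [1 if r > cutoff else 0 for r in runtimes]


def pearson(xs, ys):
    """Population Pearson correlation; 0.0 if either series is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def rank_features(tasks, k=2):
    """Return the top-k feature names by |corr(feature, straggler label)|,
    together with the straggler labels themselves."""
    labels = label_stragglers([t["runtime_s"] for t in tasks])
    names = [n for n in tasks[0] if n != "runtime_s"]
    scores = {n: abs(pearson([t[n] for t in tasks], labels)) for n in names}
    return sorted(scores, key=scores.get, reverse=True)[:k], labels


if __name__ == "__main__":
    # Synthetic per-task profiles (hypothetical values, not thesis data).
    tasks = [
        {"runtime_s": 10, "disk_await_ms": 4, "bytes_read_mb": 64, "cpu_util": 0.7},
        {"runtime_s": 11, "disk_await_ms": 5, "bytes_read_mb": 64, "cpu_util": 0.6},
        {"runtime_s": 9, "disk_await_ms": 4, "bytes_read_mb": 64, "cpu_util": 0.8},
        {"runtime_s": 30, "disk_await_ms": 40, "bytes_read_mb": 64, "cpu_util": 0.2},
        {"runtime_s": 10, "disk_await_ms": 6, "bytes_read_mb": 64, "cpu_util": 0.7},
    ]
    top, labels = rank_features(tasks)
    print(top, labels)  # prints ['disk_await_ms', 'cpu_util'] [0, 0, 0, 1, 0]
```

In this toy data only the fourth task exceeds the cutoff, so the disk-wait feature ranks first, while a constant feature such as `bytes_read_mb` scores zero. A correlation filter like this ignores interactions between features, which is one reason a study of this kind would use a fuller machine learning technique.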

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
