Hadoop MapReduce Performance Enhancement Using In-Node Combiners

이우현

Seoul National University Central Library

Hadoop MapReduce Performance Enhancement Using In-Node Combiners: 노드기반 컴바이너를 이용한 하둡 맵리듀스 성능 개선 (Improving Hadoop MapReduce Performance Using Node-Based Combiners)

DC Field	Value	Language
dc.contributor.advisor	김형주	-
dc.contributor.author	이우현	-
dc.date.accessioned	2017-07-14T03:00:47Z	-
dc.date.available	2017-07-14T03:00:47Z	-
dc.date.issued	2015-02	-
dc.identifier.other	000000026798	-
dc.identifier.uri	https://hdl.handle.net/10371/123168	-
dc.description	Thesis (Master's) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2015. Advisor: 김형주.	-
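dc.description.abstract	A wide variety of applications and devices generate exponentially growing amounts of data in real time. As demand for large-scale data analysis grows, technologies that store and process large volumes of data efficiently are needed. Data analysis must be completed within an acceptable time on the available hardware. To this end, Hadoop supports efficient distributed storage of large datasets and distributed parallel computing. MapReduce, a powerful distributed programming model supported by Hadoop, processes data in many different formats. This thesis proposes ways to reduce I/O cost, which has been identified as a bottleneck in MapReduce jobs. Many studies have shown that caching the input data of MapReduce jobs in memory effectively minimizes disk I/O in the map phase. To reduce I/O in the shuffle phase, this thesis proposes the In-Node Combiner, which stores the outputs of all Mappers running on the same node in an in-memory cache, thereby minimizing the size of each node's output. Experimental results show that, compared with previous work, using the in-node combiner improves MapReduce job performance by 20% or more.	-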
dc.description.abstract	An overwhelming amount of data is being generated in real time by various applications and devices. While advanced analysis of large datasets is in high demand, data sizes have surpassed the capabilities of conventional software and hardware. Data-intensive analytics must complete within a tolerable elapsed time using commodity hardware. The Hadoop framework efficiently distributes large datasets over multiple commodity servers, and the MapReduce framework performs parallel computations on them. We discuss the I/O bottlenecks of the Hadoop MapReduce framework and propose methods for enhancing I/O performance in common MapReduce jobs. One proven approach is to cache input data in memory to maximize the memory locality of all map tasks. We introduce the in-node combining design, an approach to optimizing I/O in the shuffle phase that extends the scope of the traditional combiner to the node level. The in-node combiner reduces the total number of intermediate results emitted and curtails network traffic between mappers and reducers.	-
dc.description.tableofcontents	Abstract; 1. Introduction; 2. Related Work (2.1 Hadoop Distributed File System; 2.2 Hadoop MapReduce; 2.3 Hadoop I/O Optimization; 2.4 NoSQL); 3. Background (3.1 HDFS Bottleneck: 3.1.1 In-Memory Cache; 3.2 Shuffle Bottleneck: 3.2.1 Traditional Combiner, 3.2.2 In-Mapper Combiner); 4. Our Approach (4.1 In-Node Combiner (INC); 4.2 Implementation: 4.2.1 System Architecture); 5. Experiment (5.1 In-Memory Cache; 5.2 Combiner); 6. Conclusion; References; Appendix (1. Hadoop 2.0)	-
dc.format	application/pdf	-
dc.format.extent	755541 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	MapReduce	-
dc.subject	Hadoop	-
dc.subject	HDFS	-
dc.subject	Combiner	-
dc.subject	NoSQL	-
dc.subject.ddc	621	-
dc.title	Hadoop MapReduce Performance Enhancement Using In-Node Combiners	-
dc.title.alternative	노드기반 컴바이너를 이용한 하둡 맵리듀스 성능 개선 (Improving Hadoop MapReduce Performance Using Node-Based Combiners)	-
dc.type	Thesis	-
dc.description.degree	Master	-
dc.citation.pages	iv, 46	-
dc.contributor.affiliation	College of Engineering, Department of Electrical and Computer Engineering	-
dc.date.awarded	2015-02	-
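The abstract's central idea, widening the scope of combining from a single map task to every map task on a node, can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the thesis's implementation: it uses word count as the example job, models map tasks as plain Python functions, and the `InNodeCombiner` class and every function name below are hypothetical. It only counts the intermediate records that would enter the shuffle; it does not use Hadoop's APIs or model memory limits, spilling, or concurrency between map tasks.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (key, value) pair emitted toward the shuffle


def map_task(split: str) -> List[Record]:
    """Word-count map function: emit (word, 1) for every token in one input split."""
    return [(word, 1) for word in split.split()]


def no_combiner(splits: List[str]) -> List[Record]:
    """Baseline: every (word, 1) pair from every map task is shuffled."""
    out: List[Record] = []
    for split in splits:
        out.extend(map_task(split))
    return out


def per_task_combiner(splits: List[str]) -> List[Record]:
    """Traditional combiner: aggregate only within each map task's own output."""
    out: List[Record] = []
    for split in splits:
        local: Counter = Counter()
        for word, count in map_task(split):
            local[word] += count
        out.extend(local.items())
    return out


class InNodeCombiner:
    """Hypothetical node-level in-memory cache shared by all map tasks on one node."""

    def __init__(self) -> None:
        self.cache: Dict[str, int] = {}

    def collect(self, word: str, count: int) -> None:
        # Merge a map output record into the node-wide partial aggregate.
        self.cache[word] = self.cache.get(word, 0) + count

    def flush(self) -> List[Record]:
        # Emit one record per distinct key seen on this node, then reset the cache.
        out = sorted(self.cache.items())
        self.cache.clear()
        return out


def in_node_combiner(splits: List[str]) -> List[Record]:
    """All map tasks on the node feed one shared cache, which is flushed once."""
    node_cache = InNodeCombiner()
    for split in splits:
        for word, count in map_task(split):
            node_cache.collect(word, count)
    return node_cache.flush()


def reduce_counts(records: List[Record]) -> Dict[str, int]:
    """Reducer side: sum values per key (the final result must not depend on combining)."""
    totals: Dict[str, int] = {}
    for word, count in records:
        totals[word] = totals.get(word, 0) + count
    return totals


if __name__ == "__main__":
    splits = ["a b a c", "b a d", "c c a"]  # three map tasks assumed to run on one node
    for name, fn in [("none", no_combiner), ("per-task", per_task_combiner),
                     ("in-node", in_node_combiner)]:
        records = fn(splits)
        print(name, len(records), reduce_counts(records))
```

With these three splits, the baseline, per-task, and in-node variants emit 10, 8, and 4 intermediate records respectively, while the reduced totals are identical. As with any combiner, this is only valid because the aggregation (summation) is associative and commutative. The sketch ignores concerns a real node-level cache would face, such as bounding its memory use and coordinating map tasks that run concurrently.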

Appears in Collections:

College of Engineering/Engineering Practice School (College of Engineering/Graduate School)
- Dept. of Electrical and Computer Engineering
  - Theses (Master's Degree, Dept. of Electrical and Computer Engineering)

Files in This Item:

000000026798.pdf 0.72 MB