Publications

Detailed Information

PLF-Join: An Efficient MapReduce Algorithm for Vector Similarity Join : PLF-Join: 벡터 유사 조인을 위한 효율적인 맵리듀스 알고리즘

DC Field Value Language
dc.contributor.advisor이상구-
dc.contributor.author김현준-
dc.date.accessioned2017-07-14T03:00:04Z-
dc.date.available2017-07-14T03:00:04Z-
dc.date.issued2015-02-
dc.identifier.other000000025849-
dc.identifier.urihttps://hdl.handle.net/10371/123152-
dc.description학위논문 (석사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2015. 2. 이상구.-
dc.description.abstractVector similarity join is a problem of finding all pairs of vectors which has a similarity measure that exceeds a given threshold from a set of vectors. Vector similarity join is used in many applications such as near duplication detection in web pages, recommendation, and mining social data. However, it requires O(n^2) complexity where n is the number of vectors. This impractical time complexity makes it hard to utilize Vector similarity join on many real world problems.
Hence, a lot of the Hadoop MapReduce algorithms were proposed to quickly compute Vector similarity join. The state-of-the-art algorithm considers prefix filtering and length filtering methods to reduce the time taken for Vector similarity join operation. To even further reduce this time complexity, we propose a variation of an algorithm that
can be used to reduce the overhead involved in the network I/O cost. Along with a MapReduce algorithm we propose an efficient pre-processing technique which facilitates Vector similarity join calculation.
-
dc.description.tableofcontentsAbstract i
Contents iii
List of Figures v
Chapter 1 Introduction 1
Chapter 2 Preliminary 4
2.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Filtering Predicate . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Prefix Filtering . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.2 Length Filtering . . . . . . . . . . . . . . . . . . . . . . . 7
Chapter 3 Related Works 10
3.1 V-SMART Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 VCL Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Bjoin Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Chapter 4 Pre-Processing 16
4.1 Pre-Processing Method of Previous Researches . . . . . . . . . . 16
4.2 StdSort : Sorting with Standard Deviation . . . . . . . . . . . . . 17
Chapter 5 PLF-Join 22
5.1 Job 1 : Filter Dissimilar Pairs . . . . . . . . . . . . . . . . . . . . 22
5.2 Job 2 : Re-import the Vectors . . . . . . . . . . . . . . . . . . . . 25
5.3 Job 3 : Calculate Similarity . . . . . . . . . . . . . . . . . . . . . 28
5.4 Example of PLF-join . . . . . . . . . . . . . . . . . . . . . . . . . 28
Chapter 6 Experiment 31
6.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.2 Time Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.3 StdSort Experiment . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.4 Combining PLF-join and StdSort . . . . . . . . . . . . . . . . . . 34
Chapter 7 Conclusion 39
Bibliography 41
요약 43
감사의글 44
-
dc.formatapplication/pdf-
dc.format.extent2911722 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisher서울대학교 대학원-
dc.subject벡터 유사 조인-
dc.subject.ddc621-
dc.titlePLF-Join: An Efficient MapReduce Algorithm for Vector Similarity Join-
dc.title.alternativePLF-Join: 벡터 유사 조인을 위한 효율적인 맵리듀스 알고리즘-
dc.typeThesis-
dc.description.degreeMaster-
dc.citation.pages45-
dc.contributor.affiliation공과대학 전기·컴퓨터공학부-
dc.date.awarded2015-02-
Appears in Collections:
Files in This Item:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share