Publications
Detailed Information
PLF-Join: An Efficient MapReduce Algorithm for Vector Similarity Join : PLF-Join: 벡터 유사 조인을 위한 효율적인 맵리듀스 알고리즘
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 이상구 | - |
dc.contributor.author | 김현준 | - |
dc.date.accessioned | 2017-07-14T03:00:04Z | - |
dc.date.available | 2017-07-14T03:00:04Z | - |
dc.date.issued | 2015-02 | - |
dc.identifier.other | 000000025849 | - |
dc.identifier.uri | https://hdl.handle.net/10371/123152 | - |
dc.description | 학위논문 (석사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2015. 2. 이상구. | - |
dc.description.abstract | Vector similarity join is a problem of finding all pairs of vectors which has a similarity measure that exceeds a given threshold from a set of vectors. Vector similarity join is used in many applications such as near duplication detection in web pages, recommendation, and mining social data. However, it requires O(n^2) complexity where n is the number of vectors. This impractical time complexity makes it hard to utilize Vector similarity join on many real world problems.
Hence, a lot of the Hadoop MapReduce algorithms were proposed to quickly compute Vector similarity join. The state-of-the-art algorithm considers prefix filtering and length filtering methods to reduce the time taken for Vector similarity join operation. To even further reduce this time complexity, we propose a variation of an algorithm that can be used to reduce the overhead involved in the network I/O cost. Along with a MapReduce algorithm we propose an efficient pre-processing technique which facilitates Vector similarity join calculation. | - |
dc.description.tableofcontents | Abstract i
Contents iii List of Figures v Chapter 1 Introduction 1 Chapter 2 Preliminary 4 2.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Filtering Predicate . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2.1 Prefix Filtering . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2.2 Length Filtering . . . . . . . . . . . . . . . . . . . . . . . 7 Chapter 3 Related Works 10 3.1 V-SMART Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2 VCL Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.3 Bjoin Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 Chapter 4 Pre-Processing 16 4.1 Pre-Processing Method of Previous Researches . . . . . . . . . . 16 4.2 StdSort : Sorting with Standard Deviation . . . . . . . . . . . . . 17 Chapter 5 PLF-Join 22 5.1 Job 1 : Filter Dissimilar Pairs . . . . . . . . . . . . . . . . . . . . 22 5.2 Job 2 : Re-import the Vectors . . . . . . . . . . . . . . . . . . . . 25 5.3 Job 3 : Calculate Similarity . . . . . . . . . . . . . . . . . . . . . 28 5.4 Example of PLF-join . . . . . . . . . . . . . . . . . . . . . . . . . 28 Chapter 6 Experiment 31 6.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 31 6.2 Time Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 32 6.3 StdSort Experiment . . . . . . . . . . . . . . . . . . . . . . . . . 33 6.4 Combining PLF-join and StdSort . . . . . . . . . . . . . . . . . . 34 Chapter 7 Conclusion 39 Bibliography 41 요약 43 감사의글 44 | - |
dc.format | application/pdf | - |
dc.format.extent | 2911722 bytes | - |
dc.format.medium | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | 서울대학교 대학원 | - |
dc.subject | 벡터 유사 조인 | - |
dc.subject.ddc | 621 | - |
dc.title | PLF-Join: An Efficient MapReduce Algorithm for Vector Similarity Join | - |
dc.title.alternative | PLF-Join: 벡터 유사 조인을 위한 효율적인 맵리듀스 알고리즘 | - |
dc.type | Thesis | - |
dc.description.degree | Master | - |
dc.citation.pages | 45 | - |
dc.contributor.affiliation | 공과대학 전기·컴퓨터공학부 | - |
dc.date.awarded | 2015-02 | - |
- Appears in Collections:
- Files in This Item:
Item View & Download Count
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.