Instance-based Hierarchical Schema Alignment in Linked Data

DC Field Value Language
dc.contributor.advisor 김홍기 -
dc.contributor.author 종남소 -
dc.date.accessioned 2017-07-14T05:43:11Z -
dc.date.available 2017-07-14T05:43:11Z -
dc.date.issued 2015-08 -
dc.identifier.other 000000056795 -
dc.identifier.uri https://hdl.handle.net/10371/125082 -
dc.description Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Dental Science, Major in Healthcare Management and Informatics, 2015. 8. 김홍기. -
dc.description.abstract Along with the development of the Web of documents, there is a natural need to share, exchange, and merge heterogeneous data in order to provide more comprehensive information and to answer users' more complex questions. However, the data published on the Web are raw dumps that sacrifice much of the semantics that can be used for exchanging and integrating data. Resource Description Framework (RDF) and Linked Data are designed to expose the semantics of data by interlinking data represented with well-defined relations. With the profusion of RDF resources and Linked Data, ontology alignment has gained significance in providing highly comprehensive knowledge embedded in disparate sources. Ontology alignment in Linking Open Data (LOD), however, has traditionally focused more on the instance level than on the schema level. Linked Data supports schema-level matching, provided that instance-level matching is already established. Linked Data is thus a hotbed for instance-based schema matching, which is considered a better solution for matching classes with ambiguous or obscure names. In this dissertation, the author focuses on three issues in instance-based schema alignment for Linked Data: (1) how to align schemas based on instances, (2) how to scale the schema alignment, and (3) how to generate a hierarchical schema structure.
Targeting the first issue, the author proposes an instance-based schema alignment algorithm called IUT. The IUT builds a unified taxonomy for the classes from two ontologies based on an instance-class matrix and derives the relation between two classes from their common instances. The author tested the IUT with DBpedia and YAGO2 and compared it with two state-of-the-art methods in four alignment tasks. The experiments show that the IUT outperforms both methods in terms of efficiency and effectiveness (e.g., it takes 968 ms to obtain an F-score of 0.810 on intra-subsumption alignment in DBpedia).
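As an illustration only, the idea of deriving class relations from common instances can be sketched as follows. This is a minimal sketch and not the IUT itself: the function name `relate_classes`, the overlap ratios, the default values of the thresholds `chi_s` and `chi_e`, and the toy class and instance identifiers are all assumptions made for this example.

```python
from itertools import combinations


def relate_classes(instances_of, chi_s=0.9, chi_e=0.9):
    """Infer (class_a, relation, class_b) triples from shared instances.

    instances_of maps a class name to the set of its instance identifiers.
    chi_s and chi_e are illustrative thresholds (assumed values).
    """
    relations = []
    for a, b in combinations(sorted(instances_of), 2):
        set_a, set_b = instances_of[a], instances_of[b]
        if not set_a or not set_b:
            continue  # a class without instances gives no evidence
        common = len(set_a & set_b)
        a_in_b = common / len(set_a)  # share of a's instances also in b
        b_in_a = common / len(set_b)  # share of b's instances also in a
        if a_in_b >= chi_e and b_in_a >= chi_e:
            relations.append((a, "equivalent", b))
        elif a_in_b >= chi_s:
            relations.append((a, "subClassOf", b))
        elif b_in_a >= chi_s:
            relations.append((b, "subClassOf", a))
    return relations


# Toy data (hypothetical): two ontologies typing the same four instances.
example = {
    "dbo:Person": {"i1", "i2", "i3", "i4"},
    "dbo:Scientist": {"i1", "i2"},
    "yago:Person": {"i1", "i2", "i3", "i4"},
}
result = relate_classes(example)
for triple in result:
    print(triple)
```

In this toy input, the two `Person` classes share all instances and are reported as equivalent, while `dbo:Scientist` is reported as a subclass of both. The sketch compares every pair of classes, which is the quadratic cost that a scaled variant would aim to reduce.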
Targeting the second issue, the author proposes a scaled version of the IUT called IUT(M). The IUT(M) reduces the computation of the IUT in two ways based on Locality Sensitive Hashing (LSH): (1) reducing the similarity computation for each pair of classes with MinHash functions, and (2) reducing the number of similarity computations with banding. The author tested the IUT(M) on the YAGO2-YAGO2 intra-subsumption alignment task and showed that the running time of the IUT can be reduced by 94% with a 5% loss in F-score.
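The two LSH ingredients named above, MinHash signatures and banding, can be sketched generically as below. This is a textbook-style sketch under stated assumptions, not the IUT(M) implementation: the helper names, the use of `zlib.crc32` to turn instance identifiers into integers, the linear hash family modulo a Mersenne prime, and the choice of 20 hash functions in 5 bands of 4 rows are all assumptions for illustration. Each class is assumed to have at least one instance.

```python
import random
import zlib
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used as the hash modulus


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Signature = minimum hash value of the (non-empty) set per function."""
    ids = [zlib.crc32(item.encode("utf-8")) for item in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing positions approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands, rows):
    """Return class pairs whose signatures agree on at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets[key].append(name)
        for names in buckets.values():
            for i in range(len(names)):
                for j in range(i + 1, len(names)):
                    candidates.add(tuple(sorted((names[i], names[j]))))
    return candidates


# Toy data (hypothetical): A and B have identical instance sets.
classes = {
    "A": {"i1", "i2", "i3"},
    "B": {"i1", "i2", "i3"},
    "C": {"x1", "x2", "x3"},
}
params = make_hash_params(num_hashes=20)
signatures = {name: minhash_signature(items, params)
              for name, items in classes.items()}
pairs = candidate_pairs(signatures, bands=5, rows=4)
```

Signatures replace each instance set with a fixed-length list, which shortens every pairwise comparison, and banding restricts comparisons to pairs that collide in at least one bucket. Only those candidate pairs would then need an exact or estimated similarity check.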
Targeting the third issue, the author proposes a method to generate a faceted taxonomy based on object properties in Linked Data. A framework is proposed that builds a sub-taxonomy in each facet from the sub-data extracted with an object property, using an Instance-based Concept Taxonomy generation algorithm called ICT. Two experiments demonstrate that (1) the ICT efficiently and effectively generates a sub-taxonomy with rdf:type in DBpedia and YAGO2 (e.g., it takes 49 and 11,790 ms to build concept taxonomies that achieve Taxonomic F-scores of 0.917 and 0.780), and (2) the faceted taxonomies for Diseasome and DrugBank, efficiently generated based on multiple object properties (e.g., it takes 2,032 and 2,525 ms to build the faceted taxonomies based on 6 and 16 properties), can effectively reduce the search spaces in faceted searches (e.g., they obtain 1.65 and 1.03 on Maximum Resolution with 2 facets).
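The step of extracting sub-data per object property might look like the sketch below, which groups triples into one instance-to-objects map per property. It covers only this grouping step and does not attempt the ICT algorithm; the function name `split_by_facet` and the `ex:` identifiers are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def split_by_facet(triples):
    """Group (subject, property, object) triples by property.

    Returns {property: {subject: set of objects}}, i.e. one
    instance-to-objects map for each candidate facet.
    """
    facets = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        facets[prop][subject].add(obj)
    return {prop: dict(rows) for prop, rows in facets.items()}


# Toy triples (hypothetical identifiers, not taken from DrugBank).
triples = [
    ("ex:drug1", "ex:target", "ex:proteinA"),
    ("ex:drug1", "ex:category", "ex:analgesic"),
    ("ex:drug2", "ex:target", "ex:proteinA"),
    ("ex:drug2", "ex:target", "ex:proteinB"),
]
facets = split_by_facet(triples)
for prop, rows in facets.items():
    print(prop, rows)
```

Each per-property map plays the role of an instance-object matrix in sparse form, from which a sub-taxonomy for that facet could then be built.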
-
dc.description.tableofcontents 1 Introduction 1
1.1 Background and Motivations 1
1.1.1 Data Integration and Schema Alignment 1
1.1.2 From RDF to Linked Data 3
1.1.3 Schema Alignment in Linked Data 5
1.2 Instance-based Schema Alignment 9
1.3 Contributions of this Dissertation 13
1.4 Organization of this Dissertation 15
2 Preliminaries and Related Works 17
2.1 Preliminaries 17
2.1.1 RDF and Linked Data 17
2.1.2 Ontology and Schema Alignment in Linked Data 20
2.2 Related Works 23
2.2.1 Instance-based Schema Alignment 23
2.2.2 Scaling Pairwise Similarity Computations 29
2.2.3 Automatic Taxonomy Generation 32
3 Aligning Schemas with Subsumption and Equivalence Relations 36
3.1 Introduction 36
3.2 Problem Definition 38
3.3 Methods 41
3.3.1 Workflow of Instance-based Schema Alignment 41
3.3.2 Instance-class Matrix Generation 42
3.3.3 Subsumption and Equivalence Relations Discovering 44
3.4 Experiments 48
3.4.1 Schema Alignment Algorithms in Comparison 48
3.4.2 Data and Experiment Design 48
3.5 Results 52
3.5.1 Intra-subsumption Relations for YAGO2-YAGO2 54
3.5.2 Intra-subsumption Relations for DBpedia-DBpedia 58
3.5.3 Inter-Subsumption and Equivalence Relations for YAGO2-DBpedia 61
3.5.4 Effects of χ_s and χ_e for the IUT 67
3.6 Discussions 71
3.7 Conclusion 75
4 Scaling Pair-wise Computations Using the Locality Sensitive Hashing 76
4.1 Introduction 76
4.2 Methods 78
4.2.1 MinHash and Signatures 79
4.2.2 Banding Technique 83
4.2.3 Scaling the IUT with MinHash and Banding 85
4.3 Experiment 87
4.4 Discussions 92
4.5 Conclusion 93
5 Unsupervised Hierarchical Schema Structure Generation in Linked Data 94
5.1 Introduction 94
5.2 Faceted Taxonomy for Linked Data 98
5.3 Framework 101
5.3.1 Facets Extraction 102
5.3.2 Instance Restriction and Redundancy Removal 102
5.3.3 Redundant Object Removal 103
5.3.4 Instance-object Matrix Generation 103
5.4 Generating Faceted Taxonomy 105
5.4.1 The Problem of Generating a Sub-taxonomy for a Facet 105
5.4.2 Concept Definition and Naming 105
5.4.3 Taxonomy Generation Algorithm 108
5.4.4 Instantiation and Taxonomy Refinement 110
5.5 Experiments 112
5.5.1 Task 1-Construction of Taxonomy with rdf:type 112
5.5.2 Task 2-Construction of Multiple Faceted Taxonomies 115
5.6 Results 119
5.6.1 Results of Task 1 119
5.6.2 Results of Task 2 124
5.7 Discussion 131
5.8 Conclusion 133
6 Future Works and Conclusion 134
6.1 Future Works 134
6.1.1 Similarity Measures for Instance-based Schema Alignment 134
6.1.2 Ontology Evolution for Instance-based Schema Alignment 135
6.1.3 Combining the IUT with Structure- and Lexical-based Methods 136
6.1.4 Scaling the IUT with Parallel Computations 137
6.1.5 Faceted Navigation and Search for Linked Data 137
6.2 Conclusion 139
Bibliography 142
Abstract (in Korean) 152
-
dc.format application/pdf -
dc.format.extent 3141673 bytes -
dc.format.medium application/pdf -
dc.language.iso en -
dc.publisher Seoul National University Graduate School -
dc.subject Schema Alignment -
dc.subject Instance-based Matching -
dc.subject Linked Data -
dc.subject Scaling Alignment -
dc.subject Hierarchy Generation -
dc.subject.ddc 617 -
dc.title Instance-based Hierarchical Schema Alignment in Linked Data -
dc.type Thesis -
dc.description.degree Doctor -
dc.citation.pages 154 -
dc.contributor.affiliation Graduate School of Dentistry, Department of Dental Science -
dc.date.awarded 2015-08 -
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.