An Indexing Framework for Improving Data Consistency of Triple Database

강승석

An Indexing Framework for Improving Data Consistency of Triple Database : 트리플 데이터베이스의 데이터 일관성 향상을 위한 인덱싱 프레임워크

DC Field	Value	Language
dc.contributor.advisor	이상구	-
dc.contributor.author	강승석	-
dc.date.accessioned	2017-07-13T08:58:40Z	-
dc.date.available	2017-07-13T08:58:40Z	-
dc.date.issued	2013-08	-
dc.identifier.other	000000013806	-
dc.identifier.uri	https://hdl.handle.net/10371/119992	-
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Computer Engineering, August 2013. Advisor: 이상구.	-
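dc.description.abstract	As the amount of data on the Semantic Web grows, storing large volumes of data in a flexible format and sharing information between systems have become essential. The triple is the standard flexible data representation used in the Semantic Web; if an enterprise expresses its relational database-based content data as triples, their flexibility and usability allow the data to be used on the Semantic Web for a variety of purposes. To secure the reliability of triple databases, this thesis implements the elements that are essential for using a triple database in an enterprise. First, we propose integrity constraints for triple databases. The integrity constraints are extracted from the relational database and interpreted so that they apply to the triple database with exactly the same meaning. Beyond converting data into triples and storing them without loss of information, it is also important in practice to use the stored triples effectively with fast query processing. However, existing work on triple indexes either indexes triples redundantly or stores too many index keys in a single tree. To solve these problems, we propose, second, a new index structure for triple databases. The new index structure is designed to minimize triple duplication, and by separating the index trees according to triple components it enables faster and lighter index key lookups. Third, we propose a new shortcut selection technique for triple databases. Shortcut selection addresses the self-join, the most frequent cause of performance degradation when queries are executed on a triple database. In general, when the entire triple table participates in the joins of a single query, the query cost is enormous. For queries that incur joins, the proposed technique adds in advance the triples that correspond to shortcuts running from the start node to the end node, so that the joins are avoided beforehand. It introduces early pruning of shortcuts based on triple graph characteristics, which prior work does not consider, and a benefit calculation model based on update frequency that takes database updates into account. Through experiments such as query time measurements on data from various domains, we verified that the proposed techniques improve on recent work in using triple databases efficiently.	-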
dc.description.abstract	As more data become available on the Semantic Web, processing large amounts of data in a flexible format and interlinking the applications that use them have become important. In a relational database, a user must be acquainted with the schema to execute a query. The triple is a well-known flexible data representation format in the Semantic Web. If the content of a relational database is represented as triples, a system can use enterprise data flexibly for various purposes. To guarantee the reliability of a triple database, integrity constraints must be enforced on it. The integrity constraints are retrieved from the relational database and translated to the triple database with exactly the same meaning. A triple database gains reliability and consistency by adopting the enforcement of integrity constraints. To use a triple database in practice, it is important not only to represent content as triples without loss of information but also to organize the triples efficiently. However, most existing triple index techniques suffer from data duplication and large index sizes. In this thesis, we analyze the drawbacks of existing triple indexing methods from the viewpoint of the reliability and effectiveness of a triple database. We also consider the issues that need to be addressed to build a triple index for managing relational database-based triple data. As a result, we propose Tridex, a lightweight B+-tree triple index designed to facilitate efficient processing of a triple database. Tridex offers a smaller index tree and less data redundancy. In addition, we propose enhanced shortcut selection methods for triple databases. Triples are commonly represented as a directed graph. Given a triple graph, retrieving data along particular paths can be very expensive because of the self-join problem in triple databases. To reduce self-join operations during query execution, we extend the concept of a shortcut, a direct path between specific nodes. Adding appropriate shortcuts to a triple database reduces its self-join operations. We propose a selection method with a reduced set of candidate shortcuts that takes the maintenance of the triple database into account. The experimental evaluations compare our approach with state-of-the-art approaches and show adequate performance, in terms of effectiveness and efficiency, with less building time.	-
dc.description.tableofcontents	Chapter 1. Introduction
  1.1 Research Motivation
  1.2 Our Contributions
  1.3 Outline
Chapter 2. Related Work
  2.1 Triple Mapping
  2.2 Triple Storing
  2.3 Index Structures for Triple Database
Chapter 3. Background
  3.1 Terminologies and Design Principle
  3.2 Triple Data Model
    3.2.1 Design of Triple Database
    3.2.2 Triple Transformation
    3.2.3 Triple Table and Meta Table
Chapter 4. Integrity Constraints on Triple Database
  4.1 Definition of Integrity Constraints on Triple Database
  4.2 Enforcement of Integrity Constraints
    4.2.1 Primary Key Constraint
    4.2.2 Functional Dependency
    4.2.3 Referential Integrity
    4.2.4 Not Null Constraint
    4.2.5 Unique Constraint
    4.2.6 User-Defined Domain Constraint
  4.3 Database Operations for Integrity Constraints
Chapter 5. Triple Index Structure for Integrity Constraints
  5.1 Motivation and Problem Definition
  5.2 Tridex: A Lightweight Triple Index Structure
    5.2.1 Query Set and Query Pattern
    5.2.2 Description of Tridex
    5.2.3 Analysis of Tridex
  5.3 Experiments
    5.3.1 Experimental Setup
    5.3.2 Scalability
    5.3.3 Performance
Chapter 6. Shortcut Selection
  6.1 Preliminaries
    6.1.1 Schema Graph and Instance Graph
    6.1.2 Shortcut and Query Workload
  6.2 Problem Specification
  6.3 Reducing Candidate Shortcuts
    6.3.1 Using Schema Information
    6.3.2 Using PageRank
  6.4 Shortcut Benefit Calculation
    6.4.1 Modeling of Shortcut Benefit Function
    6.4.2 Modeling of Shortcut Profit Function
    6.4.3 Modeling of Shortcut Cost Function
  6.5 Resource Constraints
    6.5.1 Unbounded
    6.5.2 Space Constrained
    6.5.3 Time Constrained
  6.6 Experiments
    6.6.1 Experimental Setup and Query Set
    6.6.2 Baseline Algorithms
    6.6.3 Performance of Shortcut Building Time
    6.6.4 Performance of Query Response Time
    6.6.5 Space Limitation and Shortcut Maintenance Cost
Chapter 7. Conclusion
Bibliography	-
dc.format	application/pdf	-
dc.format.extent	4803091 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Triple Database	-
dc.subject	Semantic Web	-
dc.subject	Integrity Constraint	-
dc.subject	Index Structure	-
dc.subject	Shortcut Selection	-
dc.subject.ddc	621	-
dc.title	An Indexing Framework for Improving Data Consistency of Triple Database	-
dc.title.alternative	트리플 데이터베이스의 데이터 일관성 향상을 위한 인덱싱 프레임워크	-
dc.type	Thesis	-
dc.contributor.AlternativeAuthor	Seungseok Kang	-
dc.description.degree	Doctor	-
dc.citation.pages	ix, 158	-
dc.contributor.affiliation	College of Engineering, Department of Computer Engineering	-
dc.date.awarded	2013-08	-
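
The abstract's point that path queries over a triple table cause self-joins, and that a shortcut (a direct path between specific nodes) can avoid them, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the sample data, the predicate names, and the helpers `follow_path` and `add_shortcut` are all hypothetical, and the sketch does not model Tridex, candidate reduction, or the benefit calculation.

```python
from collections import defaultdict

# Toy triple table of (subject, predicate, object) rows. All data is made up.
TRIPLES = [
    ("emp:1", "worksIn", "dept:10"),
    ("emp:2", "worksIn", "dept:10"),
    ("emp:3", "worksIn", "dept:20"),
    ("dept:10", "locatedIn", "city:Seoul"),
    ("dept:20", "locatedIn", "city:Busan"),
]


def follow_path(triples, path):
    """Evaluate a predicate path; each hop after the first is one self-join."""
    # Seed with every (start, end) pair matched by the first predicate.
    pairs = {(s, o) for s, p, o in triples if p == path[0]}
    for pred in path[1:]:
        # Index the next hop by subject, then join on "pair end == subject".
        by_subject = defaultdict(set)
        for s, p, o in triples:
            if p == pred:
                by_subject[s].add(o)
        pairs = {(start, nxt) for start, end in pairs for nxt in by_subject[end]}
    return pairs


def add_shortcut(triples, path, name):
    """Materialize the path as direct triples (hypothetical shortcut predicate)."""
    return triples + [(s, name, o) for s, o in sorted(follow_path(triples, path))]


path = ["worksIn", "locatedIn"]
joined = follow_path(TRIPLES, path)  # needs one self-join
with_shortcut = add_shortcut(TRIPLES, path, "sc:worksIn/locatedIn")
# With the shortcut stored, the same answer is a single-predicate scan.
direct = {(s, o) for s, p, o in with_shortcut if p == "sc:worksIn/locatedIn"}
```

In this sketch the shortcut trades three extra stored triples for a join-free lookup, and those triples would have to be kept in sync whenever the underlying data changes. That trade-off is why the abstract ties shortcut selection to the maintenance of the triple database.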

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)

Files in This Item:

000000013806.pdf 4.58 MB
