Sparse hypergraph clustering and its applications

김선아

서울대학교 중앙도서관

S-Space 소개

My S-Space

로그인이 필요합니다.

S-Space

Publications

Detailed Information

Sparse hypergraph clustering and its applications : 희박한 하이퍼그래프의 클러스터링과 그 활용

DC Field	Value	Language
dc.contributor.advisor	유연주	-
dc.contributor.author	김선아	-
dc.date.accessioned	2017-10-27T16:56:49Z	-
dc.date.available	2017-10-27T16:56:49Z	-
dc.date.issued	2017-08	-
dc.identifier.other	000000146292	-
dc.identifier.uri	https://hdl.handle.net/10371/136965	-
dc.description	학위논문 (박사)-- 서울대학교 대학원 사범대학 수학교육과, 2017. 8. 유연주.	-
dc.description.abstract	Clustering is one of the most popular methods to extract meaningful patterns from data. In genomics, increasingly large amounts of DNA sequencing data are being generated. Developing effective clustering tools appropriate for each data structure is a major challenge. In this thesis, we develop the CLQ and CLQ-D clustering algorithm to partition SNP sequence data using theoretical graph-based approaches. Based on these clustering algorithms, the LD block construction method, Big-LD, is able to detect the LD blocks of SNP sequence data using interval graph modeling of the clustering results. A sparse graph is defined as a graph in which the actual number of edges is much less than the possible number of edges. Real-world data including biological, social, and internet network data can be modeled as sparse graphs. Due to the structural characteristic of SNP data, graph models constructed for the clustering algorithm and LD block construction algorithm have a sparse structure, which facilitates the efficient operation of the algorithm in terms of time and memory usage. The Big-LD algorithm detects LD blocks including "holes", which are not allowed in the previous methods. Based on the LD block structure constructed by the Big-LD algorithm, we investigated the relationships between big LD block structure and biological phenomena using the HapMap phase 3 data and phase 1 data of the 1000 Genomes Project. The LD block boundaries detected by the Big-LD algorithm coincided better with the recombination hotspots than previous methods. In addition, we demonstrate that the comparison of LD block structures can provide additional information about positive selection using the results applied to the candidate regions suggested by previous research. By generalizing the Big-LD algorithm, which is designed to partition SNP sequence data into blocks, we suggest four clustering algorithms—PSHSC, PSHRC, PSHSQ, and PSHRQ—for the sparse hypergraph partitioning problem. Simulation experiments demonstrated that the algorithms generate high-quality partitions in terms of global and local quality measures. The partitioning results closely agreed with the true underlying cluster structures of simulated hypergraphs. We also applied the developed algorithm to the problem of predicting protein complexes in yeast protein-protein interaction network data, and confirm its potential as a tool for clustering biological network datasets.	-
dc.description.tableofcontents	Contents I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Objectives and contributions of the thesis . . . . . . . . . . . . . . 7 1.3 Organization of the thesis . . . . . . . . . . . . . . . . . . . . . . . 9 II. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.1 Basic graph theory . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Terminologies in genetics . . . . . . . . . . . . . . . . . . . . . . . 13 III. Clusteringofgenomicdata . . . . . . . . . . . . . . . . . . . . . . . 16 3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2.1 CLQ algorithm . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2.2 CLQ-D: construction of LD bins . . . . . . . . . . . . . . . 21 3.2.3 Big-LD: construction of LD blocks . . . . . . . . . . . . . 24 3.3 Evaluation of Big-LD algorithm . . . . . . . . . . . . . . . . . . . 32 3.3.1 Evaluation data . . . . . . . . . . . . . . . . . . . . . . . . 32 3.3.2 Implementation and performance evaluation . . . . . . . . . 33 3.3.3 Comparisons of Big-LD block partition results with preexisting methods . . . . . . . . . . . . . . . . . . . . . . . 44 3.3.4 LD block and recombination hotspots . . . . . . . . . . . . 56 3.3.5 Multi-SNP association experiments using the results of different block partition methods . . . . . . . . . . . . . . . . 59 iii 3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 IV. Comparisons of linkage disequilibrium blocks of different populationsforpositiveselection . . . . . . . . . . . . . . . . . . . . . . . . 72 4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.2.1 Previous methods: XP-EHH and CMS tests . . . . . . . . . 74 4.2.2 Comparison measure of LD block partitions . . . . . . . . . 75 4.2.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 V. Sparsehypergraphpartitioning . . . . . . . . . . . . . . . . . . . . 91 5.1 Motivation and background . . . . . . . . . . . . . . . . . . . . . . 91 5.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.2.1 Multi-level hypergraph partitioning . . . . . . . . . . . . . 94 5.2.2 Spectral clustering . . . . . . . . . . . . . . . . . . . . . . 96 5.2.3 Dense subgraph partitioning . . . . . . . . . . . . . . . . . 98 5.2.4 k-clique clustering . . . . . . . . . . . . . . . . . . . . . . 98 5.3 Hypergraph partitioning algorithms . . . . . . . . . . . . . . . . . 100 5.3.1 Algorithm overview . . . . . . . . . . . . . . . . . . . . . 100 5.3.2 Construction of the line graph of a hypergraph . . . . . . . 101 5.3.3 Listing the dense sets of the line graph . . . . . . . . . . . . 103 5.3.4 Finding the MWIS of the intersection graph . . . . . . . . 109 5.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.4.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.4.2 Quality measures . . . . . . . . . . . . . . . . . . . . . . . 116 iv 5.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.5 Detecting protein complexes in PPI networks . . . . . . . . . . . . 133 5.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 133 5.5.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 5.5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 5.5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 140 5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 VI. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 I. Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 A.1 Coding correction algorithm . . . . . . . . . . . . . . . . . . . . . 146 Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Abstract(inKorean) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166	-
dc.format	application/pdf	-
dc.format.extent	17333188 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	서울대학교 대학원	-
dc.subject	hypergraph	-
dc.subject	clustering	-
dc.subject	maximum weight independent set	-
dc.subject	clique	-
dc.subject	linkage disequilibrium	-
dc.subject	LD block	-
dc.subject	single nucleotide polymorphism	-
dc.subject	positive selection	-
dc.subject.ddc	510.7	-
dc.title	Sparse hypergraph clustering and its applications	-
dc.title.alternative	희박한 하이퍼그래프의 클러스터링과 그 활용	-
dc.type	Thesis	-
dc.contributor.AlternativeAuthor	Sun Ah Kim	-
dc.description.degree	Doctor	-
dc.contributor.affiliation	사범대학 수학교육과	-
dc.date.awarded	2017-08	-

Appears in Collections:

College of Education (사범대학)
- Dept. of Mathematics Education (수학교육과)
  - Theses (Ph.D. / Sc.D._수학교육과)

Files in This Item:

000000146292.pdf 16.53 MB

Altmetrics

Item View & Download Count

Show Simple Item Record

Find it @ SNU

트윗하기

SNS Share