Publications

Detailed Information

Sparse hypergraph clustering and its applications : 희박한 하이퍼그래프의 클러스터링과 그 활용

DC Field Value Language
dc.contributor.advisor유연주-
dc.contributor.author김선아-
dc.date.accessioned2017-10-27T16:56:49Z-
dc.date.available2017-10-27T16:56:49Z-
dc.date.issued2017-08-
dc.identifier.other000000146292-
dc.identifier.urihttps://hdl.handle.net/10371/136965-
dc.description학위논문 (박사)-- 서울대학교 대학원 사범대학 수학교육과, 2017. 8. 유연주.-
dc.description.abstractClustering is one of the most popular methods to extract meaningful patterns from data. In genomics, increasingly large amounts of DNA sequencing data are being generated. Developing effective clustering tools appropriate for each data structure is a major challenge. In this thesis, we develop the CLQ and CLQ-D clustering algorithm to partition SNP sequence data using theoretical graph-based approaches. Based on these clustering algorithms, the LD block construction method, Big-LD, is able to detect the LD blocks of SNP sequence data using interval graph modeling of the clustering results. A sparse graph is defined as a graph in which the actual number of edges is much less than the possible number of edges. Real-world data including biological, social, and internet network data can be modeled as sparse graphs. Due to the structural characteristic of SNP data, graph models constructed for the clustering algorithm and LD block construction algorithm have a sparse structure, which facilitates the efficient operation of the algorithm in terms of time and memory usage. The Big-LD algorithm detects LD blocks including "holes", which are not allowed in the previous methods. Based on the LD block structure constructed by the Big-LD algorithm, we investigated the relationships between big LD block structure and biological phenomena using the HapMap phase 3 data and phase 1 data of the 1000 Genomes Project. The LD block boundaries detected by the Big-LD algorithm coincided better with the recombination hotspots than previous methods. In addition, we demonstrate that the comparison of LD block structures can provide additional information about positive selection using the results applied to the candidate regions suggested by previous research. By generalizing the Big-LD algorithm, which is designed to partition SNP sequence data into blocks, we suggest four clustering algorithms—PSHSC, PSHRC, PSHSQ, and PSHRQ—for the sparse hypergraph partitioning problem. Simulation experiments demonstrated that the algorithms generate high-quality partitions in terms of global and local quality measures. The partitioning results closely agreed with the true underlying cluster structures of simulated hypergraphs. We also applied the developed algorithm to the problem of predicting protein complexes in yeast protein-protein interaction network data, and confirm its potential as a tool for clustering biological network datasets.-
dc.description.tableofcontentsContents
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Objectives and contributions of the thesis . . . . . . . . . . . . . . 7 1.3 Organization of the thesis . . . . . . . . . . . . . . . . . . . . . . . 9
II. Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.1 Basic graph theory . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Terminologies in genetics . . . . . . . . . . . . . . . . . . . . . . . 13
III. Clusteringofgenomicdata . . . . . . . . . . . . . . . . . . . . . . . 16 3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2.1 CLQ algorithm . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2.2 CLQ-D: construction of LD bins . . . . . . . . . . . . . . . 21 3.2.3 Big-LD: construction of LD blocks . . . . . . . . . . . . . 24 3.3 Evaluation of Big-LD algorithm . . . . . . . . . . . . . . . . . . . 32 3.3.1 Evaluation data . . . . . . . . . . . . . . . . . . . . . . . . 32 3.3.2 Implementation and performance evaluation . . . . . . . . . 33 3.3.3 Comparisons of Big-LD block partition results with preexisting methods . . . . . . . . . . . . . . . . . . . . . . . 44 3.3.4 LD block and recombination hotspots . . . . . . . . . . . . 56 3.3.5 Multi-SNP association experiments using the results of different block partition methods . . . . . . . . . . . . . . . . 59
iii
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
IV. Comparisons of linkage disequilibrium blocks of different populationsforpositiveselection . . . . . . . . . . . . . . . . . . . . . . . . 72 4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.2.1 Previous methods: XP-EHH and CMS tests . . . . . . . . . 74 4.2.2 Comparison measure of LD block partitions . . . . . . . . . 75 4.2.3 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
V. Sparsehypergraphpartitioning . . . . . . . . . . . . . . . . . . . . 91 5.1 Motivation and background . . . . . . . . . . . . . . . . . . . . . . 91 5.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.2.1 Multi-level hypergraph partitioning . . . . . . . . . . . . . 94 5.2.2 Spectral clustering . . . . . . . . . . . . . . . . . . . . . . 96 5.2.3 Dense subgraph partitioning . . . . . . . . . . . . . . . . . 98 5.2.4 k-clique clustering . . . . . . . . . . . . . . . . . . . . . . 98 5.3 Hypergraph partitioning algorithms . . . . . . . . . . . . . . . . . 100 5.3.1 Algorithm overview . . . . . . . . . . . . . . . . . . . . . 100 5.3.2 Construction of the line graph of a hypergraph . . . . . . . 101 5.3.3 Listing the dense sets of the line graph . . . . . . . . . . . . 103 5.3.4 Finding the MWIS of the intersection graph . . . . . . . . 109 5.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.4.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.4.2 Quality measures . . . . . . . . . . . . . . . . . . . . . . . 116
iv
5.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 5.5 Detecting protein complexes in PPI networks . . . . . . . . . . . . 133 5.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 133 5.5.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 5.5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 5.5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 140 5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
VI. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
I. Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 A.1 Coding correction algorithm . . . . . . . . . . . . . . . . . . . . . 146
Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Abstract(inKorean) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
-
dc.formatapplication/pdf-
dc.format.extent17333188 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisher서울대학교 대학원-
dc.subjecthypergraph-
dc.subjectclustering-
dc.subjectmaximum weight independent set-
dc.subjectclique-
dc.subjectlinkage disequilibrium-
dc.subjectLD block-
dc.subjectsingle nucleotide polymorphism-
dc.subjectpositive selection-
dc.subject.ddc510.7-
dc.titleSparse hypergraph clustering and its applications-
dc.title.alternative희박한 하이퍼그래프의 클러스터링과 그 활용-
dc.typeThesis-
dc.contributor.AlternativeAuthorSun Ah Kim-
dc.description.degreeDoctor-
dc.contributor.affiliation사범대학 수학교육과-
dc.date.awarded2017-08-
Appears in Collections:
Files in This Item:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share