Publications

Detailed Information

Efficient Genome Wide Tagging by Reduction to SAT

Cited 3 time in Web of Science Cited 4 time in Scopus
Authors

Choi, Arthur; Zaitlen, Noah; Han, Buhm; Pipatsrisawat, Knot; Darwiche, Adnan; Eskin, Eleazar

Issue Date
2008-09
Publisher
Springer Verlag
Citation
Lecture Notes in Computer Science, Vol.5251, pp.135-147
Abstract
Whole genome association has recently demonstrated some remarkable successes in identifying loci involved in disease. Designing these studies involves selecting a subset of known single nucleotide polymorphisms (SNPs) or tag SNPs to be genotyped. The problem of choosing tag SNPs is an active area of research and is usually formulated such that the goal is to select the fewest number of tag SNPs which "cover" the remaining SNPs where "cover" is defined by some statistical criterion. Since the standard formulation of the tag SNP selection problem is NP-hard, most algorithms for selecting tag SNPs are either heuristics which do not guarantee selection of the minimal set of tag SNPs or are exhaustive algorithms which are computationally impractical. In this paper, we present a set of methods which guarantee discovering the minimal set of tag SNPs, yet in practice are much faster than traditional exhaustive algorithms. We demonstrate that our methods can be applied to discover minimal tag sets for the entire human genome. Our method converts the instance of the tag SNP selection problem to an instance of the satisfiability problem, encoding the instance into conjunctive normal form (CNF). We take advantage of the local structure inherent in human variation, as well as progress in knowledge compilation, and convert our CNF encoding into a representation known as DNNF, from which solutions to our original problem can be easily enumerated. We demonstrate our methods by constructing the optimal tag set for the whole genome and show that we significantly outperform previous exhaustive search-based methods. We also present optimal solutions for the problem of selecting multi-marker tags in which some SNPs are "covered" by a pair of tag SNPs. Multi-marker tags can significantly decrease the number of tags we need to select, however discovering the minimal number of multi-marker tags is much more difficult. We evaluate our methods and perform benchmark comparisons to other methods by choosing tag sets using the HapMap data.
ISSN
0302-9743
URI
https://hdl.handle.net/10371/191661
DOI
https://doi.org/10.1007/978-3-540-87361-7_12
Files in This Item:
There are no files associated with this item.
Appears in Collections:

Related Researcher

  • College of Medicine
  • Department of Medicine
Research Area Bioinformatics, Computational Biology, Genomics, Human Leukocyte Antigen, Statistical Genetics

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share