S-Space College of Natural Sciences (자연과학대학) Dept. of Mathematical Sciences (수리과학부) Journal Papers (저널논문_수리과학부)
Private genome analysis through homomorphic encryption
- Kim, Miran; Lauter, Kristin
- Issue Date
- BioMed Central
- BMC Medical Informatics and Decision Making, 15(Suppl 5):S3
- This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided
the original work is properly cited.
The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing o the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution for this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data.
We present evaluation algorithms for secure computation of the minor allele frequencies and χ2 statistic in a genome-wide association studies setting. We also describe how to privately compute the Hamming distance and approximate Edit distance between encrypted DNA sequences. Finally, we compare performance details of using two practical homomorphic encryption schemes - the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig.
The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate Edit distance for DNA sequences of size 5K and figure out the differences between them.
The performance numbers for BGV are better than YASHE when homomorphically evaluating deep circuits (like the Hamming distance algorithm or approximate Edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as minor allele frequencies or χ2 test statistic in a case-control study.