High Information Capacity and Low Cost DNA-based Data Storage through Additional Encoding Characters (Korean title: A Study on High-Efficiency, Low-Cost DNA-based Data Storage through the Addition of Encoding Characters)

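DC Field | Value | Language
dc.contributor.advisor | 권성훈 (Sunghoon Kwon) | -
dc.contributor.author | 최영재 (Yeongjae Choi) | -
dc.date.accessioned | 2018-11-12T00:57:14Z | -
dc.date.available | 2019-06-05 | -
dc.date.issued | 2018-08 | -
dc.identifier.other | 000000151891 | -
dc.identifier.uri | https://hdl.handle.net/10371/143143 | -
dc.description | Doctoral dissertation, Graduate School, Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, August 2018. Advisor: 권성훈 (Sunghoon Kwon). | -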
dc.description.abstract | Storing digital data in DNA is the process of encoding digital data into DNA sequences and then synthesizing and storing those sequences. Recently, the platform has emerged as a potential supplement to current backup storage for infrequently accessed data, owing to its physical advantages over conventional storage media. First, DNA can be preserved for centuries, whereas conventional storage media require a power supply or periodic rewriting to retain data. Second, DNA has a physical information density of hundreds of petabytes (PB, 10^15 bytes) per gram, thousands of times higher than that of conventional storage methods. The major goal of previous research on DNA-based data storage was to improve encoding algorithms to reduce data errors and loss; design rules for data-to-DNA encoding and error-correction functions were proposed.
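As a point of reference for the data-to-DNA encoding step described above, the simplest possible mapping assigns two bits to each of the four nucleotides. The Python sketch below is a hypothetical illustration: the names `encode_2bit` and `decode_2bit` and the particular bit-to-base assignment are assumptions, not taken from the dissertation, and the design rules and error correction that practical schemes add are omitted.

```python
# Hypothetical baseline mapping: each nucleotide carries log2(4) = 2 bits.
# Illustrative only; not the encoding scheme used in the dissertation.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_2bit(data: bytes) -> str:
    """Map every 2 bits of the input to one nucleotide (4 nt per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_2bit(seq: str) -> bytes:
    """Invert encode_2bit: nucleotides -> bit string -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    seq = encode_2bit(b"Hi")
    print(seq)  # CAGACGGC
    assert decode_2bit(seq) == b"Hi"
```

Each byte costs four nucleotides under this mapping, which is the 2 bit/nt ceiling that a four-character alphabet cannot exceed.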

The next step toward DNA-based data storage is to reduce the cost of storing data and thereby enable practical use. The current cost of DNA-based data storage is about 3,500 USD per 1 MB of stored data. As a first step toward practical implementation, this dissertation shows that the cost of DNA-based data storage can potentially be reduced by 50% by increasing the amount of data that can be stored per synthesized nucleotide, i.e., the information capacity, beyond the previous theoretical maximum. The proposed idea is to use degenerate bases, which are mixtures of the four nucleotides, as additional encoding characters alongside the standard DNA encoding characters A, C, G, and T. Whereas existing studies focused on algorithmic optimizations and simple demonstrations, I propose a completely novel approach that exploits the DNA synthesis process itself.
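Degenerate bases correspond to the IUPAC nucleotide codes, and the four standard bases plus the 11 degenerate bases cover every non-empty subset of {A, C, G, T} (2^4 - 1 = 15). The sketch below checks that correspondence and shows one hypothetical way to call a degenerate base from the reads covering a single position: keep each nucleotide whose observed fraction reaches a threshold, then look up the matching code. The function name `call_degenerate_base` and the 0.1 threshold are assumptions for illustration; the determination procedure used in the dissertation may differ.

```python
from collections import Counter
from itertools import combinations

# IUPAC nucleotide codes: 4 standard bases + 11 degenerate bases.
# Each degenerate base denotes a mixture of the listed nucleotides.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}
# Reverse lookup: set of nucleotides -> IUPAC character.
SET_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}

# The 15 codes are exactly the non-empty subsets of {A, C, G, T}.
ALL_SUBSETS = {frozenset(c) for k in range(1, 5) for c in combinations("ACGT", k)}
assert ALL_SUBSETS == set(SET_TO_CODE)


def call_degenerate_base(observed: str, threshold: float = 0.1) -> str:
    """Hypothetical caller for one position: keep each nucleotide whose
    fraction among the reads reaches `threshold`, then look up its code.
    `observed` holds the base seen at this position in each read."""
    counts = Counter(b for b in observed if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T observations at this position")
    present = frozenset(b for b, n in counts.items() if n / total >= threshold)
    return SET_TO_CODE[present]


if __name__ == "__main__":
    print(call_degenerate_base("AAGGAGGA"))  # R (A/G mixture)
    print(call_degenerate_base("ACGTACGT"))  # N (all four)
```

A fixed fraction threshold is the simplest rule to state; it trades off sensitivity to sequencing errors (threshold too low) against missing a component of an unevenly synthesized mixture (threshold too high).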

Using the proposed idea, I demonstrated and simulated the entire DNA-based data storage process, including data-to-DNA encoding, molecular-biology-based DNA handling, and DNA-sequence-to-data decoding. With this approach, the theoretical maximum information capacity, which equals the base-2 logarithm of the number of encoding characters, is increased from log2(4) = 2 to log2(15) ≈ 3.91 bit/nt by adding 11 degenerate bases to the original four encoding characters. Experimentally, the DNA length required to store data was reduced by more than half compared with the four-character system. In addition, the simulation and cost projection indicate that the cost of storing 1 MB could be reduced by 50% compared with the previous cost: the data-writing (DNA synthesis) cost decreases because less than half the DNA length is needed to store the same data.
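The capacity figures above follow from capacity = log2(number of characters), so the theoretical length ratio between the two alphabets is log2(4) / log2(15) ≈ 0.51. The sketch below computes these values and round-trips bytes through a naive base-15 conversion over the hypothetical alphabet `ACGTRYSWKMBDHVN`. The alphabet order, the function names, and the big-integer conversion are assumptions for illustration only; this is not the encoder from the dissertation, and it ignores error correction and synthesis constraints.

```python
import math

# Hypothetical 15-character alphabet: A, C, G, T plus the 11 IUPAC
# degenerate bases. The order (i.e. the digit each character encodes)
# is an assumption for illustration.
ALPHABET = "ACGTRYSWKMBDHVN"


def capacity_bits_per_nt(n_chars: int) -> float:
    """Theoretical maximum information capacity: log2(number of characters)."""
    return math.log2(n_chars)


def encode_base15(data: bytes) -> str:
    """Naive encoder: read the bytes as one big integer, write it in base 15."""
    value = int.from_bytes(data, "big")
    digits = []
    while value:
        value, rem = divmod(value, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]


def decode_base15(seq: str, n_bytes: int) -> bytes:
    """Inverse of encode_base15; n_bytes restores any leading zero bytes."""
    value = 0
    for ch in seq:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return value.to_bytes(n_bytes, "big")


if __name__ == "__main__":
    print(f"{capacity_bits_per_nt(4):.3f} vs {capacity_bits_per_nt(15):.3f} bit/nt")
    # 2.000 vs 3.907 bit/nt
    data = b"DNA data"
    seq = encode_base15(data)
    assert decode_base15(seq, len(data)) == data
    print(len(data) * 4, "nt (2-bit mapping) vs", len(seq), "nt (15 characters)")
    # 32 nt (2-bit mapping) vs 16 nt (15 characters)
```

Treating the whole payload as one integer keeps the example short but scales quadratically with input size; a practical encoder would convert fixed-size blocks instead. If writing cost is assumed to scale linearly with synthesized length, halving the length roughly halves the cost, e.g. from about 3,500 USD/MB to about 1,750 USD/MB.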

Because the method requires only minor modifications to the encoding and DNA synthesis processes, it can be applied to nearly all proposed DNA-based data storage methods and could improve their economic efficiency. The proposed idea and demonstration are therefore expected to contribute to the practical implementation of DNA-based data storage.
-
dc.description.tableofcontents | ABSTRACT I

TABLE OF CONTENTS IV

LIST OF FIGURES VII

LIST OF TABLES XII

CHAPTER 1. INTRODUCTION 1

1.1. Increasing Demand for Data Storage 2

1.2. DNA-based Data Storage 4

1.2.1. DNA as Nature's Data Storage Medium 4

1.2.2. DNA-based Data Storage 5

1.2.3. Information Capacity of DNA-based Data Storage 9

1.3. Main Concept: Addition of Degenerate Bases to DNA-based Data Storage for Higher Information Capacity 10

1.4. Outline of the Dissertation 12

CHAPTER 2. BACKGROUND OF THE DISSERTATION 13

2.1. Previous DNA-based Data Storage Methods 14

2.1.1. The Nature of DNA to be Considered as Storage Media 14

2.1.2. Data to DNA Encoding Algorithms 16

2.1.3. Error Correcting Methods for DNA-based Data Storage 18

2.1.4. Comparison of DNA Storage Encoding Schemes and Experimental Results 22

2.1.5. Comparison of Cost of DNA-based Data Storage Methods 24

2.2. Addition of Encoding Characters for Higher Information Capacity 26

2.2.1. Degenerate Base 28

CHAPTER 3. ADDITION OF DEGENERATE BASES TO DNA-BASED DATA STORAGE 31

3.1. Digital Data to DNA Encoding Method 32

3.1.1. Design of the DNA Library for Storage 33

3.2. Amplification and Sequencing of DNA Library 37

3.3. Decoding of the Data from the Sequencing Data 38

3.3.1. Determination of Degenerate Base 39

3.3.2. Decoding Result and Down-sampling of Sequencing Data 42

3.4. Microarray-derived DNA Pool Based DNA-based Data Storage 48

3.4.1. Design and Experiment of the DNA Library for Storage 49

3.4.2. Experimental Result and PCR Bias Analysis 53

CHAPTER 4. SIMULATION APPROACH FOR ERROR RATE ANALYSIS AND COST PROJECTION OF PLATFORM IN SCALED-UP DATA STORAGE 61

4.1. Monte-Carlo Simulation for Error Rate Analysis 62

CHAPTER 5. CONCLUSION AND DISCUSSION 72

5.1. Comparison of the Result with Previous Works 73

5.2. Cost Projection of the Platform 76

5.2.1. Outlook for Practical Use of DNA-based Data Storage 78

5.3. Applicability of Degenerate Bases to Other DNA-based Data Storage Methods 80

5.4. Future Works 81

5.4.1. Clustering of NGS Read for Shorter Fragment Decoding 83

5.4.2. Addition of Inosine Base for DNA-based Data Storage 84

5.4.3. Indexing of DNA on Encoded Microparticle 85

CHAPTER 6. BIBLIOGRAPHY 89

CHAPTER 7. ABSTRACT IN KOREAN 93
-
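dc.language.iso | en | -
dc.publisher | Graduate School, Seoul National University | -
dc.subject.ddc | 621.3 | -
dc.title | High Information Capacity and Low Cost DNA-based Data Storage through Additional Encoding Characters | -
dc.title.alternative | (Korean title) A Study on High-Efficiency, Low-Cost DNA-based Data Storage through the Addition of Encoding Characters | -
dc.type | Thesis | -
dc.contributor.AlternativeAuthor | Yeongjae Choi | -
dc.description.degree | Doctor | -
dc.contributor.affiliation | Department of Electrical and Computer Engineering, College of Engineering | -
dc.date.awarded | 2018-08 | -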

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
