High Information Capacity and Low Cost DNA-based Data Storage through Additional Encoding Characters
A Study on a High-Efficiency, Low-Cost DNA-Based Data Storage Method through the Addition of Encoding Characters

College of Engineering, Department of Electrical and Computer Engineering
Seoul National University Graduate School
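Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2018. 권성훈.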
DNA-based data storage is the process of encoding digital data into DNA sequences, then synthesizing and storing the resulting DNA. The platform has recently emerged as a possible supplement to current backup storage for infrequently accessed data, owing to its physical advantages over conventional storage media. First, DNA can be preserved for centuries, whereas conventional storage media require a power supply or periodic rewriting to retain data. Second, DNA has a physical information density of hundreds of petabytes (PB, 10^15 bytes) per gram, thousands of times higher than conventional storage methods. The major goal of previous research on DNA-based data storage was to improve data encoding algorithms to reduce data errors and loss; design rules for data-to-DNA encoding and error correction functions were proposed.
The next step for DNA-based data storage is to reduce the cost of storing data and so enable practical use. The current cost of DNA-based data storage is about 3500 USD per 1 MB of data stored. As a first step toward practical implementation, this dissertation shows that the cost of DNA-based data storage can be reduced by 50% by increasing the amount of data that can be stored per synthesized DNA, i.e., the information capacity, above the previous theoretical maximum. The proposed idea is to use degenerate bases, which are mixtures of the four encoding nucleotides, as additional encoding characters alongside the DNA encoding characters A, C, G and T. I propose a novel approach that exploits the synthesis process, whereas existing studies were algorithmic optimizations and simple demonstrations.
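To make the idea of degenerate bases as extra characters concrete, the following minimal sketch enumerates every non-empty mixture of A, C, G and T. It assumes the standard IUPAC single-letter codes as labels for the mixtures; the abstract does not state which symbols the dissertation uses, and the names `IUPAC` and `encoding_alphabet` are hypothetical. The sketch treats a mixture only as a set of bases and does not model mixing ratios.

```python
from itertools import combinations

BASES = "ACGT"

# Standard IUPAC nucleotide codes, keyed by the set of bases each code stands for.
# Assumption: the abstract does not say which symbols the dissertation uses.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def encoding_alphabet():
    """Return (symbol, bases) pairs for every non-empty mixture of A, C, G, T."""
    alphabet = []
    for size in range(1, len(BASES) + 1):
        for combo in combinations(BASES, size):
            alphabet.append((IUPAC[frozenset(combo)], "".join(combo)))
    return alphabet


alphabet = encoding_alphabet()
# A character is degenerate when it stands for a mixture of two or more bases.
degenerate = [symbol for symbol, bases in alphabet if len(bases) > 1]

for symbol, bases in alphabet:
    print(symbol, "=", bases)
```

Four bases have 2^4 - 1 = 15 non-empty subsets: the four pure bases plus 11 mixtures, which is where a 15-character alphabet comes from.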
Using the proposed idea, I demonstrated and simulated the entire DNA-based data storage process, including data-to-DNA encoding, molecular biology-based DNA handling, and DNA-sequence-to-data decoding. The theoretical maximum information capacity, which equals log2 of the number of encoding characters, increases from log2(4) = 2 to log2(15) ≈ 3.9 bit/nt when 11 degenerate bases are added to the original four encoding characters. The DNA length required for storing data was experimentally reduced by more than half compared to that of the four-character system. In addition, based on the simulation and a cost projection, the cost of storing 1 MB is projected to fall by 50% compared to the previous cost. The data writing (DNA synthesis) cost decreases because the length of DNA required to store the data is reduced to less than half.
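The capacity figures above follow directly from the log2 relation, as the short sketch below shows. It computes the ideal capacity in bit/nt and the smallest number of nucleotide positions needed for a given payload. The function names are hypothetical, and the calculation is an idealized upper bound that ignores addressing, error correction and other overhead, so it is not the dissertation's experimental result.

```python
import math


def capacity_bits_per_nt(num_characters):
    """Theoretical maximum information capacity for an alphabet of this size."""
    return math.log2(num_characters)


def min_nucleotides(num_bytes, num_characters):
    """Fewest nucleotide positions that could hold num_bytes at ideal capacity."""
    return math.ceil(num_bytes * 8 / capacity_bits_per_nt(num_characters))


payload = 10**6  # 1 MB, taken here as 10^6 bytes (an assumption)

for n in (4, 15):
    print(n, "characters:",
          round(capacity_bits_per_nt(n), 3), "bit/nt,",
          min_nucleotides(payload, n), "nt for 1 MB")

# Ratio of the ideal lengths, equal to log2(4) / log2(15)
ratio = capacity_bits_per_nt(4) / capacity_bits_per_nt(15)
print("ideal length ratio (15 vs 4 characters):", round(ratio, 3))
```

The ideal ratio is log2(4) / log2(15), roughly 0.51. The reductions reported in the abstract are experimental and measured against an existing four-character system, so they need not equal this ideal ratio.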
Since the method requires only minor modifications to the encoding and DNA synthesis processes, it can be applied to nearly all proposed DNA-based data storage methodologies and could improve their economic efficiency. The proposed idea and its demonstration are therefore expected to be useful for the practical implementation of DNA-based data storage.
