Improved Fully Homomorphic Encryption and Practical Implementation of Privacy-Preserving Deep Neural Networks

Abstract: 완전 동형 암호는 서버가 클라이언트의 암호화된 데이터를 복호화하지 않고 임의의 연산을 할 수 있도록 하는 암호화 알고리즘이다. 이를 통해 클라이언트는 데이터에 대한 정보를 유출하지 않고 중요한 데이터를 신뢰하기 어려운 서버에 가공을 위탁할 수 있다. 2009년에 Gentry가 완전동형암호의 첫 번째 설계를 성공한 이후 여러 종류의 완전동형암호 알고리즘이 제안되고 다양한 측면에서 개선되어 왔지만, 현재의 완전동형암호 알고리즘은 실생활에 적용하기에 효율적이지 않다. 본 연구에서는 완전동형암호 알고리즘은 실제 적용에 적합하게 만들기 위해 완전동형암호 알고리즘에 대한 네 가지 주요 핵심 주제인 근사 동형 암호의 부트스트래핑, 갈루아 키 관리, 정보 보호 심층 신경망, FHE 상에서의 대한 정렬 개선을 다룬다.

첫째, 본 연구에서는 RNS-CKKS 알고리즘의 부트스트래핑 연산의 메시지 정밀도를 향상시킨다. 동형 나머지 연산 과정은 부트스트래핑의 정밀도를 결정하는 가장 중요한 단계 중 하나이기 때문에, 우리는 동형 나머지 연산 과정에 초점을 맞춘다. 우리는 먼저 함수의 최적 근사 다항식과 스케일링된 사인/코사인 함수를 얻는 개선된 다중 간격 Remez 알고리즘을 제안한다. 다음으로, 우리는 부트스트래핑에 사용되는 스케일링 상수와 기본 스케일링 상수의 차이를 줄이기 위해 역사인 함수를 사용하는 함수 합성 방법을 제안한다. 이러한 방법을 사용하여 각 매개 변수 설정에 대해 RNS-CKKS 동형암호의 부트스트래핑 근사 오류를 1/1176~1/42(5.4~10.2비트 정밀도 향상) 만큼 줄일 수 있음을 보인다. 함수 합성을 사용하지 않는 부트스트래핑의 정확도는 최대 27.2~30.3비트를 가지는 데에 반해, 함수 합성을 사용하는 부트스트래핑의 정확도는 최대 32.6~40.5비트를 가짐을 확인한다.

둘째, 완전동형암호 알고리즘과 관련된 최첨단 기술을 사용하여 모델을 수정하거나 재훈련 과정을 거치지 않고도 평문 상의 모델과 거의 동일한 분류 정확도를 가진 CKKS 동형암호 상의 정보 보호 ResNet-20 네트워크를 구현한다. 여기서 성능을 더욱 향상시키기 위해 여러 채널에 대한 희소 출력 데이터를 조밀하게 구성하는 멀티플렉스 병렬 컨볼루션을 사용하여 총 부트스트래핑의 수행 시간을 최소화한다. 우리는 또한 근사 ReLU 연산 과정 동안 심층 신경망의 극심한 발산 현상을 방지하는 허수부 노이즈 제거 부트스트래핑을 제안한다. 또한 레벨 소비를 최적화하고 더 가볍고 최적화된 매개 변수를 사용한다. 시뮬레이션을 통해 기술의 효율성을 보인 결과, 멀티플렉스 병렬 컨볼루션 기술이 없는 구현에 비해 ResNet-20의 추론 수행 시간이 4.67 배 적고 환산 수행 시간 (이미지당 수행 시간)이 134 배 적으며 표준 128비트 보안 수준을 달성한다는 것을 보여준다. 또한 CIFAR-10 데이터셋에 대한 ResNet-32, 44, 56, 110 모델과 CIFAR-100 데이터셋에 대한 ResNet-32를 성공적으로 구현했으며 분류 정확도는 평문 상의 정확도와 유사하다. 최대 분류 정확도는 92.9%로, 유사한 정보 보호 인공지능 모델 중 가장 높은 정확도이다.

셋째, BFV 및 CKKS 동형암호를 사용하는 클라이언트와 서버의 부담을 줄이기 위해 동형 암호 알고리즘과 관련된 새로운 개념인 계층적 갈루아 키 생성 방법이 제안된다. 제안된 방법의 주요 개념은 계층적 갈루아 키이며, 클라이언트가 소수의 높은 키 레벨의 갈루아 키를 생성하고 서버로 전송하면, 서버는 공개키와 소수의 높은 키 레벨의 갈루아 키 집합에서 필요한 갈루아 키를 생성할 수 있다. 이 제안된 방법은 갈루아 키 생성을 위한 클라이언트의 연산량과 갈루아 키 전송을 위한 통신 비용을 크게 줄인다. 예를 들어, CIFAR-10 데이터셋을 위한 표준 ResNet-20 네트워크와 ImageNet 데이터셋을 위한 ResNet-18 네트워크가 각각 N=2^16와 N=2^17를 갖는 CKKS 동형암호 상에서 사전 훈련된 매개 변수로 구현되는 경우, 서버는 각각 265개 및 617개의 갈루아 키를 필요로 하는데, 이 키의 총 용량은 105.6GB 및 197.6GB이다. 또한 3단계 계층적 갈루아 키 시스템을 사용할 경우 클라이언트에 의해 생성되고 전송되는 갈루아 키 용량을 CIFAR-10 데이터셋을 위한 ResNet-20 네트워크에 대해서는 105.6GB에서 3.4GB로 줄일 수 있고, ImageNet 데이터셋을 위한 ResNet-18 네트워크에 대해서는 197.6GB에서 3.9GB로 줄일 수 있다.

넷째, 완전동형암호로 암호화된 데이터에 셸 정렬 알고리즘을 사용할 수 있도록, 무시할만큼 작은 정렬 실패 확률을 허용하고 추가 매개 변수 $\alpha$와 도입하며 수정한다. 간격 수열을 2의 승수로 구성하는 경우, 입력 배열 길이가 $n$인 수정된 셸 정렬은 실행 시간 복잡도 $O(n^{3/2}\sqrt{\alpha+\log\log n})$와 정렬 실패 확률 $2^{-\alpha}$의 트레이드오프를 갖는 것으로 확인된다. 그것의 시간 복잡도는 평문 상의 실행 시간 복잡도 $O(n^{3/2})$에 가깝고, 실행 시간을 약간 증가시키면서 정렬 실패 확률을 지수적으로 작게 만들 수 있다. 또한 수정된 셸 정렬의 최적 창 길이도 볼록함수 최적화를 통해 근사적으로 도출된다. 수정된 셸 정렬에 대한 수치적인 분석을 하기 위해 무작위로 생성된 배열을 사용하여 검증한다. 수치적인 분석에서는 어떠한 간격 수열에도 수정 알고리즘을 적용할 수 있으며, 실제 성능이 좋은 것으로 알려진 Ciura의 간격 수열을 사용하여 수정된 Shell 정렬을 수행했을 때 실질적으로 효과적이라는 것을 알 수 있다. 수정된 셸 정렬은 TFHE 라이브러리를 통해 FHE 상에서 수행될 수 있는 다른 정렬 알고리즘과 비교한 결과, 수정된 셸 정렬은 동형 암호 상에서 추가 메모리가 필요하지 않는 정렬 알고리즘 중 실행 시간 측면에서 최고의 성능을 갖는 것으로 나타났다.
Fully homomorphic encryption (FHE) is an encryption scheme that enables the server to process encrypted data sent by the clients without decrypting them in the client-server model. It allows the clients to outsource processing the sensitive data to the distrustful server without leaking any information about the data. Although several FHE schemes are constructed and improved in various aspects since the first construction of FHE in 2009, the existing FHE schemes are not sufficiently efficient for being applied to practical application. In this dissertation, four main crucial topics on FHE schemes are dealt with to make FHE schemes appropriate for practical applications: Bootstrapping of approximate homomorphic encryption, Galois key management, privacy-preserving deep neural network, and sorting for FHE'ed data.

First, the message precision in the bootstrapping operation of the residue number system variant Cheon-Kim-Kim-Song (RNS-CKKS) scheme is improved. Since the homomorphic modular reduction process is one of the most important steps in determining the precision of the bootstrapping, the homomorphic modular reduction process is focused on. I propose an improved multi-interval Remez algorithm obtaining the optimal minimax approximate polynomial of modular reduction function and the scaled sine/cosine function over the union of the approximation regions. Next, I propose the composite function method using the inverse sine function to reduce the difference between the scaling factor used in the bootstrapping and the default scaling factor. With these methods, the approximation error in the bootstrapping of the RNS-CKKS scheme is reduced by 1/1176~1/42 (5.4~10.2-bit precision improvement) for each parameter setting. While the bootstrapping without the composite function method has 27.2~30.3-bit precision at maximum, the bootstrapping with the composite function method has 32.6~40.5-bit precision.

Second, using the state-of-the-art techniques regarding FHE schemes, the ResNet-20 network with the CKKS scheme is implemented with having almost the same classification accuracy as the plaintext counterpart without any modification of models or retraining. To further improve the performance, the total bootstrapping runtime is minimized using multiplexed parallel convolution that collects sparse output data for multiple channels compactly. The imaginary-removing bootstrapping is also proposed to prevent the deep neural networks from catastrophic divergence during approximate ReLU operations. In addition, level consumptions are optimized with lighter and tighter parameters. Simulation results show that the proposed implementation has 4.67x lower inference latency and 134x less amortized runtime (runtime per image) for ResNet-20 compared to the implementation without the multiplex convolution technique, with achieving standard 128-bit security. Furthermore, ResNet-32, 44, 56, 110 for CIFAR-10 dataset and ResNet-32 for CIFAR-100 dataset are successfully implemeneted, and the classification accuracy is similar to that of plaintext counterparts. The maximum classification accuracy is 92.9% at maximum, which is the highest accuracy among the similar privacy-preserving setting.

Third, a new concept of hierarchical Galois key generation method for homomorphic encryption is proposed to reduce the burdens of the clients and the server running BFV and CKKS schemes. The main concept in the proposed method is the hierarchical Galois keys, such that after the client generates and transmits a few Galois keys in the highest key level to the server, the server can generate any required Galois keys from the public key and the smaller set of Galois keys in the higher key level. This proposed method significantly reduces the number of the clients' operations for Galois key generation and the communication cost for the Galois key transmission. For example, if the standard ResNet-20 network for the CIFAR-10 dataset and the ResNet-18 network for the ImageNet dataset are implemented with pre-trained parameters of the CKKS scheme with the polynomial modulus degree N=2^16 and N=2^17, respectively, the server requires 265 and 617 Galois keys, which occupy 105.6GB and 197.6GB of memory, respectively. If the proposed three-level hierarchical Galois key system is used, the Galois key size generated and transmitted by the client can be reduced from 105.6GB to 3.4GB for ResNet-20 model for CIFAR-10, and reduced from 197.6GB to 3.9GB for ResNet-18 model for ImageNet.

Fourth, in order for the Shell sort algorithm to be used on the FHE data, the Shell sort is modified with an additional parameter $\alpha$, allowing exponentially small sorting failure probability. For a gap sequence of powers of two, the modified Shell sort with input array length $n$ is found to have the trade-off between the running time complexity of $O(n^{3/2}\sqrt{\alpha+\log\log n})$ and the sorting failure probability of $2^{-\alpha}$. Its running time complexity is close to the intended running time complexity of $O(n^{3/2})$ and the sorting failure probability can be made very low with slightly increased running time. Further, the near-optimal window length of the modified Shell sort is also derived via convex optimization. The proposed analysis of the modified Shell sort is numerically confirmed by using randomly generated arrays. For the practical aspect, the modification can be applied to any gap sequence, and it is shown that Ciura's gap sequence, which is known to have good practical performance, is also practically effective when the modified Shell sort is applied. The modified Shell sort is compared with other sorting algorithms with the FHE over the torus (TFHE) library, and it is shown that this modified Shell sort has the best performance in running time among in-place sorting algorithms on homomorphic encryption scheme.

Language: eng

URI: https://hdl.handle.net/10371/187728

https://dcollection.snu.ac.kr/common/orgView/000000172433

Files in This Item:

000000172433.pdf 10.77 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)

Altmetrics

Item View & Download Count

Show Full Item Record

Find it @ SNU

트윗하기

SNS Share