
Research on Hardware Vulnerability Detection and Data Protection in Deep Learning: 하드웨어 취약점 탐지 및 딥러닝에서 데이터 보호를 위한 연구

dc.contributor.advisor: 이병영
dc.contributor.author: 허재원
dc.date.accessioned: 2023-11-20T04:22:34Z
dc.date.available: 2023-11-20T04:22:34Z
dc.date.issued: 2023
dc.identifier.other: 000000178370
dc.identifier.uri: https://hdl.handle.net/10371/196449
dc.identifier.uri: https://dcollection.snu.ac.kr/common/orgView/000000178370
dc.description: Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, 2023. 8. 이병영.
dc.description.abstract: As threats such as hacking and data theft by cyber-criminals keep growing, the field of cyber-security is drawing ever more attention, and various security issues are under continuous discussion. In particular, this dissertation discusses two research topics in system security: hardware vulnerability detection and training data protection in machine learning.

First, since the integrity of hardware is essential for achieving system security, we introduce research on detecting hardware vulnerabilities. Specifically, we designed two frameworks for vulnerability detection: DifuzzRTL (S&P 2021), which finds functional bugs in CPU RTLs, and SpecDoctor (CCS 2022), which finds transient execution vulnerabilities in CPUs. DifuzzRTL introduces an automated and efficient fuzzing framework for finding CPU bugs, and we demonstrated its practicality by discovering new bugs in open-source RISC-V CPUs. SpecDoctor is the first fuzzing framework for finding transient execution vulnerabilities given CPU RTLs, and it showed its capability by discovering new types of transient execution vulnerabilities in RISC-V CPUs.

As the second topic, we studied how to prevent training data from leaking when it is shared with untrusted machine learners. This research was motivated in particular by the observation that data has become the most valuable asset in the current artificial-intelligence industry. To this end, we designed a deep learning training framework called FairLearning (submitted to S&P 2024). FairLearning sheds light on the current risks of data leakage during deep learning model training and presents a new secure training architecture. Evaluation on real-world datasets and models showed that FairLearning minimizes data leakage while preserving performance.
dc.description.abstract: With ever-increasing threats from cyber-criminals, concerns over cyber-security have grown more than ever before, fostering ongoing discussions on various security issues. In particular, this dissertation discusses two research topics in system security: vulnerability detection in hardware, and training data protection in deep learning.
First of all, we introduce research on detecting hardware vulnerabilities, as the integrity of hardware is essential for achieving system security. More specifically, we design two frameworks for detecting such vulnerabilities: i) DifuzzRTL (S&P 2021), which finds functional bugs in CPU RTLs, and ii) SpecDoctor (CCS 2022), which finds transient execution vulnerabilities in CPUs. DifuzzRTL introduces an automated and efficient fuzzing framework for finding CPU bugs, and we demonstrated its practicality by finding new bugs in open-source RISC-V CPUs. SpecDoctor is the first fuzzing framework for finding transient execution vulnerabilities given CPU RTLs, and it demonstrated its capability by finding new types of transient execution vulnerabilities in RISC-V CPUs.
As the second topic, we focus on how to prevent training data from leaking when it is shared with untrusted machine learners. We studied this issue because data has become the foremost asset in the current artificial-intelligence industry. To this end, we design FairLearning (submitted to S&P 2024), a deep learning framework that minimizes training data leakage. FairLearning sheds light on the current risks of data breaches in deep learning model training, and designs a new secure training architecture. Through evaluation on real-world datasets and models, FairLearning demonstrated that it minimizes data leakage while preserving performance.
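The differential fuzz testing idea behind DifuzzRTL can be sketched as follows: run the same instruction stream through an architecture-level golden model and the RTL simulation, then flag any divergence in architectural state as a candidate CPU bug. The sketch below is a minimal illustrative stand-in; the `isa_sim`/`rtl_sim` stubs and the injected bug are hypothetical, not the dissertation's actual implementation.

```python
import random


def isa_sim(program):
    """Golden reference: architecturally defined result (toy stand-in)."""
    return sum(program) % (2 ** 32)


def rtl_sim(program):
    """Design under test: same contract, but with one injected functional bug."""
    state = sum(program) % (2 ** 32)
    if 0xDEAD in program:   # hypothetical buggy opcode
        state ^= 1          # architectural state silently diverges
    return state


def fuzz(iterations=1000, prog_len=8, seed=0):
    """Feed random instruction streams to both models; any divergence is a candidate bug."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(iterations):
        program = [rng.randrange(2 ** 16) for _ in range(prog_len)]
        if isa_sim(program) != rtl_sim(program):
            mismatches.append(program)  # save the diverging input for triage
    return mismatches
```

In the real framework the inputs are valid RISC-V instruction sequences, the comparison covers architectural registers and memory, and coverage feedback guides the mutator toward unexplored control-register states.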
dc.description.tableofcontents:
Abstract 1

Contents 3

1. DifuzzRTL: Differential Fuzz Testing to Find CPU Bugs 1
1.1. Introduction 1
1.2. Background 5
1.3. Motivation 10
1.4. Design 17
1.5. Implementation 30
1.6. Evaluation 31
1.7. Limitation 48
1.8. Related Work 49
1.9. Conclusion 51

2. SpecDoctor: Differential Fuzz Testing to Find Transient Execution Vulnerabilities 53
2.1. Introduction 53
2.2. Background 57
2.3. Challenges in SpecDoctor 62
2.4. Design of SpecDoctor 65
2.5. Implementation 78
2.6. Evaluation 79
2.7. Findings of SpecDoctor 88
2.8. Discussions 94
2.9. Related Work 96
2.10. Conclusion 98

3. FairLearning: Protecting Training Data from Untrusted Machine Learners 99
3.1. Introduction 99
3.2. Background 103
3.3. Motivation 107
3.4. Theoretical Building Blocks 112
3.5. Design of FairLearning 123
3.6. Implementation 128
3.7. Evaluation 132
3.8. Discussion 142
3.9. Conclusion 144

Conclusion 145

Bibliography 146

초록 171

감사의 글 172


List of Tables

1.1. Instrumentation overheads for synthetic RTL designs 35
1.2. p-values of the Mann-Whitney U test 36
1.3. Vargha and Delaney's A12 measure 36
1.4. Instrumentation overhead for real-world CPU RTLs. 38
1.5. Runtime overheads of register-coverage and mux-coverage for real-world CPU RTLs. 38
1.6. Runtime overheads of register-coverage and mux-coverage for FPGA emulation. The numbers in the brackets are timing constraints for the FPGA bitstream. 42
1.7. A list of newly discovered bugs by DifuzzRTL. DifuzzRTL identified 16 bugs in total, all of which were confirmed by the respective vendors. Five of them were assigned CVE numbers. 44
1.8. Average time to find real-world bugs using each approach. 46
2.1. Transient executions in RISC-V Boom and NutShell discovered by SpecDoctor phase 2. rr: RoB rollback reason, op: opcode of the triggering instruction, cpu: the CPU in which the corresponding transient-trigger instructions were found (i.e., Boom, NutShell, or both). 79
2.2. Timing-change components found by SpecDoctor in both RISC-V Boom and NutShell. The table shows only part of the entire results due to the space limit. 80
2.3. Static and dynamic overheads of SpecDoctor. Numbers in parentheses show the overheads against the original. 82
2.4. Comparison study of finding transient execution vulnerabilities. All the attacks accessed the secret through a load instruction. 83
2.5. Variants of transient execution vulnerabilities found by SpecDoctor. 84
2.6. Comparison of SpecDoctor versus related works. 96
3.1. Evaluation tasks and datasets for each task. 132
3.2. Specification of the trained target models and the evaluation model for image classification. 135
3.3. BWleakage achieved in each scenario of the image classification task. BWleakage of the attacks not possible in each scenario is set to 0. All attacks were performed with the best set of parameters to maximize BWleakage. 136
3.4. Model specification for image segmentation task. 136
3.5. Achieved BWleakage in image segmentation task. 136

List of Figures

1.1. Framework of ISA simulation and RTL simulation 7
1.2. Workflow of coverage-guided fuzzing 9
1.3. Two independent FSMs of example memory controller 12
1.4. Schematics of example memory controller 12
1.5. Combined FSM of example memory controller 12
1.6. Workflow of RFuzz's mux coverage 15
1.7. Framework of DifuzzRTL 18
1.8. Input generated by mutator 19
1.9. Algorithm of control register identification. 24
1.10. Register-coverage instrumentation of DifuzzRTL 25
1.11. Workflow of DifuzzRTL's register-coverage 26
1.12. Average time (s) to find bugs in synthetic RTL designs 35
1.13. Reached states in synthetic RTL designs 35
1.14. Mux and register-coverage increments 39
1.15. Register-coverage increment in one iteration. 39
1.16. Register coverage in Mor1kx cappuccino (software simulation) 40
1.17. Register coverage in Rocket core (software simulation) 40
1.18. Register coverage in Boom core (software simulation) 40
1.19. Frequency of register bit values 41
1.20. Register coverage in Rocket core (FPGA emulation) 43
1.21. Register coverage in Boom core (FPGA emulation) 43
2.1. Pipeline stages of CPU 58
2.2. CPU micro-architecture 58
2.3. General workflow of differential fuzz testing. 60
2.4. SpecDoctor fuzzer framework 64
2.5. Example attack scenarios on RISC-V ISA. 65
2.6. Configured memory layout and hardware protection. 65
2.7. Logic of DCache in RISC-V Boom CPU 71
2.8. Module containing four timing-change components. 72
2.9. Instrumentation logic 72
2.10. Phase 2 and 3 of SpecDoctor 74
2.11. Mechanism of triggering Boombard bug. 86
2.12. Code snippet of Boombard. 86
2.13. Code snippet of Birgus. 91
3.1. Model fitting procedure and the customizable components 102
3.2. Problem situation of FairLearning 106
3.3. Algorithm for model fitting. 114
3.4. Blackbox of Baseline 118
3.5. Blackbox of FairFunction 118
3.6. Overall framework of FairLearning 122
3.7. Model fitting code example 125
3.8. Constructed AST 125
3.9. Example code for FairLearning 129
3.10. Learning time for each task and model on Baseline, FL-unsafe, and FairLearning 138
3.11. Overhead of FairLearning for data augmentation. 139
3.12. Overhead of FairLearning for parameter update. 139
dc.format.extent: 189
dc.language.iso: eng
dc.publisher: 서울대학교 대학원 (Seoul National University Graduate School)
dc.subject: hardware vulnerability (하드웨어 취약점)
dc.subject: fuzzing (퍼징)
dc.subject: deep learning (딥러닝)
dc.subject: confidential computing (기밀계산)
dc.subject.ddc: 621.3
dc.title: Research on Hardware Vulnerability Detection and Data Protection in Deep Learning
dc.title.alternative: 하드웨어 취약점 탐지 및 딥러닝에서 데이터 보호를 위한 연구
dc.type: Thesis
dc.type: Dissertation
dc.contributor.AlternativeAuthor: Jaewon Hur
dc.contributor.department: Department of Electrical and Computer Engineering, College of Engineering
dc.description.degree: Ph.D.
dc.date.awarded: 2023-08
dc.contributor.major: Computer and System Security
dc.identifier.uci: I804:11032-000000178370
dc.identifier.holdings: 000000000050▲000000000058▲000000178370▲