Improving performance of adversarial example attack using model explanation

So-Hee Jun (전소희)


설명 가능한 모델을 활용한 적대적 예제 공격의 성능 향상에 대한 연구 : Improving performance of adversarial example attack using model explanation

DC Field	Value	Language
dc.contributor.advisor	백윤흥	-
dc.contributor.author	전소희	-
dc.date.accessioned	2022-06-22T15:09:15Z	-
dc.date.available	2022-06-22T15:09:15Z	-
dc.date.issued	2022	-
dc.identifier.other	000000169536	-
dc.identifier.uri	https://hdl.handle.net/10371/183299	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000169536	ko_KR
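dc.description	Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2022.2. 백윤흥.	-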
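dc.description.abstract	This thesis proposes an adversarial example attack method that perturbs data in order to maliciously change the classification result of an AI model. The proposed method, XAI-W, uses model explanation techniques, which can interpret which parts of the input data most strongly influenced the AI model's output, to add more noise to the parts of the input that had a large influence on the output. The proposed method is evaluated in both a white-box setting, where the attacker knows all information about the target AI model, and a black-box setting, where the attacker can observe only the output results. In the white-box setting, the proposed method attacks the target AI model directly and the attack performance is evaluated. In the black-box setting, where information is more limited than in the white-box setting, a model extraction attack is used to build a replica AI model of the target AI model. Because the attacker has white-box access to the replica AI model, the attacker uses information about the replica AI model to generate adversarial examples. The generated adversarial examples are then used to carry out an adversarial example attack on the target AI model, and the performance is evaluated with a transferability metric. The proposed method was evaluated on three representative image datasets. As a result, the proposed method showed very high performance of 99.9% to 100% in the white-box setting, and in the black-box setting it achieved at least 5% and at most 9.7% higher performance against the target AI model than PGD (Projected Gradient Descent), a widely used existing adversarial example generation method. The proposed method shows that model explanation, which is used as a tool for interpreting AI models, can be used effectively for attacks.	-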
dc.description.abstract	This paper proposes an adversarial example attack method that maliciously perturbs input data to change the classification result of an AI model. The proposed method, XAI-W, uses model explanation techniques that identify which parts of the input data the AI model's classification relies on most. The proposed method is evaluated in both a white-box scenario, where the attacker can access all information about the target AI model, and a black-box scenario, where only output results can be accessed. In the white-box scenario, attack performance is evaluated by directly attacking the target AI model using the proposed method. In the black-box scenario, which is more limited but more practical than the white-box scenario, a model extraction attack is used to construct a replica AI model of the target AI model. The attacker obtains a model explanation of the target AI model indirectly through the replica AI model. As a result, the proposed method achieved high performance in the white-box scenario and up to 9.7% higher performance than the conventional adversarial example attack method in the black-box scenario.	-
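dc.description.tableofcontents	Chapter 1 Introduction 1 / Chapter 2 Background 4 / Section 1 Model extraction attack 4 / Section 2 Adversarial example attack 7 / Section 3 Model explanation 10 / Chapter 3 Proposed Method 14 / Section 1 Design 14 / Chapter 4 Experiments 19 / Section 1 Experimental setup 19 / Section 2 Experimental results 21 / Chapter 5 Conclusion 26 / References 27 / Abstract 30	-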
dc.format.extent	iii, 30	-
dc.language.iso	kor	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Adversarial example attack	-
dc.subject	XAI	-
dc.subject	Model extraction attack	-
dc.subject.ddc	621.3	-
dc.title	설명 가능한 모델을 활용한 적대적 예제 공격의 성능 향상에 대한 연구	-
dc.title.alternative	Improving performance of adversarial example attack using model explanation	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	So-Hee Jun	-
dc.contributor.department	College of Engineering, Department of Electrical and Computer Engineering	-
dc.description.degree	Master's	-
dc.date.awarded	2022-02	-
dc.identifier.uci	I804:11032-000000169536	-
dc.identifier.holdings	000000000047▲000000000054▲000000169536▲	-
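
The abstracts in this record describe the attack only at a high level: perturb more strongly the input parts that a model explanation marks as influential. The sketch below illustrates that idea on a toy logistic classifier. It is not the thesis's XAI-W algorithm. The explanation method (absolute input gradient), the model, the function names, and the parameters `eps`, `alpha`, and `steps` are all assumptions made for illustration, since the record does not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Probability of class 1 under a toy logistic model (assumed, not from the thesis)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(x, y, w, b):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]

def saliency_weights(x, y, w, b):
    """Assumed explanation: absolute input gradient, scaled so the largest value is 1."""
    g = [abs(gi) for gi in input_gradient(x, y, w, b)]
    top = max(g)
    return [gi / top if top > 0 else 0.0 for gi in g]

def explanation_weighted_pgd(x, y, w, b, eps=0.3, alpha=0.05, steps=20):
    """PGD-style attack whose per-feature step size is scaled by the saliency weight."""
    weights = saliency_weights(x, y, w, b)  # computed once, on the clean input
    adv = list(x)
    for _ in range(steps):
        grad = input_gradient(adv, y, w, b)
        for i, gi in enumerate(grad):
            sign = (gi > 0) - (gi < 0)
            moved = adv[i] + alpha * weights[i] * sign
            # Project back into the L-infinity ball around x, then into [0, 1].
            moved = min(max(moved, x[i] - eps), x[i] + eps)
            adv[i] = min(max(moved, 0.0), 1.0)
    return adv

# Hypothetical replica and target models for a transfer-style check.
replica_w, replica_b = [4.0, -3.0, 0.0, 0.5], -0.5
target_w, target_b = [3.5, -2.5, 0.2, 0.4], -0.4
x, y = [0.6, 0.2, 0.5, 0.5], 1

adv = explanation_weighted_pgd(x, y, replica_w, replica_b)
print("replica:", predict(x, replica_w, replica_b), "->", predict(adv, replica_w, replica_b))
print("target: ", predict(x, target_w, target_b), "->", predict(adv, target_w, target_b))
```

In this sketch the adversarial input is crafted with gradients from the replica model only and then scored on a separate target model, which mirrors the black-box evaluation described in the abstracts. A feature with zero saliency receives no perturbation at all.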

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)

Files in This Item:

000000169536.pdf 5.79 MB
