Improving performance of adversarial example attack using model explanation

전소희

설명 가능한 모델을 활용한 적대적 예제 공격의 성능 향상에 대한 연구 : Improving performance of adversarial example attack using model explanation


Authors: 전소희

Advisor: 백윤흥

Issue Date: 2022

Publisher: Seoul National University Graduate School

Keywords: Adversarial example attack ; XAI ; Model extraction attack

Description: Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Dept. of Electrical and Computer Engineering, 2022.2. 백윤흥.

Abstract: This thesis proposes an adversarial example attack method, which maliciously perturbs input data in order to change the classification result of an AI model. The proposed method, XAI-W, uses model explanation techniques, which interpret which parts of the input data most strongly influenced the model's output, and adds more noise to the parts of the input that had a large influence on that output. The method is evaluated in both a white-box scenario, where the attacker has access to all information about the target AI model, and a black-box scenario, where only the model's outputs can be observed. In the white-box scenario, attack performance is evaluated by attacking the target AI model directly with the proposed method. In the black-box scenario, which is more limited and more practical than the white-box scenario, a model extraction attack is used to construct a replica of the target AI model. Because the attacker has white-box access to the replica, the attacker obtains the model explanation of the target AI model indirectly through the replica and uses the replica's information to generate adversarial examples. The resulting adversarial examples are then used to attack the target AI model, and the attack's performance is evaluated with a transferability metric. The method was evaluated on three representative image datasets. It achieved very high performance of 99.9% to 100% in the white-box scenario, and in the black-box scenario it outperformed PGD (Projected Gradient Descent), a widely used conventional adversarial example generation method, by at least 5% and at most 9.7% against the target AI model. These results show that model explanation, normally used as a tool for interpreting AI models, can be exploited effectively for attacks.
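The abstract's core idea, adding more noise where the explanation says the input mattered most, can be illustrated with a minimal sketch. This is not the thesis's XAI-W implementation: the linear softmax classifier, the use of gradient magnitude as the explanation, the per-feature budget `eps * w`, and every function name below are assumptions made for illustration only. The sketch runs a PGD-style loop in which each feature's step size and allowed perturbation are scaled by its explanation weight, and computes a simple transfer rate against a separate target model.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_grad(W, b, x, y):
    """Gradient of cross-entropy w.r.t. x for a linear softmax model (assumed stand-in)."""
    p = softmax(x @ W + b)                  # class probabilities, shape (n, k)
    p[np.arange(len(y)), y] -= 1.0          # dL/dlogits = p - onehot(y)
    return p @ W.T                          # chain rule back to inputs, shape (n, d)

def saliency_weights(W, b, x, y):
    """Hypothetical explanation: |input gradient|, scaled to [0, 1] per sample."""
    g = np.abs(loss_grad(W, b, x, y))
    return g / (g.max(axis=1, keepdims=True) + 1e-12)

def weighted_pgd(W, b, x, y, eps=0.3, alpha=0.05, steps=20):
    """PGD-style attack whose per-feature budget is eps * explanation weight."""
    w = saliency_weights(W, b, x, y)        # computed once on the clean input
    x_adv = x.copy()
    for _ in range(steps):
        g = loss_grad(W, b, x_adv, y)
        x_adv = x_adv + alpha * w * np.sign(g)             # larger steps on salient features
        x_adv = np.clip(x_adv, x - eps * w, x + eps * w)   # project into the weighted box
        x_adv = np.clip(x_adv, 0.0, 1.0)                   # keep a valid pixel range
    return x_adv

def predict(W, b, x):
    return np.argmax(x @ W + b, axis=1)

def transfer_rate(W_target, b_target, x_adv, y):
    """Fraction of adversarial examples that the target model misclassifies."""
    return float(np.mean(predict(W_target, b_target, x_adv) != y))
```

In a black-box setting like the one the abstract describes, `weighted_pgd` would be run with the replica model's parameters and `transfer_rate` with the target's. With a real network, the explanation would come from whichever attribution method is chosen, and the gradients from an autodiff framework rather than the closed form used here.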

Language: kor

URI: https://hdl.handle.net/10371/183299

https://dcollection.snu.ac.kr/common/orgView/000000169536

Files in This Item:

000000169536.pdf 5.79 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)
