Semantics-Preserving Adversarial Training

Abstract: Adversarial training is a defense technique that improves the adversarial robustness of a deep neural network (DNN) by including adversarial examples in the training data. In this paper, we identify an overlooked problem in adversarial training: these adversarial examples often have different semantics from the original data, which introduces unintended biases into the model. We hypothesize that such non-semantics-preserving (and consequently ambiguous) adversarial data harm the robustness of the target models. To mitigate these unintended semantic changes, we propose semantics-preserving adversarial training (SPAT), which encourages perturbation of the pixels that are shared among all classes when generating adversarial examples in the training stage. Experimental results show that SPAT improves adversarial robustness and achieves state-of-the-art results on CIFAR-10, CIFAR-100, and STL-10.
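As an illustration of the idea described in the abstract, the following is a minimal sketch of generating an adversarial example while restricting the perturbation to a chosen set of pixels. It is not the thesis's SPAT procedure: the abstract does not say how class-shared pixels are identified or how perturbation of them is encouraged. The linear softmax model, the single-step sign-gradient attack, the hard binary `mask`, and all function names here are assumptions made only for this example.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, b, x, y):
    # Mean cross-entropy loss of a linear softmax classifier.
    p = softmax(x @ W + b)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

def input_gradient(W, b, x, y):
    # Gradient of the per-sample cross-entropy loss with respect to the inputs.
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0   # dL/dlogits = p - onehot(y)
    return p @ W.T                   # chain rule through the linear layer

def masked_fgsm(W, b, x, y, eps, mask=None):
    # One sign-gradient step of size eps, applied only where mask == 1.
    # `mask` is a hypothetical stand-in for "pixels shared among all classes".
    step = eps * np.sign(input_gradient(W, b, x, y))
    if mask is not None:
        step = step * mask
    # Keep the result inside the valid pixel range [0, 1].
    return np.clip(x + step, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(6, 3)), np.zeros(3)   # toy model: 6 "pixels", 3 classes
    x = rng.uniform(size=(4, 6))
    y = np.array([0, 1, 2, 1])
    mask = np.array([1, 1, 0, 0, 1, 0], dtype=float)
    x_adv = masked_fgsm(W, b, x, y, eps=0.1, mask=mask)
    print("clean loss:", cross_entropy(W, b, x, y))
    print("adversarial loss:", cross_entropy(W, b, x_adv, y))
```

In adversarial training, examples such as `x_adv` would be added to (or substituted for) the clean inputs when updating the model. A soft weighting of the perturbation, rather than the hard mask used here, would be an equally plausible reading of "encourages".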

Language: eng

URI: https://hdl.handle.net/10371/175428

https://dcollection.snu.ac.kr/common/orgView/000000163658

Files in This Item:

000000163658.pdf 3.03 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
