Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN

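Abstract: Despite the remarkable recent progress in deep learning, neural networks remain vulnerable to adversarial attacks. Many adversarial training methods have been introduced to address this problem by using adversarial images as training data. However, because the method used to generate the adversarial images in these techniques is always fixed, the network overfits: it becomes more robust only against those particular attacks.
The first part of this dissertation proposes a new adversarial training method that uses a generative adversarial network. In addition to the classifier network, the proposed method adds a second neural network that finds the classifier's weaknesses and generates the most effective perturbations. This generator network is trained to produce perturbations that maximize the classifier's loss, while the classifier network is trained to assign the true label to the adversarial images produced by the generator. In other words, the two networks compete with each other in a minimax game. In this scenario, the generator network finds the classifier's weaknesses in real time and generates perturbations accordingly, which mitigates the overfitting problem described above. We prove theoretically that this optimization problem is ultimately equivalent to minimizing the adversarial loss. Experiments on a variety of datasets show that our method outperforms existing adversarial training algorithms.
The second part of this dissertation presents a new collaborative filtering algorithm based on deep neural networks. Whereas collaborative filtering has so far relied on traditional machine learning algorithms such as matrix factorization and nearest-neighbor methods, this dissertation makes full use of deep neural networks. Normalized user-rating vectors and user-item vectors are used as inputs to the deep neural network, and the experimental results show that the proposed method is far more effective than traditional collaborative filtering techniques. It also has the advantage of greatly reducing computational complexity and enabling real-time prediction and learning with almost no loss in performance. In addition, we confirm experimentally that the proposed collaborative filtering algorithm can be made more robust to adversarial noise through adversarial training. These results are expected to solve very important security problems as artificial intelligence technology is applied in industry.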
Despite the remarkable recent progress in deep learning, neural networks are still vulnerable to adversarial attacks. Many adversarial training methods have been introduced to solve this problem by using adversarial examples as training data. However, the attack methods used to generate these examples are fixed, so the model becomes robust only to the attacks used in training, a limitation widely known as an overfitting problem.
In the first part of this dissertation, I propose a novel adversarial training approach. In addition to the classifier, the method adds another neural network that finds the weaknesses of the classifier and generates the most effective adversarial perturbation. This perturbation generator network is trained to produce perturbations that maximize the classifier's loss function, and the classifier is trained to assign the true label to the resulting adversarial examples. In short, the two networks compete with each other in a minimax game. In this scenario, the attack patterns created by the generator network adapt to the classifier as training proceeds, mitigating the overfitting problem mentioned above. I prove theoretically that this minimax optimization problem is equivalent to minimizing the adversarial loss. I also propose a new evaluation method that can fairly measure the robustness of a network. Experiments on various datasets show that the method outperforms conventional adversarial training algorithms.
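The minimax game described above can be illustrated with a small NumPy sketch. This is not the dissertation's implementation: the logistic-regression classifier, the linear generator that takes the classifier's input gradient as its input, the `tanh` bound `eps`, the weight `alpha`, the learning rates, and the toy data from `make_blobs` are all assumptions chosen so that the gradients can be written by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    """Mean binary cross-entropy."""
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def make_blobs(n=400, seed=0):
    """Hypothetical toy data: two Gaussian blobs in 2-D, labels 0 and 1."""
    rng = np.random.default_rng(seed)
    y = (np.arange(n) % 2).astype(float)
    X = rng.standard_normal((n, 2)) + np.where(y[:, None] == 1.0, 2.0, -2.0)
    return X, y

def input_gradient(w, b, X, y):
    """Per-example gradient of the cross-entropy loss w.r.t. the input."""
    p = sigmoid(X @ w + b)
    return (p - y)[:, None] * w[None, :]

def generate(A, grad_x, eps):
    """Assumed generator: linear map of the input gradient, bounded by eps."""
    return eps * np.tanh(grad_x @ A.T)

def train(X, y, steps=500, eps=0.5, alpha=1.0, lr_c=0.1, lr_g=0.1):
    n, d = X.shape
    w, b = np.zeros(d), 0.0      # classifier parameters
    A = np.zeros((d, d))         # generator parameters
    for _ in range(steps):
        grad_x = input_gradient(w, b, X, y)  # treated as a constant input

        # Generator step: gradient ASCENT on the classifier loss at X + delta.
        z = grad_x @ A.T
        delta = eps * np.tanh(z)
        p_adv = sigmoid((X + delta) @ w + b)
        dl_ddelta = (p_adv - y)[:, None] * w[None, :] / n
        dl_dz = dl_ddelta * eps * (1.0 - np.tanh(z) ** 2)
        A += lr_g * (dl_dz.T @ grad_x)

        # Classifier step: gradient DESCENT on clean + alpha * adversarial
        # loss, with the perturbation held fixed (no gradient through it).
        X_adv = X + generate(A, grad_x, eps)
        p = sigmoid(X @ w + b)
        p_adv = sigmoid(X_adv @ w + b)
        w -= lr_c * (X.T @ (p - y) + alpha * (X_adv.T @ (p_adv - y))) / n
        b -= lr_c * float(np.mean((p - y) + alpha * (p_adv - y)))
    return w, b, A
```

Calling `train(*make_blobs())` alternates one generator update and one classifier update per step, so the perturbations keep tracking the current classifier instead of coming from a fixed attack.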
In the second part of this dissertation, I propose a novel collaborative filtering algorithm based on deep neural networks, whereas existing collaborative filtering techniques use conventional machine learning algorithms such as baseline predictors, matrix factorization, and k-nearest neighbors (KNN). A normalized user-rating vector and a normalized item-rating vector are used as inputs to the neural network. Experimental results show that the proposed method outperforms conventional collaborative filtering algorithms. The proposed method has another strong advantage: online operation is possible with little extra complexity and little performance degradation.
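As a rough illustration of feeding a normalized user-rating vector and a normalized item-rating vector to a neural network, the following sketch builds one input and runs an untrained forward pass. The abstract does not specify the normalization, the architecture, or how missing ratings are encoded, so the division by `r_max`, the use of 0 for a missing rating, the hiding of the target entry, and the single hidden layer are all assumptions made for this example.

```python
import numpy as np

def normalize(ratings, r_max=5.0):
    """Scale ratings to [0, 1]; 0 marks a missing rating (assumed convention)."""
    return np.asarray(ratings, dtype=float) / r_max

def build_input(R, user, item, r_max=5.0):
    """Concatenate the user's rating row and the item's rating column.

    The target entry R[user, item] is zeroed in both vectors so that the
    network cannot simply read off the rating it is asked to predict.
    """
    user_vec = normalize(R[user, :], r_max)   # length n_items
    item_vec = normalize(R[:, item], r_max)   # length n_users
    user_vec[item] = 0.0
    item_vec[user] = 0.0
    return np.concatenate([user_vec, item_vec])

def init_params(n_in, n_hidden=16, seed=0):
    """Random weights for a hypothetical one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in),
        "b1": np.zeros(n_hidden),
        "W2": rng.standard_normal(n_hidden) / np.sqrt(n_hidden),
        "b2": 0.0,
    }

def predict_rating(params, x, r_min=1.0, r_max=5.0):
    """Forward pass: ReLU hidden layer, sigmoid output scaled to the rating range."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    out = 1.0 / (1.0 + np.exp(-(h @ params["W2"] + params["b2"])))
    return r_min + (r_max - r_min) * float(out)

# Hypothetical 4-user x 5-item rating matrix (0 = not rated).
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 0],
    [0, 1, 5, 4, 3],
])
x = build_input(R, user=0, item=1)
params = init_params(x.shape[0])
prediction = predict_rating(params, x)
```

The weights here are random, so `prediction` is only a shape and range check. A usable model would be trained on the observed ratings, and an online variant would rebuild `x` from the latest ratings at prediction time.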
The results of these studies are expected to solve very important security problems when artificial intelligence technology is applied in industry in the future.

Language: eng

URI: https://hdl.handle.net/10371/169311

http://dcollection.snu.ac.kr/common/orgView/000000163306

Files in This Item:

000000163306.pdf 5.10 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)
