Parameter-Efficient Knowledge Distillation on Transformer

전효진

Parameter-Efficient Knowledge Distillation on Transformer : 파라미터 효율적인 트랜스포머 지식 증류

Authors: 전효진

Advisor: 강유

Issue Date: 2023

Publisher: Seoul National University Graduate School (서울대학교 대학원)

Keywords: Model compression ; Transformer ; Knowledge Distillation

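Description: Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2023. 2. 강유.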

Abstract: How can we obtain a small and computationally efficient Transformer model while maintaining the performance of a large model? Transformers have shown strong performance in recent years. However, their large model size, expensive computation cost, and long inference time prevent them from being deployed on resource-restricted devices. Existing Transformer compression methods have mainly focused on reducing only the encoder, although the decoder accounts for most of the long inference time. In this paper, we propose PET (Parameter-Efficient Knowledge Distillation on Transformer), an efficient Transformer compression method that reduces the size of both the encoder and the decoder. PET improves knowledge distillation of the Transformer by designing an efficient compressed structure for both the encoder and the decoder and by enhancing the performance of the small model through an efficient pre-training task. Experiments show that PET improves memory and time efficiency by 81.20% and 45.20%, respectively, while keeping the accuracy drop below 1%p. It outperforms competitors on various machine translation datasets.
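How can we obtain a small Transformer model with an efficient computation cost while maintaining the performance of a large model? Transformer models have shown outstanding results over the past few years across a variety of fields, including natural language processing and computer vision. The main trend in recent research is to raise model performance by increasing model size, but increasing model size without limit is undesirable from a practical standpoint. For a technique to be applied in real-world services, memory efficiency, fast inference speed, and energy consumption must be considered in addition to high performance, and in most cases a large model has difficulty meeting these requirements. An effective Transformer compression technique is therefore needed to obtain a small and fast model that maintains the performance of a large model.
Most existing Transformer compression studies address Transformer encoder-based models, with BERT compression as the representative example. When existing encoder compression techniques were applied to models that contain both an encoder and a decoder, such as machine translation models, accuracy dropped substantially. A decoder is larger than an encoder with the same embedding size and number of layers, and it is the main cause of long inference time, so decoder compression is essential for building a practical Transformer model. In this thesis, we propose PET (Parameter-Efficient Knowledge Distillation on Transformer) to compress both the encoder and the decoder of the Transformer. The proposed method improves the performance of Transformer knowledge distillation through an effective model structure design and an improved initialization technique. We also propose an additional optimization technique that further raises the accuracy of the compressed model. Experiments confirm that the proposed method outperforms competing models on various machine translation datasets. On a German-English translation dataset, it uses 18.30% of the original model's parameters (9.51% when the embedding layer is excluded) and is 45.2% faster, while keeping the accuracy drop within 1%p.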

Language: eng

URI: https://hdl.handle.net/10371/193347

https://dcollection.snu.ac.kr/common/orgView/000000176565

Files in This Item:

000000176565.pdf 2.42 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
