Reducing the Cost of Training a Transformer Model by Using a Trained Model

Han, Minhee


Korean title: 이미 학습된 모델의 활용을 통한 새로운 트랜스포머 모델의 학습 비용 감소 (Reducing the Training Cost of a New Transformer Model by Using an Already-Trained Model)


Authors: Han, Minhee

Advisor: 이재진

Issue Date: 2022

Publisher: Seoul National University Graduate School

Keywords: deep learning ; natural language processing ; knowledge distillation ; Transformer model ; model training

Description: Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Dept. of Computer Science and Engineering, 2022. 8. 이재진.

Abstract: The cost of training a new language model is higher than ever and continues to rise. To mitigate this, this paper proposes reusing an already-trained model to reduce the cost of training a larger one. Techniques from Knowledge Distillation (KD) can transfer the knowledge of the trained model to the new model, even when the new model is larger than the trained one. Two methods are used: 1) copying the weights and 2) logit matching. The former applies only when the two models have the same dimensions; the latter applies regardless of dimensions, though it requires more computation. In experiments with GPT-2-like models, the two methods shortened training time by 3.5% and 18.9%, respectively, showing that reusing a relatively small trained model can reduce the training time of a larger model.
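The two transfer methods named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the dictionary-of-matrices layout, and the use of a mean squared error as the logit-matching objective are all assumptions made for illustration (a softened KL divergence is another common choice in knowledge distillation). Plain Python lists stand in for tensors. The sketch also assumes both models share one vocabulary, which is why logit matching does not depend on the models' hidden dimensions.

```python
def shape(matrix):
    """Return (rows, cols) of a matrix stored as a list of rows."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def copy_matching_weights(trained, new):
    """Method 1 (hypothetical helper): copy weights where dimensions agree.

    `trained` and `new` map parameter names to matrices. A parameter is
    copied only if the name exists in both models and the shapes match;
    everything else in `new` keeps its existing initialization.
    Returns the list of parameter names that were copied.
    """
    copied = []
    for name, weights in trained.items():
        if name in new and shape(new[name]) == shape(weights):
            new[name] = [row[:] for row in weights]  # copy, do not alias
            copied.append(name)
    return copied


def logit_matching_loss(trained_logits, new_logits):
    """Method 2 (hypothetical helper): penalize differing logits.

    Both arguments are logits over the same vocabulary for one token
    position. Mean squared error is an assumed objective, not
    necessarily the one used in the thesis.
    """
    if len(trained_logits) != len(new_logits):
        raise ValueError("both models must share one vocabulary")
    squared = [(t - n) ** 2 for t, n in zip(trained_logits, new_logits)]
    return sum(squared) / len(squared)


# Usage: the attention matrix has matching shapes and is copied;
# the MLP matrix differs in width and is left untouched.
trained_model = {"attn.w": [[1.0, 2.0], [3.0, 4.0]], "mlp.w": [[5.0, 6.0]]}
new_model = {"attn.w": [[0.0, 0.0], [0.0, 0.0]], "mlp.w": [[0.0, 0.0, 0.0]]}
copied_names = copy_matching_weights(trained_model, new_model)
loss = logit_matching_loss([0.0, 0.0], [1.0, 3.0])
```

In a real training loop the logit-matching term would need a forward pass of the trained model for every batch, which is consistent with the abstract's remark that this method requires more computation than weight copying.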

Language: kor

URI: https://hdl.handle.net/10371/187785

https://dcollection.snu.ac.kr/common/orgView/000000172391

Files in This Item:

000000172391.pdf 0.71 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
