Reducing the Cost of Training a Transformer Model by Using a Trained Model

Han, Minhee

Reducing the Cost of Training a Transformer Model by Using a Trained Model : 이미 학습된 모델의 활용을 통한 새로운 트랜스포머 모델의 학습 비용 감소 (Reducing the Training Cost of a New Transformer Model by Reusing an Already Trained Model)

DC Field	Value	Language
dc.contributor.advisor	이재진	-
dc.contributor.author	Han, Minhee	-
dc.date.accessioned	2022-12-29T07:45:22Z	-
dc.date.available	2022-12-29T07:45:22Z	-
dc.date.issued	2022	-
dc.identifier.other	000000172391	-
dc.identifier.uri	https://hdl.handle.net/10371/187785	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000172391	ko_KR
dc.description	Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Dept. of Computer Science and Engineering, 2022. 8. 이재진.	-
dc.description.abstract	The cost of training a new language model is higher than ever, and it continues to increase. To mitigate this, this paper proposes reusing an already trained model to reduce the cost of training a larger model. Using methods from Knowledge Distillation (KD), the knowledge of the trained model can be transferred to the new model, even when the new model is larger than the trained one. This is done by 1) copying the weights and 2) logits matching. The former can be used when the two models have the same dimensions, while the latter can be used regardless of the dimensions, though it requires more computation. Experiments with GPT-like models show that reusing a relatively small trained model reduces the training time of a relatively large model.	-
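dc.description.abstract	The cost of training a new natural language processing model is higher than ever and keeps increasing. To address this problem, this paper proposes reusing an already trained model to reduce the cost of training a larger model. Knowledge Distillation techniques make it possible to transfer the knowledge of a trained model to a new model, even when the new model is larger than the trained one. This can be done in two ways: 1) copying the weights and 2) making the logits of the two models match. The former can be used only when the two models have the same dimensions, whereas the latter can also be used when they do not. In experiments with GPT2-like models, the two methods reduced training time by 3.5% and 18.9%, respectively, showing that reusing a relatively small trained model can shorten the training time of a larger model.	-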
dc.description.tableofcontents	Chapter 1. Introduction p. 1 Chapter 2. Design and Implementation p. 3 Chapter 3. Experiments p. 7 Chapter 4. Conclusion p. 13	-
dc.format.extent	ii, 16	-
dc.language.iso	kor	-
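dc.publisher	Seoul National University Graduate School	-
dc.subject	Deep learning	-
dc.subject	Natural language processing	-
dc.subject	Knowledge distillation	-
dc.subject	Transformer model	-
dc.subject	Model training	-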
dc.subject.ddc	621.39	-
dc.title	Reducing the Cost of Training a Transformer Model by Using a Trained Model	-
dc.title.alternative	이미 학습된 모델의 활용을 통한 새로운 트랜스포머 모델의 학습 비용 감소 (Reducing the Training Cost of a New Transformer Model by Reusing an Already Trained Model)	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	한민희	-
dc.contributor.department	College of Engineering, Dept. of Computer Science and Engineering	-
dc.description.degree	Master's	-
dc.date.awarded	2022-08	-
dc.identifier.uci	I804:11032-000000172391	-
dc.identifier.holdings	000000000048▲000000000055▲000000172391▲	-
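
The two transfer methods named in the abstract, copying the weights and logits matching, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the dictionary-of-matrices model representation, the choice of KL divergence as the matching loss, and the `temperature` parameter are all assumptions made for illustration, and pure Python stands in for a deep learning framework.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logits_matching_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary for one token position.

    Assumption: KL divergence is used as the matching loss; the thesis
    may define the objective differently.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def shape(matrix):
    """Return (rows, columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))


def copy_matching_weights(trained, new):
    """Copy parameters whose name and shape match from `trained` into `new`.

    Hypothetical representation: each model is a dict mapping a parameter
    name to a list-of-lists matrix. Returns the names that were copied.
    """
    copied = []
    for name, weights in trained.items():
        if name in new and shape(new[name]) == shape(weights):
            new[name] = [row[:] for row in weights]  # copy, do not alias
            copied.append(name)
    return copied


# Usage: the larger model has an extra layer and a wider output head.
trained = {"layer0.w": [[1.0, 2.0], [3.0, 4.0]], "head.w": [[1.0, 2.0]]}
new = {
    "layer0.w": [[0.0, 0.0], [0.0, 0.0]],
    "layer1.w": [[0.0, 0.0], [0.0, 0.0]],
    "head.w": [[0.0, 0.0, 0.0]],
}
print(copy_matching_weights(trained, new))
print(logits_matching_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In this sketch only parameters with identical shapes are copied, which mirrors the abstract's remark that weight copying needs matching dimensions, while the logits loss depends only on the two models' output distributions over a shared vocabulary and so places no constraint on their internal sizes.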

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Computer Science and Engineering
  - Theses (Master's Degree_Computer Science and Engineering)

Files in This Item:

000000172391.pdf 0.71 MB
