Improving Bi-encoder Neural Ranking Models using Knowledge Distillation and Lightweight Fine-tuning

최재걸

Improving Bi-encoder Neural Ranking Models using Knowledge Distillation and Lightweight Fine-tuning : 지식 증류와 경량 파인튜닝을 이용한 바이 인코더 신경망 랭킹 모델의 개선

DC Field	Value	Language
dc.contributor.advisor	Wonjong Rhee	-
dc.contributor.author	최재걸	-
dc.date.accessioned	2022-12-29T08:41:11Z	-
dc.date.available	2022-12-29T08:41:11Z	-
dc.date.issued	2022	-
dc.identifier.other	000000171975	-
dc.identifier.uri	https://hdl.handle.net/10371/188298	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000171975	ko_KR
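dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : Graduate School of Convergence Science and Technology, Dept. of Intelligence and Information, 2022. 8. Wonjong Rhee.	-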
dc.description.abstract	In recent studies, pre-trained language models, especially bidirectional encoder representations from transformers (BERT), have been essential in enhancing the performance of neural ranking models (NRMs). Various BERT-based NRMs have been proposed, and many have achieved state-of-the-art performance. BERT-based NRMs can be classified according to how the query and document are encoded through BERT's self-attention layers: bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the documents can be pre-processed before query time, but their performance is inferior to that of cross-encoder models. Because of their efficiency, bi-encoder models are much more deployable in real search engines and tend to receive more attention from industrial practitioners. Improving the performance of bi-encoder models is therefore a promising research direction. This thesis explores methods to improve bi-encoder NRMs using knowledge distillation and lightweight fine-tuning. We consider a method that transfers the knowledge of a teacher cross-encoder model to a student bi-encoder model using knowledge distillation. Knowledge distillation enables a bi-encoder student to imitate the representation of a cross-encoder teacher and have the advantages of both types of models. The resulting student bi-encoder achieves improved performance by simultaneously learning from a cross-encoder teacher and a bi-encoder teacher. We also investigate lightweight fine-tuning to improve bi-encoder NRMs. Lightweight fine-tuning is a method of fine-tuning only a small portion of the model weights, and is known to have a regularization effect. We demonstrate two approaches for improving the performance of BERT-based bi-encoders using lightweight fine-tuning. The first approach is to replace the full fine-tuning step with lightweight fine-tuning. The second is to develop semi-Siamese models in which queries and documents are handled with a limited amount of difference. The limited difference is realized by learning two lightweight fine-tuning modules, where the main language model of BERT is kept common for both query and document. We provide extensive experimental results, which confirm that both lightweight fine-tuning and semi-Siamese models are considerably helpful for improving BERT-based bi-encoders. Finally, we present a model that uses these two methods simultaneously. Using knowledge distillation and lightweight fine-tuning methods together, a model can gain the effects of both methods, resulting in further performance improvement over the individual methods. We anticipate that these techniques will be broadly applicable to industrial domains.	-
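dc.description.abstract	In recent studies, various BERT-based neural ranking models have been proposed, and these models show state-of-the-art performance. BERT-based ranking models are divided into cross-encoders and bi-encoders according to whether the relationship between the query and the document is computed through BERT's self-attention. Cross-encoder models perform well but are inefficient. Bi-encoder models, in contrast, perform worse than cross-encoders but are highly efficient because the vector representations of all documents can be computed in advance. Because bi-encoder models are efficient, they can be deployed in real search engines. For this reason, bi-encoder models receive more attention from the search industry. As noted above, however, the performance of bi-encoder models does not reach that of cross-encoder models. Improving the performance of bi-encoder models is therefore an attractive problem for those who want to use ranking models in practice. This study explores methods to improve bi-encoder models using knowledge distillation and lightweight fine-tuning. We study a method that uses knowledge distillation to transfer the knowledge of a cross-encoder model to a bi-encoder model. The bi-encoder model produced through knowledge distillation performs better because it uses the knowledge learned from the cross-encoder. We also use lightweight fine-tuning to improve bi-encoder models. Lightweight fine-tuning trains only a portion of the model weights and is known to have a regularization effect on the model. We use two approaches to improve the performance of BERT-based bi-encoder models with lightweight fine-tuning. The first approach is to replace fine-tuning with lightweight fine-tuning. The second approach is to use a semi-Siamese model that processes queries and documents differently. Through a variety of experiments, we confirmed that lightweight fine-tuning and semi-Siamese models considerably help improve bi-encoder models. Finally, we present a model that uses knowledge distillation and lightweight fine-tuning simultaneously. Experiments confirmed that the model using both methods performs better than models using either method alone. We expect the proposed methods to benefit the search industry.	-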
dc.description.tableofcontents	Chapter 1. Introduction 1 1.1 Thesis Outline 3 1.2 Related Publications 4 Chapter 2. Background 5 2.1 Information Retrieval 5 2.1.1 Text Ranking using Neural Ranking Models 5 2.2 Ad-hoc Retrieval Problems 8 2.2.1 The Concept of Relevance 9 2.2.2 Test Collections 10 2.2.3 Ranking Metrics 10 2.3 A Brief History of Ad-hoc Retrieval 14 2.3.1 The Era of Exact Match 14 2.3.2 Pre-BERT Neural Ranking Model 17 2.3.3 BERT-based Neural Ranking Models 19 2.4 Research Motivation 21 2.5 Thesis Roadmap 24 Chapter 3. Bi-encoder Neural Ranking Models with Distillation - TRMD 25 3.1 Introduction 25 3.2 Related Works 27 3.2.1 NRMs before Pre-trained Language Models 27 3.2.2 NRMs with BERT 28 3.2.3 Efficient NRMs 29 3.3 Methodology 29 3.3.1 Architecture 30 3.3.2 Learning through Multi-teacher Distillation 31 3.4 Experimental Result 32 3.4.1 Experiment 33 3.4.2 Result and Analysis 35 3.5 Discussion 36 3.6 Conclusion 37 Chapter 4. Bi-encoder Neural Ranking Models with Lightweight Fine-tuning - SS LFT 38 4.1 Introduction 38 4.2 Related Works 41 4.2.1 BERT-based NRMs 41 4.2.2 Lightweight Fine-Tuning (LFT) 41 4.2.3 Semi-Siamese (SS) Models 43 4.3 Methodology 44 4.3.1 Document Re-ranking 44 4.3.2 Lightweight Fine-Tuning (LFT) 46 4.3.3 Semi-Siamese Neural Ranking Model 49 4.4 Experiment 51 4.4.1 Experimental Setup 51 4.4.2 LFT Results for Cross-encoder 53 4.4.3 LFT Results for Bi-encoders 55 4.4.4 Semi-Siamese LFT Results for Bi-encoders 59 4.5 Discussion 60 4.5.1 Cross-encoder vs. Bi-encoder 60 4.5.2 Hybrid: Concurrent Learning vs. Sequential Learning 62 4.6 Conclusion 63 Chapter 5. Bi-encoder Neural Ranking Models with Knowledge Distillation and Lightweight Fine-tuning 64 5.1 Introduction 64 5.2 Related Works 66 5.2.1 Dense Retriever 66 5.2.2 Improving Dense Retriever 67 5.2.3 Lightweight Fine-tuning and semi-Siamese network 68 5.3 Methodology 70 5.3.1 Document Ranking 70 5.3.2 Knowledge Supervision 70 5.3.3 Lightweight fine-tuning 72 5.3.4 Semi-Siamese Lightweight fine-tuning 73 5.3.5 SS LFT with supervision 74 5.3.6 Training Procedure 76 5.4 Experiment 76 5.4.1 Experimental Setup 76 5.4.2 Results of combination method 78 5.5 Discussion 81 5.5.1 Difference between cross-encoder and bi-encoder models 82 5.5.2 How to overcome the shortage of bi-encoder models 83 5.6 Conclusion 85 Chapter 6. Conclusion 86 6.1 Effectiveness and Efficiency 86 6.2 Expansion to Text Ranking 88 6.3 Future Work 88 Bibliography 89 Appendices 100 A Knowledge Distillation Methods 100 B Variants of SS Prefix-tuning 101 C Variants of SS LoRA 103 D Efficiency of Training and Inference 104 D.1 Training Time 104 D.2 Inference Time 104 E Hyper-parameter setting 105 Acknowledgement 107	-
dc.format.extent	xi, 107	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	InformationRetrieval	-
dc.subject	NeuralRankingModel	-
dc.subject	Bi-encoderRankingModel	-
dc.subject	KnowledgeDistillation	-
dc.subject	LightweightFine-tuning	-
dc.subject.ddc	006.3	-
dc.title	Improving Bi-encoder Neural Ranking Models using Knowledge Distillation and Lightweight Fine-tuning	-
dc.title.alternative	지식 증류와 경량 파인튜닝을 이용한 바이 인코더 신경망 랭킹 모델의 개선	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Jaekeol Choi	-
dc.contributor.department	Graduate School of Convergence Science and Technology, Dept. of Intelligence and Information	-
dc.description.degree	Ph.D.	-
dc.date.awarded	2022-08	-
dc.contributor.major	Intelligence and Information (major)	-
dc.identifier.uci	I804:11032-000000171975	-
dc.identifier.holdings	000000000048▲000000000055▲000000171975▲	-
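
The abstract above describes two ideas: a semi-Siamese bi-encoder, where the query and document sides share one frozen base model and differ only through two small lightweight fine-tuning modules, and knowledge distillation from a teacher into the student bi-encoder. The following is a minimal sketch of both, not the thesis's implementation. It assumes a single linear map as a stand-in for BERT, a LoRA-style low-rank update (LoRA is one of the variants named in the table of contents) as the lightweight module, and a mean-squared-error loss on scores as the distillation objective. The names `SemiSiameseEncoder` and `distillation_loss` are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix given as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class SemiSiameseEncoder:
    """Toy bi-encoder: one shared frozen weight matrix (stand-in for BERT,
    an assumption) plus a separate low-rank adapter per side."""

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        # Shared base weights; these would stay frozen during fine-tuning.
        self.shared = [[rng.uniform(-1, 1) for _ in range(dim)]
                       for _ in range(dim)]
        # LoRA-style adapters: delta = B @ A. B starts at zero, so both
        # sides initially behave like a plain Siamese model.
        self.adapters = {
            side: {
                "A": [[rng.uniform(-1, 1) for _ in range(dim)]
                      for _ in range(rank)],
                "B": [[0.0] * rank for _ in range(dim)],
            }
            for side in ("query", "doc")
        }

    def encode(self, x, side):
        """Encode a vector with the shared weights plus that side's delta."""
        adapter = self.adapters[side]
        delta = matmul(adapter["B"], adapter["A"])
        weights = [[s + d for s, d in zip(s_row, d_row)]
                   for s_row, d_row in zip(self.shared, delta)]
        return matvec(weights, x)

    def score(self, query, doc):
        """Relevance score: dot product of the two independent encodings."""
        q = self.encode(query, "query")
        d = self.encode(doc, "doc")
        return sum(a * b for a, b in zip(q, d))


def distillation_loss(student_scores, teacher_scores):
    """Mean squared error between student and teacher scores (an assumed
    objective chosen for illustration)."""
    pairs = list(zip(student_scores, teacher_scores))
    return sum((s - t) ** 2 for s, t in pairs) / len(pairs)
```

In this sketch only the `A` and `B` matrices would be trained, which is what keeps the query and document sides close to each other while still allowing a limited difference. Document encodings depend only on the document side, so they could be computed before query time.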

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Intelligence and Information (지능정보융합학과)
  - Theses (Ph.D. / Sc.D._지능정보융합학과)

Files in This Item:

000000171975.pdf 2.22 MB
