Solving the Document-Level Relation Extraction Task Using Generative Large Language Models

Abstract:
Several studies of generative large language models suggest that these models hold substantial "knowledge" beyond their understanding of natural language. This has motivated ongoing efforts to use that knowledge to solve a range of reasoning problems. Meanwhile, document-level relation extraction is the task of identifying the relation between two entities in a text of two or more sentences, and it is essential for building large-scale knowledge graphs.
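To make the task concrete, here is a minimal sketch of a document-level relation extraction example. The record layout (sentences, entities as lists of mentions, and head/relation/tail triples) is a simplified, hypothetical structure loosely modeled on DocRED-style annotations, not the exact Re-DocRED schema, and the example document is made up. The helper `is_cross_sentence` (also hypothetical) flags triples whose head and tail never co-occur in one sentence, which is exactly the kind of relation a sentence-level extractor cannot see.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    sent_id: int  # index of the sentence containing this mention


@dataclass(frozen=True)
class Triple:
    head: int      # index into Document.entities
    relation: str  # relation label (hypothetical plain-text labels here)
    tail: int


@dataclass
class Document:
    sentences: list[str]
    entities: list[list[Mention]]  # each entity is the list of all its mentions
    triples: list[Triple]


def is_cross_sentence(doc: Document, t: Triple) -> bool:
    """True if no single sentence mentions both the head and the tail."""
    head_sents = {m.sent_id for m in doc.entities[t.head]}
    tail_sents = {m.sent_id for m in doc.entities[t.tail]}
    return head_sents.isdisjoint(tail_sents)


doc = Document(
    sentences=[
        "Seoul National University was founded in 1946.",
        "It is located in Seoul.",
        "Seoul is the capital of South Korea.",
    ],
    entities=[
        [Mention("Seoul National University", 0), Mention("It", 1)],
        [Mention("Seoul", 1), Mention("Seoul", 2)],
        [Mention("South Korea", 2)],
    ],
    triples=[
        Triple(0, "located in", 1),  # needs the coreference "It" -> SNU
        Triple(0, "country", 2),     # needs chaining across sentences 1 and 2
    ],
)

for t in doc.triples:
    print(t.relation, is_cross_sentence(doc, t))
```

Both triples require reading beyond a single sentence: the first depends on resolving "It", and the second can only be inferred by combining two sentences.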
In this thesis, we study how to solve the document-level relation extraction task with a generative large language model under the pretraining-prompt-prediction paradigm. We use Meta's LLaMA-7B as the base model and Re-DocRED, a training and benchmark dataset for document-level relation extraction, as the training data. The base model is instruction fine-tuned on pairs of input prompts and expected outputs constructed with Chain of Thought, and LoRA is used to make this training efficient.
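The abstract does not give the prompt format. The sketch below shows one hypothetical way to turn an annotated entity pair into an instruction-tuning pair whose expected output reasons step by step (Chain of Thought) over the evidence sentences before stating the relation label. The template wording, the function `build_cot_example`, and the `input`/`output` field names are all assumptions for illustration, not the thesis's actual template.

```python
def build_cot_example(sentences, head, tail, relation, evidence):
    """Build one hypothetical (prompt, expected output) pair for an entity pair.

    sentences: list of sentence strings forming the document
    head, tail: entity names
    relation: gold relation label, or None if the pair is unrelated
    evidence: indices of the sentences that support the relation
    """
    # Number each sentence so the reasoning steps can cite it by index.
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences))
    prompt = (
        "Identify the relation between the head and tail entities in the document.\n"
        f"Document:\n{numbered}\n"
        f"Head: {head}\n"
        f"Tail: {tail}\n"
        "Explain your reasoning step by step, then give the answer."
    )
    if relation is None:
        steps = ["No sentence connects the two entities."]
        answer = "no relation"
    else:
        # One reasoning step per evidence sentence, then a concluding step.
        steps = [f"Sentence {i} says: {sentences[i]}" for i in evidence]
        steps.append(
            f"Combining these sentences, {head} and {tail} are linked by '{relation}'."
        )
        answer = relation
    target = "\n".join(steps) + f"\nAnswer: {answer}"
    return {"input": prompt, "output": target}


sentences = [
    "Seoul National University was founded in 1946.",
    "It is located in Seoul.",
    "Seoul is the capital of South Korea.",
]
pair = build_cot_example(
    sentences, "Seoul National University", "South Korea", "country", [1, 2]
)
print(pair["input"])
print("---")
print(pair["output"])
```

Pairs like this would then go through an ordinary supervised fine-tuning loop. With LoRA, the base model's weights stay frozen and only small low-rank adapter matrices added to selected layers are trained, which is what keeps fine-tuning a 7B-parameter model affordable.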
To validate the proposed method, we conduct a range of experiments. First, the F1 score on Re-DocRED is 0.47, lower than that of existing approaches based on the pretraining-finetuning paradigm. However, the correct and incorrect answers the model actually generated reveal limitations of both the dataset and the evaluation method. Furthermore, comparing the model's performance with and without instruction fine-tuning, and with and without Chain of Thought, shows that both techniques make a meaningful contribution to performance. In particular, fine-tuning the model to maximize recall alone yields a recall of 0.73.
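Relation extraction precision, recall, and F1 are typically computed over sets of predicted relation triples. The following is a minimal sketch assuming exact matching of (head, relation, tail) triples against the gold annotations; the official benchmark scorer has further details not modeled here, and the triples below are made-up illustrations, not results from the thesis.

```python
def prf1(predicted, gold):
    """Exact-match precision, recall, and F1 over (head, relation, tail) triples."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)  # predicted triples that also appear in the gold set
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    # tp > 0 guarantees precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {
    ("Seoul National University", "located in", "Seoul"),
    ("Seoul National University", "country", "South Korea"),
    ("Seoul", "country", "South Korea"),
    ("Seoul", "capital of", "South Korea"),
}
pred = {
    ("Seoul National University", "located in", "Seoul"),
    ("Seoul National University", "country", "South Korea"),
    ("Seoul", "country", "South Korea"),
    ("Seoul National University", "located in", "South Korea"),  # not in gold
    ("South Korea", "capital", "Seoul"),                          # not in gold
}

p, r, f = prf1(pred, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.60 R=0.75 F1=0.67
```

Under exact-match scoring like this, a triple that is actually correct but missing from the gold annotations counts as a false positive, which is one way dataset and evaluation limits can pull the score down. The same arithmetic shows the recall trade-off: emitting more candidate triples can only add true positives, so recall rises while precision usually falls.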

Language: kor

URI: https://hdl.handle.net/10371/196570

https://dcollection.snu.ac.kr/common/orgView/000000178819


Appears in Collections:

College of Engineering/Engineering Practice School
- Program in Artificial Intelligence
  - Theses (Master's Degree, Program in Artificial Intelligence)
