Sentence Matching using Deep Learning for Question Answering

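Abstract: Driven by advances in deep neural networks, question answering systems have become one of the most important applications in natural language processing.
In this dissertation, we study text matching, which is used in a wide range of question answering models.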

First, we study semantic matching of sentence pairs, which can be applied to components of question answering systems such as question paraphrase identification, natural language inference, and answer sentence selection.
We propose a densely connected deep recurrent neural network in which all output representations, from the lowest layer (the word embeddings) to the highest layer, are passed on without modification. A drawback of this architecture is that the vector dimension grows as the network gets deeper; we alleviate this by compressing the high-dimensional vectors with an autoencoder.

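Second, we propose techniques for matching the parts of a textual context that must be attended to in order to answer a question.
When the context contains many technical terms, identifying the keywords is important, so we build a context graph for each sentence of the context document using a dependency parser.
In the resulting graph, we designate the nodes containing terms that appear in the question and the answer as anchor nodes. Removing the nodes far from these anchor nodes narrows the range of context to be examined, so the model can answer more accurately.
In addition, for question answering tasks with long subtitles, we jointly train an auxiliary classifier that uses matching scores to identify the subtitle sentences needed for answering, which improves question answering performance.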
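The anchor-node pruning described above can be sketched as a multi-source breadth-first search over a sentence's dependency graph. This is a hedged illustration, not the thesis code: the hand-written parse stands in for real dependency-parser output, and exact lower-case word matching for anchors and the hop threshold `max_hops` for "far" nodes are assumptions; `prune_by_anchor_distance` is a hypothetical name.

```python
from collections import deque


def prune_by_anchor_distance(tokens, edges, query_words, max_hops):
    """Keep only tokens within max_hops edges of an anchor token.

    tokens: words of one context sentence.
    edges: (head_index, dependent_index) pairs, e.g. from a dependency parser.
    query_words: words taken from the question and the answer option.
    """
    neighbors = {i: [] for i in range(len(tokens))}
    for head, dep in edges:  # treat the dependency tree as an undirected graph
        neighbors[head].append(dep)
        neighbors[dep].append(head)

    query = {w.lower() for w in query_words}
    anchors = [i for i, tok in enumerate(tokens) if tok.lower() in query]

    # Multi-source BFS: hop distance from every node to its nearest anchor.
    dist = {i: 0 for i in anchors}
    frontier = deque(anchors)
    while frontier:
        node = frontier.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                frontier.append(nxt)

    # Nodes unreachable from any anchor have no distance and are dropped.
    return [tok for i, tok in enumerate(tokens) if i in dist and dist[i] <= max_hops]


# Hand-written (hypothetical) parse of "Marie won the Nobel Prize in physics in 1903".
tokens = ["Marie", "won", "the", "Nobel", "Prize", "in", "physics", "in", "1903"]
edges = [(1, 0), (1, 4), (4, 2), (4, 3), (4, 6), (6, 5), (1, 8), (8, 7)]
print(prune_by_anchor_distance(tokens, edges, ["Nobel", "1903"], max_hops=1))
# ['won', 'Nobel', 'Prize', 'in', '1903']
```

Raising `max_hops` to 2 keeps every token except the "in" before "physics", so the threshold trades context coverage against focus.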

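Lastly, we propose a training scheme to improve multiple-choice question answering systems. Before training on the target task, self-supervised pre-training gives the model better initial parameters; in the main training stage, we use a contrastive loss function.
For pre-training, we change the format of the problem so that the model learns to predict the question that best fits the given context, which yields better model parameters before the main task.
In the main stage, we add a contrastive loss so that the embedding regions of correct and incorrect answers are well separated, which improves the final question answering performance.
As a result, we achieve better performance than previously proposed models on multiple-choice video question answering tasks such as TVQA, TVQA+, and DramaQA.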
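The problem-format transformation used for pre-training can be sketched as building a new multiple-choice set whose options are questions rather than answers. This illustration rests on assumptions that are not details from the thesis: distractor questions are drawn from other samples, only the context is used as input, and the toy items, dataclass names, and `synthesize_pretraining_set` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class QAItem:
    context: str        # e.g. subtitles or a scene description
    question: str
    answers: list[str]  # answer options of the original task
    label: int          # index of the correct answer


@dataclass
class QuestionPredictionItem:
    context: str
    options: list[str]  # candidate questions
    label: int          # index of the question that belongs to this context


def synthesize_pretraining_set(items, num_options=5, seed=0):
    """Re-pose each item as 'which question fits this context?'.

    Distractor questions are drawn from other items. The original answers and
    answer label are not used, since the pre-training target is the question.
    """
    rng = random.Random(seed)
    synthesized = []
    for i, item in enumerate(items):
        pool = [other.question for j, other in enumerate(items)
                if j != i and other.question != item.question]
        # rng.sample raises ValueError unless len(pool) >= num_options - 1.
        options = rng.sample(pool, num_options - 1) + [item.question]
        rng.shuffle(options)
        synthesized.append(
            QuestionPredictionItem(item.context, options, options.index(item.question)))
    return synthesized


# Made-up toy items for illustration only.
items = [
    QAItem("A man hands a woman a coffee.", "What does he give her?", ["A coffee", "A book"], 0),
    QAItem("A woman is cooking in the kitchen.", "Where is she?", ["Kitchen", "Office"], 0),
    QAItem("A man laughs at the TV.", "Why does he laugh?", ["A comedy show", "A letter"], 0),
    QAItem("A man opens the door.", "What does he open?", ["A window", "The door"], 1),
]
for example in synthesize_pretraining_set(items, num_options=3):
    print(example.label, example.options)
```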
Question answering is becoming one of the most important applications of natural language processing, thanks to the development of deep neural networks.
Better question answering helps people acquire useful information more efficiently.
In this dissertation, we study sentence matching, which models the relationships between sentences to support better reasoning in various question answering systems.

First, we propose a semantic sentence matching model for question paraphrase identification, natural language inference, and answer sentence selection, each of which can serve as a component of a question answering system.
We propose a densely-connected co-attentive recurrent neural network in which each layer uses the concatenated attentive features and hidden features of all preceding recurrent layers. This preserves the original and co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the ever-increasing size of the feature vectors caused by dense concatenation, we also propose applying an autoencoder after the dense concatenation.
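To make the dense concatenation and the autoencoder step concrete, here is a minimal, untrained sketch in pure Python. It is not the author's implementation: the Elman-style recurrent cell, dot-product co-attention, tanh encoder, random weights, and all function names are illustrative assumptions. It shows the feature width growing by the hidden size plus the attention size at every layer, and an autoencoder mapping the wide vectors back to a fixed code size.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simple_rnn(xs, W, U):
    """Elman RNN, h_t = tanh(W x_t + U h_{t-1}); a stand-in for the recurrent cell."""
    h = [0.0] * len(U)
    outputs = []
    for x in xs:
        h = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]
        outputs.append(h)
    return outputs


def co_attend(src, tgt):
    """For each vector in src, a dot-product attention-weighted sum of tgt."""
    dim = len(tgt[0])
    attended = []
    for s in src:
        weights = softmax([sum(a * b for a, b in zip(s, t)) for t in tgt])
        attended.append([sum(w * t[k] for w, t in zip(weights, tgt)) for k in range(dim)])
    return attended


def dense_coattentive_layer(p, q, hidden, rng):
    """Concatenate each layer's input with its RNN states and co-attentive features."""
    W = rand_matrix(hidden, len(p[0]), rng)
    U = rand_matrix(hidden, hidden, rng)
    hp, hq = simple_rnn(p, W, U), simple_rnn(q, W, U)  # weights shared by both sentences
    ap, aq = co_attend(hp, hq), co_attend(hq, hp)
    new_p = [x + h + a for x, h, a in zip(p, hp, ap)]  # list "+" is concatenation
    new_q = [x + h + a for x, h, a in zip(q, hq, aq)]
    return new_p, new_q


def autoencode(vectors, code_dim, rng):
    """Compress vectors to code_dim with a tanh encoder; return codes and reconstruction MSE."""
    in_dim = len(vectors[0])
    enc = rand_matrix(code_dim, in_dim, rng)
    dec = rand_matrix(in_dim, code_dim, rng)
    codes, sq_err = [], 0.0
    for v in vectors:
        code = [math.tanh(z) for z in matvec(enc, v)]
        recon = matvec(dec, code)
        sq_err += sum((r - x) ** 2 for r, x in zip(recon, v))
        codes.append(code)
    return codes, sq_err / (len(vectors) * in_dim)


rng = random.Random(0)
embed_dim, hidden = 4, 3
p = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(5)]  # 5-word sentence
q = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(6)]  # 6-word sentence
for _ in range(2):
    p, q = dense_coattentive_layer(p, q, hidden, rng)
print(len(p[0]))  # 16 = 4 + 2 * (3 + 3): the width grows with every layer
p_codes, recon_loss = autoencode(p, code_dim=5, rng=rng)
print(len(p_codes[0]))  # 5
```

In a trained model, the reconstruction error returned by `autoencode` would typically be added to the training objective so that the compressed code retains the information in the concatenated features; here it is only computed.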

Second, we propose matching strategies to find the parts of the textual context that are relevant to the question and the answer option.
For word-level matching in tasks whose contexts contain many technical terms, we build a dependency tree with a dependency parser for each sentence of the textual context and designate the words that appear in the question and the answer option as anchor nodes.
By removing the nodes that are far from the anchor nodes, we narrow down the scope of the context so that the model can answer more precisely.
In addition, we use a temporal localization classifier as an auxiliary task: it computes a relevance matching score for each subtitle sentence to find the relevant ones within a long subtitle context.
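A minimal sketch of the auxiliary temporal localization objective, assuming it is a per-sentence binary classification trained with binary cross-entropy and added to the main answer loss. The word-overlap score in `relevance_logits`, its `scale` and `bias`, and `aux_weight=0.5` are hypothetical stand-ins for the learned matching score and the loss weight.

```python
import math


def relevance_logits(subtitles, query, scale=4.0, bias=-1.0):
    """Toy matching score per subtitle sentence: the fraction of query words it
    contains, mapped to a logit. A trained model would learn this score instead."""
    q = set(query.lower().split())
    logits = []
    for sentence in subtitles:
        overlap = len(q & set(sentence.lower().split())) / max(len(q), 1)
        logits.append(scale * overlap + bias)
    return logits


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy, computed stably from logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def joint_loss(qa_loss, logits, labels, aux_weight=0.5):
    """Answer-selection loss plus the weighted temporal-localization loss."""
    return qa_loss + aux_weight * bce_with_logits(logits, labels)


subtitles = ["Where did you put the keys", "I left them in the kitchen", "Let's order pizza tonight"]
labels = [1, 1, 0]  # 1 = sentence lies inside the ground-truth time span
logits = relevance_logits(subtitles, "where are the keys in the kitchen")
print(logits)  # [1.0, 1.0, -1.0]
print(joint_loss(0.8, logits, labels))
```

The classifier only shapes training here; at inference time its scores could also be used to pick which subtitle sentences to attend to, though that use is an assumption.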

Lastly, we propose training schemes for multiple-choice video question answering that enhance performance through a self-supervised pre-training stage and, in the main stage, supervised contrastive learning as an auxiliary task.
For the pre-training stage, we build a synthesized pre-training dataset that changes the problem format from predicting the correct answer to predicting the question that corresponds to the context, which gives the model a better parameter initialization.
In the main stage, we propose a supervised contrastive representation learning method as a further auxiliary task that separates the embedding space of the correct answer from that of the wrong answers, improving model performance.
Taking the ground-truth answer as a positive sample and the other options as negative samples, the contrastive loss pulls the positive sample into the neighborhood of an anchor (a perturbed ground-truth answer) and pushes the negative samples away from it. Our model achieves the best performance on the challenging multiple-choice Video QA tasks TVQA, TVQA+, and DramaQA.
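A hedged sketch of the supervised contrastive loss in the InfoNCE form that matches the description (one anchor, one positive, several negatives). Cosine similarity, a temperature of 0.1, Gaussian noise as the perturbation that produces the anchor, and the random option embeddings are assumptions; the abstract does not specify them.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def perturb(vec, noise_std, rng):
    """Anchor: the ground-truth answer embedding plus Gaussian noise (assumed perturbation)."""
    return [x + rng.gauss(0.0, noise_std) for x in vec]


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log softmax of the positive's similarity to the anchor."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


rng = random.Random(0)
options = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]  # 5 answer-option embeddings
gt = 2  # index of the ground-truth answer
anchor = perturb(options[gt], noise_std=0.05, rng=rng)
negatives = [o for i, o in enumerate(options) if i != gt]
print(contrastive_loss(anchor, options[gt], negatives))
```

In training, this term would be added to the answer-classification loss as an auxiliary objective; minimizing it raises the anchor's similarity to the ground-truth answer relative to the wrong options.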

Language: eng

URI: https://hdl.handle.net/10371/178954

https://dcollection.snu.ac.kr/common/orgView/000000167185

Files in This Item:

000000167185.pdf 14.62 MB

Appears in Collections:

College of Medicine/School of Medicine (의과대학/대학원)
- Dept. of Medicine (의학과)
  - Theses (Ph.D. / Sc.D._의학과)
