A Study on Aspect-oriented Summarization using Transformer

Abstract: Text summarization is a representative task in natural language processing. Text summarization methods generate brief written summaries of documents such as journal articles. In recent years, the performance of these methods has improved significantly with the development of pretrained language models based on the Transformer architecture, such as BERT and GPT-3.
Recently, language models designed to generate controllable output based on user preferences have become an active research topic. Controllable summarization methods, such as query-focused and aspect-oriented summarization, have also emerged as promising approaches. In particular, aspect-oriented summarization generates a summary in terms of a specific aspect provided as user input.
In this study, we propose a method to improve the performance of an aspect-oriented extractive summarization model presented in a previous work. The proposed method helps the model generate aspect-oriented summaries by reflecting the relevance between sentence features and keyword features representing the aspect. To evaluate the proposed method, we constructed a new dataset of COVID-19 articles labeled in terms of two aspects: Trend and Action. The proposed method outperformed a baseline model on the new dataset.
The proposed method outperformed the baseline by roughly 3.6–4.3% on the Trend aspect, but had a relatively small effect on the Action aspect, with an improvement of less than 1%. However, for both aspects, we observed that even the incorrect sentences included in a generated summary tended to be related to the defined aspect. This indicates that the proposed method produces summaries whose content is more relevant to the defined aspect.
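Abstract (translated from the Korean): Text summarization is one of the representative tasks in natural language processing. Its goal is to condense a document, such as a newspaper article, into a concise summary centered on its key content. With the development of Transformer-based pretrained models such as BERT and GPT-3, the performance of summarization models has improved greatly.
Recently, much research has aimed at developing language models that generate output reflecting a user's purpose or preferences. Following this trend, research on controllable summary generation, such as query-focused or aspect-oriented summarization, has also emerged in the field of text summarization. Aspect-oriented summarization aims to generate a summary about a specific aspect that the user wants to know about.
In this thesis, we propose a method for improving the performance of the aspect-oriented summarization model proposed in a previous study. The proposed method incorporates the relevance between a sentence's representation vector and the representation vectors of keywords representing the aspect into the existing sentence representation vector, so that the model generates summaries related to the aspect. For evaluation, we constructed a new dataset of COVID-19 articles with two aspects: outbreak status (Trend) and related response (Action). The proposed methods performed better than the existing model on the new dataset.
The proposed method yielded a large improvement of 3.6 to 4.3% on the outbreak status aspect, and a relatively small effect on the related response aspect, with an improvement of less than 1%. However, for both aspects, we observed that the model selected aspect-related sentences even when its selections were incorrect. This confirms that the proposed method helped the model produce aspect-oriented summaries.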
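
The abstract describes the core idea only at a high level: the relevance between sentence features and aspect-keyword features is folded into the sentence representation. The sketch below is one possible reading of that idea, not the thesis's actual model. The function names (`aspect_aware`, `rank_sentences`), the use of cosine similarity, the softmax weighting, and the additive fusion are all assumptions made for illustration; in practice the vectors would come from a Transformer encoder such as BERT rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aspect_aware(sentence_vec, keyword_vecs):
    """Hypothetical fusion step (an assumption, not the thesis's method).

    Returns the sentence vector with a relevance-weighted keyword context
    added to it, plus a scalar relevance score (the highest similarity
    between the sentence and any aspect keyword).
    """
    sims = [cosine(sentence_vec, k) for k in keyword_vecs]
    weights = softmax(sims)
    dim = len(sentence_vec)
    # Weighted average of keyword vectors, weighted by similarity.
    context = [
        sum(w * k[i] for w, k in zip(weights, keyword_vecs)) for i in range(dim)
    ]
    fused = [s + c for s, c in zip(sentence_vec, context)]
    return fused, max(sims)


def rank_sentences(sentence_vecs, keyword_vecs):
    """Order sentence indices from most to least aspect-relevant."""
    scores = [aspect_aware(s, keyword_vecs)[1] for s in sentence_vecs]
    return sorted(range(len(sentence_vecs)), key=lambda i: scores[i], reverse=True)


# Toy 2-dimensional vectors standing in for encoder outputs.
keywords = [[1.0, 0.0], [0.8, 0.6]]
sentences = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
ranking = rank_sentences(sentences, keywords)
```

With these toy vectors, the second sentence ranks first because it points almost exactly along the first keyword vector. An extractive summarizer built on this sketch would feed the fused vectors into its sentence classifier instead of ranking on the relevance score alone.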

Language: eng

URI: https://hdl.handle.net/10371/193412

https://dcollection.snu.ac.kr/common/orgView/000000174619

Files in This Item:

000000174619.pdf 2.35 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Program in Bioengineering (협동과정-바이오엔지니어링전공)
  - Theses (Master's Degree_협동과정-바이오엔지니어링전공)
