Metaphor Identification with Paragraph and Word Vectorization: An Attention-Based Neural Approach

티무르

Korean title (translated): Supervised Identification of Figurative Expressions Using Paragraph and Word Vectorization: An Attention-Model-Based Neural Network Approach

Authors: 티무르

Advisor: 신효필

Major: College of Humanities, Department of Linguistics

Issue Date: 2018-08

Publisher: Seoul National University Graduate School

Description: Thesis (Master's) -- Seoul National University Graduate School: College of Humanities, Department of Linguistics, 2018. 8. 신효필.

Abstract: This study investigates approaches to automatic metaphor identification, the computational task of deciding whether a word or phrase in a portion of text is an instance of metaphor. The data come from the VU Amsterdam Metaphor Corpus, a subset of the British National Corpus (Baby edition) in which every word is annotated for metaphor and which covers four registers (News, Academic, Fiction and Conversation). On this corpus, a supervised binary classification task was performed at the sentence level: predicting whether each sentence contains an instance of metaphor. Features were extracted from dense distributional vector spaces at both the word level and the sentence level. Word-level features came from the Skip-Gram and Continuous Bag-of-Words algorithms, which yield a dense vector representation of each word. Sentence-level features came from the Paragraph Vector, an extension of these two algorithms to blocks of text larger than a word, which yields a vector encoding distributional information about the general context of each sentence.
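The difference between the three feature extractors named above lies largely in how training examples are formed from a sentence. The following is a minimal sketch of that step only, not the thesis's implementation: the function names, the window size, the `SENT_0` tag and the choice of the Distributed Memory variant of the Paragraph Vector are all assumptions made for illustration.

```python
from typing import List, Tuple


def _context(tokens: List[str], i: int, window: int) -> List[str]:
    """Words within `window` positions of index i, excluding position i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram: each target word predicts each of its context words."""
    return [(target, ctx_word)
            for i, target in enumerate(tokens)
            for ctx_word in _context(tokens, i, window)]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the context words jointly predict the target word."""
    examples = []
    for i, target in enumerate(tokens):
        context = _context(tokens, i, window)
        if context:  # a one-word sentence has no context to predict from
            examples.append((context, target))
    return examples


def pvdm_examples(doc_tag: str, tokens: List[str],
                  window: int = 2) -> List[Tuple[List[str], str]]:
    """Paragraph Vector (Distributed Memory variant, assumed here):
    a per-sentence tag joins the context, so its vector absorbs
    information shared across the whole sentence."""
    return [([doc_tag] + _context(tokens, i, window), target)
            for i, target in enumerate(tokens)]


if __name__ == "__main__":
    sentence = "time is money".split()  # toy sentence, not from the corpus
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
    print(pvdm_examples("SENT_0", sentence, window=1))
```

In practice these examples would feed a shallow network whose learned weights become the word and sentence vectors; that training step is omitted here.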

With features extracted using these models, the performance of several neural network systems is compared against a logistic regression baseline, testing various hyperparameters with stratified 10-fold cross-validation. Specifically, sentence-level input features obtained from the Paragraph Vector are tested with logistic regression, a Support Vector Machine (SVM) and a feedforward neural network, while word-level input features are tested with a bidirectional LSTM (bLSTM) with an attention mechanism, which allows direct observation of which words are most salient in identifying a particular sentence as containing metaphor. The results show a significant improvement on previous research and high success rates across the different models. Compared with the logistic regression baseline, the SVM and the feedforward neural network improved results, with the feedforward neural network achieving the highest F-score for Paragraph Vector input features. The bLSTM with attention and word-level input features improved on this further, achieving the highest results overall in the study. This can be seen as strong evidence for the necessity of state-of-the-art neural network architectures in supervised metaphor identification, since they are able to pick up on the various latent patterns provided by the vector space model.
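To make concrete how an attention mechanism exposes word salience, the sketch below pools per-word hidden states into a single sentence vector and returns the weight given to each word. It is an illustration under stated assumptions, not the architecture used in the thesis: the abstract does not specify the scoring function, so dot-product scoring against a single query vector is assumed, and the hidden states, query and example sentence are invented toy values standing in for bLSTM outputs and a learned parameter.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[List[float]],
                   query: List[float]) -> Tuple[List[float], List[float]]:
    """Score each word's hidden state against the query (dot product, assumed),
    normalize the scores into weights, and return the weights together with
    the weighted sum of hidden states (the sentence representation)."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy values: in a real model these would be bLSTM outputs per token
    # and a query vector learned during training.
    tokens = ["he", "attacked", "my", "argument"]
    hidden = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.1], [1.0, 0.5]]
    query = [1.0, 1.0]

    weights, sentence_vector = attention_pool(hidden, query)
    for token, weight in zip(tokens, weights):
        print(f"{token:10s} {weight:.3f}")
```

Because the weights sum to one, they can be read directly as a per-word salience distribution; the sentence vector would then be passed to a classification layer that predicts whether the sentence contains metaphor.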

Language: English

URI: https://hdl.handle.net/10371/143981

Files in This Item:

000000152563.pdf 3.22 MB

Appears in Collections:

College of Humanities (인문대학)
- Linguistics (언어학과)
  - Theses (Master's Degree_언어학과)