Metaphor Identification with Paragraph and Word Vectorization: An Attention-Based Neural Approach
Supervised Metaphor Identification Using Paragraph and Word Vectorization: An Attention-Model-Based Neural Network Approach

Department of Linguistics, College of Humanities
Issue Date: 2018-08
Seoul National University Graduate School
Thesis (Master's) -- Seoul National University Graduate School: Department of Linguistics, College of Humanities, August 2018. Advisor: 신효필 (Hyopil Shin).
The current study investigates approaches to automatic metaphor identification, the computational task of determining whether a word or phrase in a portion of text is an instance of metaphor. The data come from the VU Amsterdam Metaphor Corpus, a subset of the British National Corpus (Baby edition) in which every word is annotated for metaphor, spanning a variety of registers (News, Academic, Fiction, and Conversation). A binary supervised classification task was performed on each sentence in the corpus, predicting whether the sentence contains an instance of metaphor. Feature extraction used dense distributional vector spaces at both the word level and the sentence level: the former was carried out with the Skip-Gram and Continuous Bag-of-Words algorithms, yielding a dense vectorized representation of each word, while the latter used the Paragraph Vector, an extension of these two algorithms to blocks of text larger than a single word, resulting in a vector encoding the distributional information of each sentence's general context.

With features extracted using these models, the performance of several neural network systems is compared against a logistic regression baseline, testing various hyperparameters with stratified 10-fold cross-validation. Specifically, sentence-level input features obtained from the Paragraph Vector are tested using logistic regression, a Support Vector Machine, and a feedforward neural network, while word-level input features are tested using a bidirectional LSTM with an attention mechanism, which allows direct observation of which words contribute most saliently to identifying a particular sentence as an instance of metaphor. The obtained results show a significant improvement over previous research and high success rates across the different models. Compared to the logistic regression baseline, both the SVM and the feedforward neural network improved results, with the feedforward neural network achieving the highest F-score for Paragraph Vector input features. The bidirectional LSTM with attention and word-level input features improved upon this further, achieving the highest results overall in the study. This can be seen as strong evidence for the value of state-of-the-art neural network architectures in supervised metaphor identification, as they are able to pick up on the various latent patterns provided by the vector space model.
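The evaluation protocol for the sentence-level classifiers can be sketched with scikit-learn: stratified 10-fold cross-validation comparing the logistic regression baseline against an SVM on F-score. Randomly generated features stand in for the real Paragraph Vector inputs; sample count, dimensionality, and kernel choice are illustrative assumptions.

```python
# Hedged sketch of the stratified 10-fold evaluation described above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Toy stand-in: 200 "sentences" as 50-dim vectors with binary metaphor labels.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Stratified folds keep the metaphor/non-metaphor ratio constant per fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("svm", SVC(kernel="rbf"))]:
    scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f}")
```

A feedforward network would slot into the same loop (e.g. via a comparable fit/predict wrapper), keeping the comparison across models on identical folds.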
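For the word-level model, a bidirectional LSTM with a simple additive attention layer over the word vectors can be sketched in PyTorch. The dimensions and the particular attention formulation here are assumptions for illustration, not the thesis's exact architecture; the key point is that the normalized attention weights expose which words the classifier found most salient.

```python
# Hedged sketch: bidirectional LSTM with attention over word vectors.
import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    def __init__(self, emb_dim=50, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden, 1)    # binary metaphor logit

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                    # (batch, seq_len, 2*hidden)
        # Softmax over the sequence axis gives per-word attention weights.
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=1)
        context = (weights.unsqueeze(-1) * h).sum(dim=1)   # weighted sum
        return self.out(context).squeeze(-1), weights

model = AttentiveBiLSTM()
x = torch.randn(3, 7, 50)          # 3 toy sentences, 7 word vectors each
logits, attn = model(x)            # attn rows: which words were most salient
print(logits.shape, attn.shape)
```

Inspecting `attn` for a sentence classified as metaphorical is what permits the "direct observation" of salient words mentioned above.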
Appears in Collections: College of Humanities (인문대학) > Linguistics (언어학과) > Theses (Master's Degree_언어학과)

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.