S-Space College of Engineering/Engineering Practice School (공과대학/대학원) Dept. of Computer Science and Engineering (컴퓨터공학부) Theses (Ph.D. / Sc.D._컴퓨터공학부)
Deep Memory Networks for Natural Conversations
- 공과대학 전기·컴퓨터공학부
- Issue Date
- 서울대학교 대학원
- Attention Model; Memory Network; Deep Learning; Natural Language Understanding; Machine Comprehension
- 학위논문 (박사)-- 서울대학교 대학원 공과대학 전기·컴퓨터공학부, 2017. 8. 장병탁.
- Attention-based models are firstly proposed in the field of computer vision. And then they spread into natural language processing (NLP). The first one successfully bringing in attention mechanism from computer vision to NLP is neural machine translation. Such attention-based mechanism is motivated from that, instead of decoding based on the encoding of a whole and a fixed-length sentence during one pass of neural network-based machine translation, one can attend a specific part of the sentence. This specific part is what should currently be attended. These parts could be words or phrases.
The basic problem that the attention mechanism solves is that it allows the network to refer back to the input sequence, instead of forcing it to encode all information into one fixed-length vector. The attention mechanism is simply giving the network access to its internal memory, which is the hidden state of the encoder. In this point of view, instead of choosing what to attend to, the network chooses what to retrieve from memory. Unlike typical memory, the memory access mechanism here is soft, which means that the network retrieves a weighted combination of all memory locations, not a value from a single discrete location. Making the memory access soft has the benefit that we can easily train the network end-to-end using backpropagation
The trend towards more complex memory structures is now continuing. End-to-End Memory Networks allow the network to read same input sequence multiple times before making an output, updating the memory contents at each step. For example, answering a question by making multiple reasoning steps over an input story. However, when the networks parameter weights are tied in a certain way, the memory mechanism in End-to-End Memory Networks identical to the attention mechanism presented here, only that it makes multiple hops over the memory.
In this dissertation, we propose the deep memory network with attention mechanism and word/sentence embedding for attention mechanism. Due to the external memory and attention mechanism, proposed method can handle various tasks in natural language processing, such as question and answering, machine comprehension and sentiment analysis. Usually attention mechanism requires huge computational cost. In order to solve this problem. I also propose novel word and sentence embedding methods. Previous embedding methods only use the Markov assumption. But if we consider the language structure and make use of it, it will be very helpful to reduce the computational cost. Also it does not need strong supervision which means the additional information on important sentences.