
Automatic Story Extraction for Photo Stream via Coherence Recurrent Convolutional Neural Network

DC Field Value Language
dc.contributor.advisor김건희-
dc.contributor.author박천성-
dc.date.accessioned2017-07-14T02:36:15Z-
dc.date.available2017-07-14T02:36:15Z-
dc.date.issued2017-02-
dc.identifier.other000000140900-
dc.identifier.urihttps://hdl.handle.net/10371/122686-
dc.descriptionThesis (Master's) -- Graduate School, Seoul National University : Department of Computer Science and Engineering, 2017. 2. 김건희.-
dc.description.abstractDue to advances in computing power, data collection, and research, there have been many improvements in artificial intelligence. In particular, research related to images has progressed very quickly. Computers now approach human-level cognitive abilities on many vision tasks: it has become possible for them to see, understand, and express. Among these abilities, we focus on visual understanding and natural language expression. Various studies have sought to understand visual information and express it in natural language. One task that approaches human-level performance is the generation of image captions for the Flickr30K and MS COCO datasets. However, such work is still limited to simple data and tasks.
In this dissertation, we propose an approach for retrieving a sequence of natural sentences for an image stream. We deal with more complex, unrefined data compared to previous work. This dissertation extends the preliminary work of Park and Kim, and a revised version of it was submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.
Since general users often take a series of pictures of their experiences, much online visual information exists in the form of image streams, and it is better to take the whole image stream into consideration when producing natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To this end, we propose a multimodal neural architecture called the coherence recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional long short-term memory (LSTM) networks, and an entity-based local coherence model. Our approach learns directly from a vast user-generated resource of blog posts as text-image parallel training data. We collect more than 22K unique blog posts with 170K associated images for the topics of NYC, Disneyland, Australia, and Hawaii. We demonstrate that our approach outperforms other state-of-the-art image captioning candidate methods, using both quantitative measures and user studies via Amazon Mechanical Turk.
-
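The retrieval task the abstract describes, choosing one sentence per image so that each choice is both compatible with its image and coherent with the preceding sentence, can be sketched as follows. This is an illustrative sketch only, not the thesis's CRCN implementation: the greedy search, the cosine-similarity scoring, and the `coherence_weight` parameter are assumptions standing in for the learned BLSTM and entity-based local coherence model.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_sentence_sequence(image_feats, sent_embs, coherence_weight=0.5):
    """Greedily pick one candidate sentence index per image.

    Each candidate is scored by (a) compatibility with the current image
    feature and (b) similarity to the previously chosen sentence, a crude
    stand-in for the entity-based coherence model.
    """
    chosen, prev = [], None
    for img in image_feats:
        scores = []
        for s in sent_embs:
            score = cosine(img, s)          # image-sentence compatibility
            if prev is not None:
                score += coherence_weight * cosine(prev, s)  # local coherence term
            scores.append(score)
        best = int(np.argmax(scores))
        chosen.append(best)
        prev = sent_embs[best]
    return chosen
```

In the actual architecture, both terms are learned jointly rather than hand-weighted; the sketch only shows why scoring a whole sequence differs from captioning each image in isolation.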
dc.description.tableofcontentsChapter 1 Introduction 1
Chapter 2 Related work 5
Chapter 3 Problem Statement 9
3.1 Blog Datasets 11
3.2 Blog Pre-processing 12
3.3 Text Description 13
Chapter 4 Our Architecture 14
4.1 The BLSTM Model 15
4.2 The Local Coherence Model 16
4.3 Combination of CNN, RNN, and Coherence Model 17
4.4 Training the CRCN 18
4.5 Prediction of Sentence Sequences 20
Chapter 5 Experiments 21
5.1 Experimental Setting 21
5.2 Quantitative Results 26
5.3 Qualitative Results 29
5.4 User Studies via Amazon Mechanical Turk 30
Chapter 6 Conclusion 36
Bibliography 37
요약 (Abstract in Korean) 41
-
dc.formatapplication/pdf-
dc.format.extent11914542 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisherGraduate School, Seoul National University-
dc.subjectDeep learning-
dc.subjectRecurrent Neural Network-
dc.subjectConvolutional Neural Network-
dc.subjectPhoto stream-
dc.subjectStory extraction-
dc.subjectCoherence-
dc.subjectImage captioning-
dc.subjectNatural Language Processing-
dc.subject.ddc621-
dc.titleAutomatic Story Extraction for Photo Stream via Coherence Recurrent Convolutional Neural Network-
dc.typeThesis-
dc.contributor.AlternativeAuthorCesc Chunseong Park-
dc.description.degreeMaster-
dc.citation.pages42-
dc.contributor.affiliationCollege of Engineering, Department of Computer Science and Engineering-
dc.date.awarded2017-02-