
Automatic Story Extraction for Photo Stream via Coherence Recurrent Convolutional Neural Network

DC Field Value Language
dc.contributor.advisor김건희-
dc.contributor.author박천성-
dc.date.accessioned2017-07-14T02:36:15Z-
dc.date.available2017-07-14T02:36:15Z-
dc.date.issued2017-02-
dc.identifier.other000000140900-
dc.identifier.urihttps://hdl.handle.net/10371/122686-
dc.descriptionThesis (Master's) -- Graduate School, Seoul National University : Department of Computer Science and Engineering, 2017. 2. 김건희.-
dc.description.abstractDue to advances in computing power, data collection, and research, there have been many improvements in artificial intelligence. In particular, research related to images has progressed very quickly. Computers now approach human-level cognitive abilities on many vision tasks: it has become possible for them to see, understand, and express. Among these abilities, we focus on visual understanding and natural language expression. Various studies have sought to understand visual information and express it in natural language. One task that approaches human-level performance is the generation of image captions for the Flickr30K and MS COCO datasets. However, such work is still limited to simple data and tasks.
In this dissertation, we propose an approach for retrieving a sequence of natural sentences for an image stream. We deal with more complex, unrefined data compared to previous work. This dissertation extends the preliminary work of Park and Kim, and a revised version of it was submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.
Since general users often take a series of pictures of their experiences, much online visual information exists in the form of image streams, and it is better to take the whole image stream into consideration when producing natural language descriptions. While almost all previous studies have dealt with the relation between a single image and a single natural sentence, our work extends both the input and output dimensions to a sequence of images and a sequence of sentences. To this end, we propose a multimodal neural architecture called the coherence recurrent convolutional network (CRCN), which consists of convolutional neural networks, bidirectional long short-term memory (LSTM) networks, and an entity-based local coherence model. Our approach learns directly from a vast user-generated resource of blog posts as text-image parallel training data. We collect more than 22K unique blog posts with 170K associated images for the topics of NYC, Disneyland, Australia, and Hawaii. We demonstrate that our approach outperforms other state-of-the-art image captioning candidate methods, using both quantitative measures and user studies via Amazon Mechanical Turk.
-
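The retrieval task the abstract describes, choosing one sentence per image so that each choice is both compatible with its image and coherent with the preceding sentence, can be sketched as follows. This is an illustrative sketch only, not the thesis's CRCN implementation: the greedy search, the cosine-similarity scoring, and the `coherence_weight` parameter are assumptions standing in for the learned BLSTM and entity-based local coherence model.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_sentence_sequence(image_feats, sent_embs, coherence_weight=0.5):
    """Greedily pick one candidate sentence index per image.

    Each candidate is scored by (a) compatibility with the current image
    feature and (b) similarity to the previously chosen sentence, a crude
    stand-in for the entity-based coherence model.
    """
    chosen, prev = [], None
    for img in image_feats:
        scores = []
        for s in sent_embs:
            score = cosine(img, s)          # image-sentence compatibility
            if prev is not None:
                score += coherence_weight * cosine(prev, s)  # local coherence term
            scores.append(score)
        best = int(np.argmax(scores))
        chosen.append(best)
        prev = sent_embs[best]
    return chosen
```

In the actual architecture, both terms are learned jointly rather than hand-weighted; the sketch only shows why scoring a whole sequence differs from captioning each image in isolation.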
dc.description.tableofcontentsChapter 1 Introduction 1
Chapter 2 Related work 5
Chapter 3 Problem Statement 9
3.1 Blog Datasets 11
3.2 Blog Pre-processing 12
3.3 Text Description 13
Chapter 4 Our Architecture 14
4.1 The BLSTM Model 15
4.2 The Local Coherence Model 16
4.3 Combination of CNN, RNN, and Coherence Model 17
4.4 Training the CRCN 18
4.5 Prediction of Sentence Sequences 20
Chapter 5 Experiments 21
5.1 Experimental Setting 21
5.2 Quantitative Results 26
5.3 Qualitative Results 29
5.4 User Studies via Amazon Mechanical Turk 30
Chapter 6 Conclusion 36
Bibliography 37
요약 (Abstract in Korean) 41
-
dc.formatapplication/pdf-
dc.format.extent11914542 bytes-
dc.format.mediumapplication/pdf-
dc.language.isoen-
dc.publisherGraduate School, Seoul National University-
dc.subjectDeep learning-
dc.subjectRecurrent Neural Network-
dc.subjectConvolutional Neural Network-
dc.subjectPhoto stream-
dc.subjectStory extraction-
dc.subjectCoherence-
dc.subjectImage captioning-
dc.subjectNatural Language Processing-
dc.subject.ddc621-
dc.titleAutomatic Story Extraction for Photo Stream via Coherence Recurrent Convolutional Neural Network-
dc.typeThesis-
dc.contributor.AlternativeAuthorCesc Chunseong Park-
dc.description.degreeMaster-
dc.citation.pages42-
dc.contributor.affiliationCollege of Engineering, Department of Computer Science and Engineering-
dc.date.awarded2017-02-