Detecting Parts of Speech from Image for Caption Generation

강필구

Detecting Parts of Speech from Image for Caption Generation: An Image Caption Generation Model Using English Part-of-Speech Information

DC Field	Value	Language
dc.contributor.advisor	김형주	-
dc.contributor.author	강필구	-
dc.date.accessioned	2020-05-07T03:46:20Z	-
dc.date.available	2020-05-07T03:46:20Z	-
dc.date.issued	2020	-
dc.identifier.other	000000159883	-
dc.identifier.uri	http://dcollection.snu.ac.kr/common/orgView/000000159883	ko_KR
dc.description	Thesis (Master's)--Seoul National University Graduate School: College of Engineering, Dept. of Computer Science and Engineering, 2020. 2. 김형주.	-
dc.description.abstract	The ability to generate a description of an image's content is becoming more important as smart devices and AI become integrated into our daily lives. In this paper, we propose a novel approach that uses multiple CNN models, each specially trained to detect features related to a part of speech (PoS) such as noun, verb, pronoun, adjective, preposition, or conjunction. Using these PoS-based CNN models, we extract features that the language model uses to generate high-quality captions. We validate our findings on the Flickr8k, Flickr30k, and MS-COCO datasets through multiple human surveys and several popular automatic text metrics.	-
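dc.description.abstract	As reliance on smart devices and on artificial intelligence in daily life grows, technology that can describe images on its own is becoming increasingly important. In this paper, we propose a method that trains CNN models separately for each English part of speech, such as nouns, adjectives, and prepositions, so that each model learns the features that distinguish its part of speech, and then uses the learned results to generate sentences that describe an image. The PoS-specific CNN models extract visual feature vectors distinct to each part of speech, and the extracted feature vectors are combined and used by the language model to generate good descriptions. We verify the superiority of the proposed model through experiments on the Flickr8k, Flickr30k, and MS-COCO datasets, which are widely used in this field. In addition, we conducted surveys with human participants and confirmed that the proposed model generates sentences that people can readily understand.	-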
dc.description.tableofcontents	Chapter 1. Introduction 1; Chapter 2. Related Work 5; Chapter 3. Proposed Model 8; 3.1. Encoder 9; 3.1.1. Preparing PoS Dataset 9; 3.1.2. Generating PoS Detecting CNN 10; 3.2. Decoder 17; 3.3. Training 19; 3.3.1. Combine PoS CNN Model Output 20; 3.3.2. Serving PoS CNN Output to RNN 22; 3.3.3. End-To-End Loss Optimization 25; 3.4. Inference 26; Chapter 4. Experiment 27; 4.1. Environment 27; 4.2. Data 27; 4.3. Implementation 28; 4.4. Evaluation Metrics 29; 4.5. Results 32; 4.5.1. Evaluate Caption Quality 32; 4.5.2. Ground Truth Comparison 39; 4.5.3. Automatic Evaluation 42; 4.6. Discussion 46; Chapter 5. Conclusion 48; Bibliography 49; Appendix 54; Abstract (Korean) 60	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject.ddc	621.39	-
dc.title	Detecting Parts of Speech from Image for Caption Generation	-
dc.title.alternative	An Image Caption Generation Model Using English Part-of-Speech Information	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Kang Phil Goo	-
dc.contributor.department	College of Engineering, Dept. of Computer Science and Engineering	-
dc.description.degree	Master	-
dc.date.awarded	2020-02	-
dc.identifier.uci	I804:11032-000000159883	-
dc.identifier.holdings	000000000042▲000000000044▲000000159883▲	-

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Computer Science and Engineering
  - Theses (Master's Degree, Computer Science and Engineering)

Files in This Item:

000000159883.pdf 1.70 MB
