Smart Random Erasing for Image Captioning

김연우


Smart Random Erasing for Image Captioning : 이미지 캡셔닝을 위한 스마트 랜덤이레이징 데이터 증강 기법 (A Smart Random Erasing Data Augmentation Technique for Image Captioning)


Authors: 김연우

Advisor: 이상구

Issue Date: 2021-02

Publisher: Seoul National University Graduate School (서울대학교 대학원)

Keywords: Image captioning ; Data augmentation ; Random erasing ; Cutout ; Reinforcement learning

Description: Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2021. 2. 이상구.

Abstract: Image captioning is a task in machine learning that aims to automatically generate a natural language description of a given image. It is considered a crucial task because of its broad applications and the fact that it is a bridge between computer vision and natural language processing.

However, image-caption paired datasets are limited in both quantity and diversity, both of which are essential for training a supervised model. Various approaches have been proposed, including semi-supervised and unsupervised learning, but their results still fall far short of those of supervised approaches. While data augmentation can address data scarcity in the field, existing data augmentation techniques are often designed for image classification and are not well suited to image captioning.

Thus, in this paper, we introduce a new data augmentation technique designed for image captioning. The proposed Smart Random Erasing (SRE) is inspired by the Random Erasing augmentation technique and compensates for its drawbacks to achieve the best performance boost when applied to image captioning. We also draw on the idea from AutoAugment of automatically searching for optimal hyperparameters via reinforcement learning. This study shows that SRE achieves better results than traditional augmentation techniques and the state-of-the-art augmentation technique RandAugment when applied to image captioning tasks.
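Image captioning is a machine learning task that, given an image as input, generates a natural language description of that image. It has a wide range of applications, such as generating assistive captions for the visually impaired and improving search engine performance through caption generation, and it is also important as a task that connects natural language processing and computer vision.

However, the image-caption paired datasets needed to train image captioning models are very limited, and existing datasets also lack diversity in their sentences and cover a very narrow range of image domains. To address this, unsupervised learning models have recently been studied, but for now they still fall far short of the performance of supervised models.

Another way to alleviate the data shortage is data augmentation. Image data augmentation has recently been an active research area, with methods such as AutoAugment and RandAugment, but most of this work targets image classification, and it is difficult to apply these methods directly to image captioning.

Therefore, this study confirms through experiments that the performance of existing data augmentation techniques varies greatly depending on the task, model, and dataset. It then builds on existing data augmentation techniques to develop a new technique suited to image captioning, and experimentally verifies the performance of that technique.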
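
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of plain Random Erasing as it is commonly described: with some probability, a randomly sized and randomly placed rectangle of the image is overwritten with random pixel values. It is not the SRE method from the thesis, whose details are not given in this record. The function name, parameter names, and default values are illustrative assumptions, and the image is represented as a plain list of rows to keep the example dependency-free.

```python
import math
import random


def random_erase(image, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3,
                 max_attempts=100, rng=None):
    """Return a copy of `image` (a list of rows of pixel values) with one
    random rectangle overwritten by random values in 0..255.

    Baseline Random Erasing sketch; NOT the SRE method from the thesis.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # never modify the caller's image

    # Apply the augmentation only with probability p.
    if rng.random() >= p:
        return out

    for _ in range(max_attempts):
        # Sample the target area and aspect ratio of the erased rectangle.
        area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        h = int(round(math.sqrt(area * aspect)))
        w = int(round(math.sqrt(area / aspect)))
        if h < 1 or w < 1 or h > height or w > width:
            continue  # rectangle does not fit in the image; resample

        # Place the rectangle fully inside the image and fill it with noise.
        top = rng.randint(0, height - h)
        left = rng.randint(0, width - w)
        for y in range(top, top + h):
            for x in range(left, left + w):
                out[y][x] = rng.randint(0, 255)
        return out

    return out  # no valid rectangle found; image left unchanged
```

In a classification setting the label is unaffected by the erased region. For captioning, an erased rectangle may hide an object that the caption mentions, which is one plausible reading of the drawback the abstract says SRE is designed to address.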

Language: eng

URI: https://hdl.handle.net/10371/175430

https://dcollection.snu.ac.kr/common/orgView/000000164513

Files in This Item:

000000164513.pdf 0.84 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
