Video Question Answering with Spatio-Temporal Reasoning

Jang, Yunseok; Song, Yale; Kim, Chris Dongjoo; Yu, Youngjae; Kim, Youngjin; Kim, Gunhee; 김건희

doi:10.1007/s11263-019-01189-x

DC Field	Value	Language
dc.contributor.author	Jang, Yunseok	-
dc.contributor.author	Song, Yale	-
dc.contributor.author	Kim, Chris Dongjoo	-
dc.contributor.author	Yu, Youngjae	-
dc.contributor.author	Kim, Youngjin	-
dc.contributor.author	Kim, Gunhee	-
dc.creator	김건희	-
dc.date.accessioned	2020-01-23T07:42:26Z	-
dc.date.available	2020-04-05T07:42:26Z	-
dc.date.created	2020-01-17	-
dc.date.issued	2019-10	-
dc.identifier.citation	International Journal of Computer Vision, Vol.127 No.10, pp.1385-1412	-
dc.identifier.issn	0920-5691	-
dc.identifier.uri	https://hdl.handle.net/10371/163963	-
dc.description.abstract	Vision and language understanding has emerged as a subject undergoing intense study in Artificial Intelligence. Among many tasks in this line of research, visual question answering (VQA) has been one of the most successful ones, where the goal is to learn a model that understands visual content at region-level details and finds their associations with pairs of questions and answers in the natural language form. Despite the rapid progress in the past few years, most existing work in VQA has focused primarily on images. In this paper, we focus on extending VQA to the video domain and contribute to the literature in three important ways. First, we propose three new tasks designed specifically for video VQA, which require spatio-temporal reasoning from videos to answer questions correctly. Next, we introduce a new large-scale dataset for video VQA named TGIF-QA that extends existing VQA work with our new tasks. Finally, we propose a dual-LSTM based approach with both spatial and temporal attention and show its effectiveness over conventional VQA techniques through empirical evaluations.	-
dc.language	English	-
dc.language.iso	ENG	en
dc.publisher	Kluwer Academic Publishers	-
dc.title	Video Question Answering with Spatio-Temporal Reasoning	-
dc.type	Article	-
dc.identifier.doi	10.1007/s11263-019-01189-x	-
dc.citation.journaltitle	International Journal of Computer Vision	-
dc.identifier.wosid	000485320300001	-
dc.identifier.scopusid	2-s2.0-85067799133	-
dc.description.srnd	OAIID:RECH_ACHV_DSTSH_NO:T201917027	-
dc.description.srnd	RECH_ACHV_FG:RR00200001	-
dc.description.srnd	ADJUST_YN:	-
dc.description.srnd	EMP_ID:A079841	-
dc.description.srnd	CITE_RATE:6.071	-
dc.description.srnd	DEPT_NM:Dept. of Computer Science and Engineering	-
dc.description.srnd	EMAIL:gunhee@snu.ac.kr	-
dc.description.srnd	SCOPUS_YN:Y	-
dc.citation.endpage	1412	-
dc.citation.number	10	-
dc.citation.startpage	1385	-
dc.citation.volume	127	-
dc.description.isOpenAccess	N	-
dc.contributor.affiliatedAuthor	Kim, Gunhee	-
dc.identifier.srnd	T201917027	-
dc.type.docType	Article	-
dc.description.journalClass	1	-
dc.subject.keywordAuthor	VQA	-
dc.subject.keywordAuthor	Spatio-temporal reasoning	-
dc.subject.keywordAuthor	Large-scale video QA dataset	-
dc.subject.keywordAuthor	Spatial and temporal attention	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Journal Papers (저널논문_컴퓨터공학부)

Files in This Item: There are no files associated with this item.
