Publications

Detailed Information

Deep Hypernetworks for Learning from Nonstationary Multimodal Data (Korean title: Deep Hypernetworks for Learning from Dynamic Multimodal Data)

Authors

하정우

Advisor
장병탁
Major
College of Engineering, Department of Electrical and Computer Engineering
Issue Date
2015-02
Publisher
Graduate School, Seoul National University
Keywords
Deep hypernetwork; Higher-order graphical model; Nonstationary multimodal data; Multimodal concept learning; Stochastic hypergraph construction; Incremental learning; Vision-language translation
Description
Thesis (Ph.D.) -- Graduate School, Seoul National University : Department of Electrical and Computer Engineering, February 2015. Advisor: 장병탁.
Abstract
Recent advances in information and communication technology have led to an explosive increase in data. Unlike traditional data, which are structured and unimodal, data generated from dynamic environments are characterized by high dimensionality, multimodality, and lack of structure, as well as by their enormous scale. Learning from non-stationary multimodal data is essential for solving many difficult problems in artificial intelligence. However, despite many successful reports, existing machine learning methods have mainly focused on practical problems represented by large-scale but static databases, such as image classification, tagging, and retrieval.

A hypernetwork is a probabilistic graphical model that represents an empirical distribution using a hypergraph structure, that is, a large collection of hyperedges encoding associations among variables. This representation makes the model well suited to characterizing complex relationships between features with a population of building blocks. However, because a hypernetwork is defined over a huge combinatorial feature space, the model requires a large number of hyperedges to handle multimodal large-scale data and thus faces a scalability problem.
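The following is a minimal sketch of this hyperedge-population idea, assuming discrete (e.g., binary) variables; the names `Hyperedge`, `Hypernetwork`, and `score` are illustrative and are not taken from the dissertation.

from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Hyperedge:
    """A higher-order feature: a small set of (variable, value) pairs."""
    assignment: FrozenSet[Tuple[str, int]]  # e.g. frozenset({("x3", 1), ("x7", 0)})


@dataclass
class Hypernetwork:
    """A population of weighted hyperedges approximating an empirical distribution."""
    weights: Dict[Hyperedge, float] = field(default_factory=dict)

    def matches(self, edge: Hyperedge, example: Dict[str, int]) -> bool:
        """True if the example agrees with every (variable, value) pair in the edge."""
        return all(example.get(var) == val for var, val in edge.assignment)

    def score(self, example: Dict[str, int]) -> float:
        """Unnormalized compatibility: total weight of hyperedges the example satisfies."""
        return sum(w for e, w in self.weights.items() if self.matches(e, example))

In this sketch, each hyperedge acts as one building block, and the score of an observation grows with the number and weight of the blocks it matches; the scalability problem arises because covering a high-dimensional, multimodal space in this way requires a very large hyperedge population.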

In this dissertation, we propose a deep architecture of hypernetworks, called deep hypernetworks, to address this scalability issue when learning from multimodal data with non-stationary properties, such as videos. Deep hypernetworks handle the issue through abstraction at multiple levels, using a hierarchy of hypergraphs. We use a stochastic method based on Monte Carlo simulation, a graph MC, to efficiently construct hypergraphs representing the empirical distribution of the observed data. The structure of a deep hypernetwork changes continuously as learning proceeds, and this flexibility contrasts with other deep learning models. The proposed model learns incrementally from the data and thus handles non-stationary properties such as concept drift. The abstract representations in the learned models serve as multimodal knowledge about the data, which is used for content-aware crossmodal transformation, including vision-language conversion. We view vision-language conversion as machine translation and therefore formulate it in terms of statistical machine translation. Since knowledge of the video stories is used for translation, we call this story-aware vision-language translation.
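As a concrete illustration of the stochastic construction and incremental learning described above, the sketch below samples candidate hyperedges from each new observation and reweights the existing population, reusing the `Hyperedge` and `Hypernetwork` classes from the previous sketch. The sample-and-reweight scheme and all constants are assumptions made for illustration, not the dissertation's graph MC algorithm.

import random
from typing import Dict, Iterator


def sample_hyperedges(example: Dict[str, int], order: int, n_samples: int) -> Iterator[Hyperedge]:
    """Draw random variable subsets of a given order from one observation."""
    variables = list(example)
    for _ in range(n_samples):
        chosen = random.sample(variables, k=min(order, len(variables)))
        yield Hyperedge(frozenset((v, example[v]) for v in chosen))


def incremental_update(net: Hypernetwork, example: Dict[str, int],
                       order: int = 3, n_samples: int = 20,
                       reinforce: float = 1.0, decay: float = 0.99,
                       prune_below: float = 0.05) -> None:
    """One online learning step: decay old weights, reinforce sampled edges, prune."""
    # Decaying and pruning lets the hypergraph structure keep changing as data
    # arrives, which is how this sketch mimics tracking concept drift.
    for e in list(net.weights):
        net.weights[e] *= decay
        if net.weights[e] < prune_below:
            del net.weights[e]
    # Hyperedges sampled from the new observation are added or reinforced,
    # so the population follows the most recent empirical distribution.
    for e in sample_hyperedges(example, order, n_samples):
        net.weights[e] = net.weights.get(e, 0.0) + reinforce

Calling incremental_update once per incoming observation keeps the hyperedge population bounded while letting its structure drift with the data stream; a hierarchy of such hypergraphs, with higher layers built over abstractions of lower ones, is what the dissertation refers to as a deep hypernetwork.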

We evaluate deep hypernetworks on large-scale vision-language multimodal data, including benchmark datasets and cartoon video series. The experimental results show that deep hypernetworks effectively represent visual-linguistic information abstracted at multiple levels of the data content, as well as the associations between vision and language. We explain how the introduction of a hierarchy addresses the scalability and non-stationarity issues. In addition, we present story-aware vision-language translation on cartoon videos by generating scene images from sentences and descriptive subtitles from scene images. Furthermore, we discuss the implications of our model for lifelong learning and directions for improvement toward human-level artificial intelligence.
Language
English
URI
https://hdl.handle.net/10371/119057
