Neural Reasoning with Structured Multimodal Knowledge Representation Learning

Abstract: In recent decades, deep neural networks have driven tremendous progress in the learning ability of artificial intelligence systems. Many problems have been explored and resolved by increasing the scale of neural models; however, problems that require machine reasoning remain highly challenging, and the reasoning ability of neural networks remains unexplored.

In this dissertation, I study neural reasoning with the aim of reaching human-level reasoning. I treat learning as an integral part of the inference process and argue that learning and reasoning should be studied together. More specifically, the goal of this dissertation is as follows: given large-scale multimodal sequential data, I aim to devise neural reasoning approaches built on structured multimodal knowledge representation learning. I also suggest evaluation criteria for machine reasoning and conduct a case study comparing the reasoning ability of neural models with that of humans.

To this end, this dissertation is organized into three parts. First, I propose a neural reasoning architecture designed to perform interpretable inferences based on observation and accumulated knowledge. Second, I introduce neural reasoning approaches built on the proposed architecture. In particular, I propose i) compositional reasoning based on factual knowledge for visual question answering, ii) spatiotemporal reasoning for procedural knowledge in unsupervised procedure learning, and iii) chain of reasoning based on object conceptual knowledge for goal-oriented visual dialog. Finally, to evaluate the reasoning ability of AI compared to humans at different developmental stages, I propose a hierarchical reasoning map and design a video reasoning test. Based on an analysis of the video reasoning test, I discuss the strengths and weaknesses of existing neural reasoning capabilities and provide a direction for future research toward human-level reasoning.
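
Abstract (translated from the Korean): In recent years, the representation learning ability of artificial intelligence based on deep neural networks has advanced greatly. In particular, many problems that require recognition have been explored and solved by stacking deep neural networks deeper and in more complex ways. Nevertheless, problems that require complex reasoning are still considered difficult, and the reasoning ability of artificial intelligence based on deep neural networks also remains unexplored.

This dissertation explores the reasoning ability of machines with the goal of human-level reasoning. In particular, it aims to perform reasoning by leveraging the representation learning ability of deep neural networks, and it therefore emphasizes that representation learning and reasoning should be studied together. More specifically, given large-scale multimodal sequential data, it aims to devise neural reasoning techniques through structured multimodal knowledge representation learning. It also proposes evaluation criteria for machine reasoning and conducts a case study comparing neural-network-based reasoning ability with human reasoning ability.

This dissertation consists of three main parts. First, it proposes a neural reasoning architecture designed to perform interpretable reasoning based on new observations and accumulated knowledge. Second, it introduces three neural reasoning approaches built on the framework of the proposed architecture. Specifically, it proposes i) a compositional reasoning technique based on factual knowledge for visual question answering, ii) a spatiotemporal reasoning technique for constructing procedural knowledge in unsupervised procedure learning, and iii) the construction of object conceptual knowledge for goal-oriented visual dialog, together with an iterative reasoning technique based on the constructed knowledge. Third, to rigorously evaluate the reasoning ability of AI in comparison with humans at various developmental stages, it proposes a hierarchical reasoning map and designs a video reasoning test based on it. Finally, through an analysis of the video reasoning test, it discusses the strengths and weaknesses of current deep-neural-network-based neural reasoning and presents research directions for advancing toward human-level reasoning.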

Language: eng

URI: https://hdl.handle.net/10371/193343

https://dcollection.snu.ac.kr/common/orgView/000000176954

Files in This Item:

000000176954.pdf 111.20 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)
