A Variational Observation Model of 3D Multi-Object in 2D Single Scene for Semantic SLAM : A Variational 3D Multi-Object Observation Model in a Single 2D Scene for Semantic SLAM

DC Field | Value | Language
dc.contributor.advisor | 이범희 | -
dc.contributor.author | 유현우 | -
dc.date.accessioned | 2020-05-19T08:04:44Z | -
dc.date.available | 2020-05-19T08:04:44Z | -
dc.date.issued | 2020 | -
dc.identifier.other | 000000161007 | -
dc.identifier.uri | https://hdl.handle.net/10371/168046 | -
dc.identifier.uri | http://dcollection.snu.ac.kr/common/orgView/000000161007 | ko_KR
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 2. Advisor: 이범희. | -
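dc.description.abstract | In this dissertation, I introduce a method that connects conventional geometric SLAM with deep-learning-based semantic scene understanding in order to unify localization, mapping, and environment recognition.
To this end, I propose a variational multi-object observation model for Bayesian inference.
Object-oriented features are useful for environment recognition and for simultaneous localization and mapping (SLAM).
However, because the single view or 3D shape of an object follows an intractable distribution, object features are difficult to use directly in the numerical computations of Bayesian inference.

With the recent advent of deep learning techniques, non-linear regression has become possible even for data that follow complex distributions.
As a result, it has become possible not only to reconstruct an object's 3D shape from multiple views, but also to estimate its full shape directly from a single view.
In addition, many methods have been proposed that estimate disentangled representations of an object, covering not only its shape but also its orientation.

However, existing instance-level representation methods focus on estimating various representations accurately from an object's image rather than on a probabilistic approach, so they are of limited use in Bayesian inference, where a probabilistic model is essential.
Therefore, this dissertation proposes a method that approximates the observation model of multiple objects using variational inference.
To exploit the advantages of neural networks, I use the variational auto-encoder (VAE), a deep generative model.
For a disentangled representation of objects, I use latent variables that are assumed to follow tractable distributions.
Since the proposed model is intended for Bayesian inference such as data association or SLAM, the latent variables are trained to follow distributions that are distinguishable at the instance level.

At run time, only the encoder of the trained VAE is used, to obtain the latent variables.
When the proposed observation model is applied to probabilistic SLAM and Maximum Likelihood Estimation (MLE), Bayesian inference can be performed with the latent variables in place of object-oriented features.
To validate the proposed method, I performed probabilistic semantic SLAM in a road-driving environment that differs from the training data.

Because a robot's recognition process is fundamentally a prerequisite for other tasks, it must run in real time. The encoder of the proposed method is not restricted to a particular structure, so any existing real-time multi-object detector architecture may be used.
Since the proposed model is fundamentally an auto-encoder, its decoder can also be used for real-time shape reconstruction and orientation estimation of objects.
| -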
dc.description.abstract | In this dissertation, I present an approximation method for the observation model of 3D multi-object scenes from a single 2D shot. To achieve semantic scene understanding for mobile robots, various studies have exploited object-oriented features. However, because 3D shapes are nonlinear and follow intractable distributions, it is challenging to use object-oriented features for the probabilistic data association of simultaneous localization and mapping (SLAM) with Bayesian inference. This limitation splits conventional SLAM into two parts that each give an incomplete solution: a front-end for data association and a back-end for estimating the robot trajectory and feature locations. Although previous studies on instance-level single- or multi-object understanding have mainly focused on disentangled representations, it is still hard to combine object-oriented features with SLAM, because these studies have not considered probability distributions. To bridge the gap between traditional geometric SLAM and semantic scene understanding, I propose a method that approximates the Bayesian observation model of scene-level 3D multi-object understanding with variational inference. Using a variational auto-encoder (VAE), the method estimates latent variables for multiple objects from the entire scene. Since the purpose of the proposed model is probabilistic SLAM with data association via Bayesian inference, the networks are trained so that the latent variables follow distributions with distinct clusters at the category or instance level. The encoded latent variables therefore follow tractable distributions while also implying the full 3D shape and pose. To perform complete probabilistic SLAM with object-oriented data association, the approximated observation model can be adopted in probabilistic inference by replacing object-oriented features with latent variables.
Since the latent variables follow tractable Gaussian distributions, numerical analysis can be carried out jointly with other Gaussian random variables, such as the poses of the robot and the objects. This optimization process for probabilistic SLAM is formulated with the Expectation-Maximization (EM) algorithm. Because recognition must be completed before the pose and feature optimization that estimates the trajectory and map, the proposed system must run in real time. Since the proposed method is not bound to a particular encoder structure, any existing real-time multi-object detector architecture can serve as the encoder. Because the network is basically an auto-encoder, it can also be used together with its decoder for shape reconstruction and orientation estimation of multiple objects. | -
dc.description.tableofcontents | 1 Introduction 1
1.1 Background and Motivation 1
1.2 Contribution 4
1.3 Organization 5

2 Literature Review 7
2.1 Single Object Understanding 7
2.2 Multi-object Understanding 10
2.3 Data Association and SLAM 11

3 Variational Observation Model with Category and 3D Shape 13
3.1 Introduction 14
3.2 Graphical Models for Concepts of 3D Object 17
3.3 Object Shape Inference using VAE 18
3.3.1 Variational Lower Bound for Generative Model 18
3.3.2 Model Architecture 22
3.4 Experiments 22
3.4.1 Data Augmentation 22
3.4.2 Object Classification 24
3.4.3 Concept Inference and Shape Retrieval 24
3.5 Summary 27

4 Variational Observation Model with Voxelized Single View 28
4.1 Introduction 29
4.2 Related Work 31
4.3 Graphical Models for Likelihood of 3D Object 34
4.4 Variational Latent Variables and Semantic Features 36
4.4.1 Variational Latent Variables and Semantic SLAM with Data Association 36
4.4.2 MLE with Variational Latent Variables 38
4.5 Training Details 38
4.5.1 Data Augmentation 38
4.5.2 Loss Function 39
4.5.3 Training Networks 42
4.6 Experiments 42
4.6.1 Variational Latent Variable and Object Shape 42
4.6.2 Classification and Reconstruction 43
4.7 Summary 45
4.8 Appendix I: Variational Lower Bound and EM Algorithm 46
4.8.1 Variational Lower Bound for Expectation Step 47
4.8.2 Variational Lower Bound for Maximization Step 50
4.9 Appendix II: MLE with Variational Latent Variables 50

5 Variational Observation Model with RGB Image and Probabilistic Semantic SLAM 52
5.1 Introduction 53
5.2 Related Work 55
5.3 Observation Model of Object with 3D Shape and Orientation 57
5.4 Variational Latent Features and Semantic SLAM 60
5.4.1 Variational Features of Shape and Orientation for EM Formulation 60
5.4.2 Pose and Feature Optimization of 3D Object 61
5.5 Training Details 63
5.5.1 Training Datasets and Data Augmentation 63
5.5.2 Loss Function 63
5.5.3 Networks 66
5.6 Experiments 66
5.6.1 Object Detection and Feature Encoding 66
5.6.2 Pose and Feature Graph Optimization 67
5.6.3 Optimization Results for SLAM 67
5.7 Summary 69

6 Variational Multi-object Observation Model for Single Shot 71
6.1 Introduction 72
6.2 Related work 75
6.3 Method 78
6.4 Implementation 83
6.5 Training details 85
6.5.1 Loss Functions 85
6.5.2 Datasets and Trainings 85
6.6 Experiments 87
6.6.1 Object Pose Estimation 88
6.6.2 Object Observation and Latent Space 90
6.6.3 Object 3D Shape Reconstruction 90
6.7 Summary 91
6.8 Appendix 92
6.8.1 Object-oriented Probabilistic Semantic SLAM with Approximated Observation Model 92
6.8.2 MLE with Approximated Observation Model 100

7 Conclusion 107
| -
dc.language.iso | eng | -
dc.publisher | Seoul National University Graduate School | -
dc.subject.ddc | 621.3 | -
dc.title | A Variational Observation Model of 3D Multi-Object in 2D Single Scene for Semantic SLAM | -
dc.title.alternative | A Variational 3D Multi-Object Observation Model in a Single 2D Scene for Semantic SLAM | -
dc.type | Thesis | -
dc.type | Dissertation | -
dc.contributor.AlternativeAuthor | Yu, Hyeonwoo | -
dc.contributor.department | College of Engineering, Department of Electrical and Computer Engineering | -
dc.description.degree | Doctor | -
dc.date.awarded | 2020-02 | -
dc.identifier.uci | I804:11032-000000161007 | -
dc.identifier.holdings | 000000000042▲000000000044▲000000161007▲ | -
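The abstracts describe latent variables that follow tractable Gaussian distributions, are trained to form category- or instance-level clusters, and replace object-oriented features in data association and MLE. The Python sketch below illustrates those ideas under stated assumptions; it is not code from the dissertation. It assumes diagonal Gaussian latents, uses a per-class Gaussian prior mean as one possible clustering mechanism (the dissertation's actual loss may differ), and scores an observation against map landmarks with a Gaussian likelihood. All function names, the 2-D latent size, and the numbers are hypothetical.

```python
import math
import random

# Hypothetical sketch, not the dissertation's implementation:
# diagonal-Gaussian latent variables used as object features.

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (standard VAE reparameterization)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_class_prior(mu, log_var, prior_mu, prior_var=1.0):
    """KL(N(mu, diag(exp(log_var))) || N(prior_mu, prior_var * I)).

    Assumption: a per-class prior mean, instead of N(0, I), pulls latent codes
    of the same category toward their own cluster.
    """
    kl = 0.0
    for m, lv, pm in zip(mu, log_var, prior_mu):
        var = math.exp(lv)
        kl += 0.5 * (math.log(prior_var) - lv + (var + (m - pm) ** 2) / prior_var - 1.0)
    return kl

def gaussian_log_likelihood(z_mu, z_var, lm_mu, lm_var):
    """log N(z_mu; lm_mu, diag(z_var + lm_var)) for an observation vs. a landmark latent."""
    total = 0.0
    for m, v, lm, lv in zip(z_mu, z_var, lm_mu, lm_var):
        s = v + lv  # variances of two independent Gaussians add
        total += -0.5 * (math.log(2.0 * math.pi * s) + (m - lm) ** 2 / s)
    return total

def association_weights(z_mu, z_var, landmarks):
    """Soft data association (E-step style): normalized likelihood per landmark."""
    logs = [gaussian_log_likelihood(z_mu, z_var, mu, var) for mu, var in landmarks]
    peak = max(logs)  # subtract the max before exp for numerical stability
    exps = [math.exp(l - peak) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder output for one detected object (2-D latent for brevity).
    mu, log_var = [0.2, -0.1], [math.log(0.05), math.log(0.05)]
    print("training-time sample:", reparameterize(mu, log_var, rng))
    print("KL to class-0 prior:", kl_to_class_prior(mu, log_var, [0.0, 0.0]))
    print("KL to class-1 prior:", kl_to_class_prior(mu, log_var, [3.0, 3.0]))
    # Two hypothetical map landmarks, each a Gaussian in latent space.
    landmarks = [([0.0, 0.0], [0.1, 0.1]), ([3.0, 3.0], [0.1, 0.1])]
    w = association_weights(mu, [0.05, 0.05], landmarks)
    print("association weights:", w)
    print("MLE association:", max(range(len(w)), key=w.__getitem__))
```

The normalized weights play the role of E-step responsibilities, while taking the argmax gives a hard maximum-likelihood association. Because every quantity stays Gaussian, the same likelihood terms could in principle be combined with Gaussian pose variables in a joint optimization, which is the property the abstract relies on.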
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
