
Probabilistic 3D Human Pose Recovery and Its Application to Action Recognition : 확률적인 3차원 자세 복원과 행동인식

DC Field Value Language
dc.contributor.advisor 오성회 -
dc.contributor.author Jungchan Cho -
dc.date.accessioned 2017-07-13T07:13:27Z -
dc.date.available 2017-07-13T07:13:27Z -
dc.date.issued 2016-02 -
dc.identifier.other 000000132492 -
dc.identifier.uri https://hdl.handle.net/10371/119157 -
dc.description Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Electrical and Computer Engineering, 2016. 2. 오성회. -
dc.description.abstract Computer vision technology has become popular and plays an important role in intelligent systems such as augmented reality and video and image analysis, to name a few. Although cost-effective depth cameras such as the Microsoft Kinect have recently been developed, most computer vision algorithms assume that observations come from RGB cameras, which provide only 2D observations. If 3D information can be estimated from 2D observations, it may yield better solutions to many computer vision problems.

In this dissertation, we focus on estimating 3D information from 2D observations, a problem well known as non-rigid structure from motion (NRSfM).
More formally, NRSfM recovers the three-dimensional structure of an object by analyzing image streams under the assumption that the object's shapes lie in a low-dimensional space. However, over long periods of time a human body can undergo complex shape variations, and the increased degrees of freedom make this a challenging problem for NRSfM. To handle complex shape variations, we propose a Procrustean normal distribution mixture model (PNDMM), which extends the recently proposed Procrustean normal distribution (PND), a model that captures the distribution of non-rigid variations of an object by excluding the effects of rigid motion.
Unlike existing methods, which use a single model to solve an NRSfM problem, the proposed PNDMM decomposes complex shape variations into a collection of simpler ones, making model learning more tractable and accurate. Experiments show that the proposed method outperforms existing methods on highly complex and long human motion sequences.
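
As a rough illustration of what "excluding the effects of rigid motion" can mean, the sketch below aligns one 3D point set to a reference using a standard orthogonal Procrustes step (centering plus an SVD-based rotation). It is not the PND or PNDMM from the dissertation; the function name `align_rigid`, the omission of scale, and the synthetic data are assumptions made only for this example.

```python
import numpy as np

def align_rigid(shape, reference):
    """Align `shape` to `reference` (both n_points x 3) by translation and rotation.

    Hypothetical helper for illustration; scale is deliberately left untouched.
    Returns the centered, rotated shape and the rotation matrix that was applied.
    """
    # Remove translation by centering both point sets on their centroids.
    s = shape - shape.mean(axis=0)
    r = reference - reference.mean(axis=0)
    # Orthogonal Procrustes: the rotation minimizing ||s @ R - r||_F
    # comes from the SVD of the 3x3 cross-covariance matrix s^T r.
    u, _, vt = np.linalg.svd(s.T @ r)
    # Flip the last singular direction if needed so that det(R) = +1,
    # which rules out reflections.
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1.0
    rot = u @ vt
    return s @ rot, rot

# Synthetic check: rotate and translate a random shape, then undo the rigid motion.
rng = np.random.default_rng(0)
reference = rng.normal(size=(15, 3))
angle = 0.7
rotation_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle),  np.cos(angle), 0.0],
                       [0.0,            0.0,           1.0]])
moved = reference @ rotation_z + np.array([1.0, -2.0, 0.5])
aligned, rot = align_rigid(moved, reference)
residual = np.linalg.norm(aligned - (reference - reference.mean(axis=0)))
```

In this synthetic case the residual should be close to zero, because the two point sets differ only by a rigid motion; any difference that remains after such an alignment is the kind of non-rigid variation a shape distribution would then need to model.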

In addition, we extend the PNDMM to the problem of single-view 3D human pose estimation. Although recovering the 3D structure of a human body from an image is important, it is a highly ambiguous problem because an articulated human body deforms. Moreover, an accurate 2D human pose must be obtained before a 3D human pose can be estimated from it. To address both the inaccuracy of 2D pose estimation on a single image and the ambiguity of 3D human poses, we estimate multiple 2D and 3D human pose candidates and select the one that is best explained by a 2D human pose detector and a 3D shape model. We also introduce a model transformation, incorporated into the 3D shape prior model, so that the proposed method can be applied to a novel test image.
Experimental results show that the proposed method provides good 3D reconstruction results on novel test images, despite inaccurate 2D part detections and 3D shape ambiguities.

Finally, we address the problem of action recognition from a video clip. Recent studies show that high-level features obtained from estimated 2D human poses enable action recognition performance beyond that of state-of-the-art methods using low- and mid-level features based on appearance and motion, despite the inaccuracy of human pose estimation. Based on these findings, we propose an action recognition method that uses estimated 3D human pose information, since the proposed PNDMM is able to reconstruct 3D shapes from 2D shapes. Experimental results show that 3D pose based descriptors are better than 2D pose based descriptors for action recognition, regardless of the classification method. Given that we use simple 3D pose descriptors based on a 3D shape model learned from 2D shapes, the results reported in this dissertation are promising, and obtaining accurate 3D information from 2D observations remains an important research issue for reliable computer vision systems.
-
dc.description.tableofcontents Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Research Issues 4
1.3 Organization of the Dissertation 6

Chapter 2 Preliminary 9
2.1 Generalized Procrustes Analysis (GPA) 11
2.2 EM-GPA Algorithm 12
2.2.1 Objective function 12
2.2.2 E-step 15
2.2.3 M-step 16
2.3 Implementation Considerations for EM-GPA 18
2.3.1 Preprocessing stage 18
2.3.2 Small update rate for the covariance matrix 20
2.4 Experiments 21
2.4.1 Shape alignment with the missing information 23
2.4.2 3D shape modeling 24
2.4.3 2D+3D active appearance models 28
2.5 Chapter Summary and Discussion 32

Chapter 3 Procrustean Normal Distribution Mixture Model 33
3.1 Non-Rigid Structure from Motion 35
3.2 Procrustean Normal Distribution (PND) 38
3.3 PND Mixture Model 41
3.4 Learning a PNDMM 43
3.4.1 E-step 44
3.4.2 M-step 46
3.5 Learning an Adaptive PNDMM 48
3.6 Experiments 50
3.6.1 Experimental setup 50
3.6.2 CMU Mocap database 53
3.6.3 UMPM dataset 69
3.6.4 Simple and short motions 74
3.6.5 Real sequence - qualitative representation 77
3.7 Chapter Summary 78

Chapter 4 Recovering a 3D Human Pose from a Novel Image 83
4.1 Single View 3D Human Pose Estimation 85
4.2 Candidate Generation 87
4.2.1 Initial pose generation 87
4.2.2 Part recombination 88
4.3 3D Shape Prior Model 89
4.3.1 Procrustean mixture model learning 89
4.3.2 Procrustean mixture model fitting 91
4.4 Model Transformation 92
4.4.1 Model normalization 92
4.4.2 Model adaptation 95
4.5 Result Selection 96
4.6 Experiments 98
4.6.1 Implementation details 98
4.6.2 Evaluation of the joint 2D and 3D pose estimation 99
4.6.3 Evaluation of the 2D pose estimation 104
4.6.4 Evaluation of the 3D pose estimation 106
4.7 Chapter Summary 108

Chapter 5 Application to Action Recognition 109
5.1 Appearance and Motion Based Descriptors 112
5.2 2D Pose Based Descriptors 113
5.3 Bag-of-Features with a Multiple Kernel Method 114
5.4 Classification - Kernel Group Sparse Representation 115
5.4.1 Group sparse representation for classification 116
5.4.2 Kernel group sparse (KGS) representation for classification 118
5.5 Experiment on sub-JHMDB Dataset 120
5.5.1 Experimental setup 120
5.5.2 3D pose based descriptor 122
5.5.3 Experimental results 123
5.6 Chapter Summary 129

Chapter 6 Conclusion and Future Work 131

Appendices 135
A Proof of Propositions in Chapter 2 137
A.1 Proof of Proposition 1 137
A.2 Proof of Proposition 3 138
A.3 Proof of Proposition 4 139
B Calculation of p(XijDii) in Chapter 3 141
B.1 Without the Dirac-delta term 141
B.2 With the Dirac-delta term 142
C Procrustean Mixture Model Learning and Fitting in Chapter 4 145
C.1 Procrustean Mixture Model Learning 145
C.2 Procrustean Mixture Model Fitting 147

Bibliography 153

Abstract (in Korean) 167
-
dc.format application/pdf -
dc.format.extent 11362647 bytes -
dc.format.medium application/pdf -
dc.language.iso en -
dc.publisher Seoul National University Graduate School -
dc.subject 3D Shape Recovery -
dc.subject Non-Rigid Structure from Motion -
dc.subject 3D Human Pose Estimation -
dc.subject Action Recognition -
dc.subject.ddc 621 -
dc.title Probabilistic 3D Human Pose Recovery and Its Application to Action Recognition -
dc.title.alternative 확률적인 3차원 자세 복원과 행동인식 -
dc.type Thesis -
dc.contributor.AlternativeAuthor 조정찬 -
dc.description.degree Doctor -
dc.citation.pages viii, 168 -
dc.contributor.affiliation College of Engineering, Department of Electrical and Computer Engineering -
dc.date.awarded 2016-02 -
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
