3D Reconstruction, Weakly-Supervised Learning, and Supervised Learning Methods for 3D Human Pose Estimation

박성헌

3D Reconstruction, Weakly-Supervised Learning, and Supervised Learning Methods for 3D Human Pose Estimation : 3차원 사람 자세 추정을 위한 3차원 복원, 약지도학습, 지도학습 방법

Authors: 박성헌

Advisor: 곽노준

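Major: Graduate School of Convergence Science and Technology, Dept. of Transdisciplinary Studies (Intelligent Convergence Systems major)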

Issue Date: 2019-02

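Publisher: Seoul National University Graduate School

Description: Doctoral dissertation (Ph.D.), Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Dept. of Transdisciplinary Studies (Intelligent Convergence Systems major), 2019. 2. 곽노준.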

Abstract: Estimating human poses from images is one of the fundamental tasks in computer vision, with many applications such as action recognition, human-computer interaction, and virtual reality. In particular, estimating 3D human poses from 2D inputs is challenging because the problem is inherently under-constrained. In addition, 3D ground-truth data for human poses can be obtained only in limited, restricted environments. This dissertation studies 3D human pose estimation from several angles, according to the type of data available. To this end, three methods for retrieving 3D human poses from 2D observations or from RGB images are proposed: 3D reconstruction, weakly-supervised learning, and supervised learning.

First, a non-rigid structure from motion (NRSfM) algorithm is proposed that reconstructs the 3D structure of non-rigid objects, such as human bodies, from 2D observations. In the proposed framework, named Procrustean Regression, the 3D shapes are regularized based on their aligned shapes. We show that the cost function of Procrustean Regression can be cast as an unconstrained problem or a problem with simple bound constraints, which can be solved efficiently by existing gradient descent solvers. The framework can be easily integrated with numerous existing models and assumptions, which makes it practical for a variety of real situations. The experimental results show that the proposed method achieves results competitive with state-of-the-art methods for orthographic projection, with much lower time complexity and memory requirements, and outperforms existing methods for perspective projection.
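The idea of regularizing shapes through their aligned versions can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes orthographic projection (the first two rows of each 3D shape are its 2D projection), a fixed reference shape, and a nuclear-norm penalty on the stacked aligned shapes as the regularizer. The names `center`, `align_to_reference`, and `procrustean_cost`, and the weight `lam`, are hypothetical.

```python
import numpy as np


def center(shape):
    """Subtract the centroid from a (3, P) shape."""
    return shape - shape.mean(axis=1, keepdims=True)


def align_to_reference(shape, reference):
    """Rotate a centered shape onto a centered reference (orthogonal Procrustes)."""
    s, r = center(shape), center(reference)
    u, _, vt = np.linalg.svd(r @ s.T)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return rotation @ s


def procrustean_cost(shapes, observations, reference, lam=0.1):
    """Reprojection error plus an assumed nuclear-norm penalty on aligned shapes.

    shapes: (F, 3, P) 3D shapes in camera coordinates.
    observations: (F, 2, P) 2D observations under orthographic projection.
    """
    reprojection = np.sum((shapes[:, :2, :] - observations) ** 2)
    aligned = np.stack([align_to_reference(s, reference).ravel() for s in shapes])
    low_rank = np.linalg.norm(aligned, ord="nuc")
    return reprojection + lam * low_rank
```

Because the alignment is computed inside the cost, the shapes themselves are free variables, which is in the spirit of the unconstrained formulation described above. For a rigid object seen from different viewpoints, the aligned shapes coincide and the stacked matrix has rank one.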

Second, a weakly-supervised learning method is presented that can learn 3D structures when only 2D ground-truth data is available for training. Extending the Procrustean Regression framework, we propose the Procrustean Regression Network, a learning method that trains neural networks to learn 3D structures from training data with 2D ground truths. This is the first attempt to directly integrate an NRSfM algorithm into neural network training. It is also the first use of a cost function containing a low-rank function to train neural networks that reconstruct 3D shapes. At test time, the 3D structures of human bodies are obtained with a single feed-forward operation, which gives the framework much faster inference than 3D reconstruction algorithms.
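To make the idea of training with a low-rank cost concrete, the sketch below shows how such a term can supply gradients to a network's outputs. It is an assumption-laden illustration, not the thesis's loss: the low-rank function is taken to be the nuclear norm, projection is orthographic, and the alignment step is omitted for brevity. The names `nuclear_norm_and_grad` and `weak_loss_and_grad` and the weight `lam` are hypothetical.

```python
import numpy as np


def nuclear_norm_and_grad(x):
    """Nuclear norm of a matrix and its gradient U @ V^T.

    The gradient formula holds when no singular value is zero.
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return s.sum(), u @ vt


def weak_loss_and_grad(pred, target_2d, lam=0.1):
    """Loss on predicted 3D shapes using only 2D ground truth.

    pred: (B, 3, P) predicted 3D shapes for a batch.
    target_2d: (B, 2, P) 2D ground-truth keypoints.
    Returns the scalar loss and its gradient with respect to pred.
    """
    batch = pred.shape[0]
    residual = pred[:, :2, :] - target_2d
    nuc, nuc_grad = nuclear_norm_and_grad(pred.reshape(batch, -1))
    loss = np.sum(residual ** 2) + lam * nuc
    grad = lam * nuc_grad.reshape(pred.shape)
    grad[:, :2, :] += 2.0 * residual
    return loss, grad
```

The returned gradient is what would be backpropagated through the network. The depth row of each prediction receives gradient only from the low-rank term, since no 3D ground truth enters the loss.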

Third, a supervised learning method is proposed that infers 3D poses from 2D inputs using neural networks. The method uses a relational unit that captures the relations between different body parts. Each pair of different body parts generates relational features, and the average of the features from all pairs is used for 3D pose estimation. We also propose a dropout method called relational dropout, which can be used in relational modules to provide robustness to occlusions. The experimental results confirm that the performance of the proposed algorithm degrades little when some points are missing, while it maintains state-of-the-art performance when every point is visible.
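The pairwise averaging described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the architecture from the thesis: the shared pair function is a single linear layer with ReLU, the final pose regressor is omitted, and relational dropout is modeled by skipping every pair that contains a dropped part and averaging over the remaining pairs. The names `pair_feature`, `relational_pool`, and `sample_visibility` are hypothetical.

```python
from itertools import combinations

import numpy as np


def pair_feature(a, b, weight):
    """Shared map applied to one concatenated pair of part descriptors."""
    return np.maximum(weight @ np.concatenate([a, b]), 0.0)


def relational_pool(parts, weight, visible=None):
    """Average the relational features over all pairs of visible parts.

    parts: list of 1D arrays, one descriptor per body part.
    visible: optional boolean flags; pairs touching a hidden part are skipped.
    """
    n = len(parts)
    if visible is None:
        visible = [True] * n
    feats = [
        pair_feature(parts[i], parts[j], weight)
        for i, j in combinations(range(n), 2)
        if visible[i] and visible[j]
    ]
    if not feats:
        return np.zeros(weight.shape[0])
    return np.mean(feats, axis=0)


def sample_visibility(n_parts, p_drop, rng):
    """Assumed training-time mask: each part is dropped with probability p_drop."""
    return rng.random(n_parts) >= p_drop
```

During training, a mask from `sample_visibility` would be passed as `visible`. At test time the same argument can mark parts that are actually occluded, so the pooled feature is built only from observed pairs.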
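
Abstract (translated from the Korean): Human pose estimation from RGB images is important in computer vision and is a foundational technology for many applications. It can serve as an enabling technology across a wide range of fields, including action recognition, human-computer interaction, virtual reality, and augmented reality. In particular, estimating 3D human poses from 2D inputs is known to be difficult because the problem admits infinitely many solutions. In addition, 3D ground-truth data can be acquired only in restricted environments such as motion capture studios, so the amount of available data is limited. This dissertation studies 3D human pose estimation from several angles, according to the type of training data available. Specifically, it presents three methods (3D reconstruction, weakly-supervised learning, and supervised learning) for estimating and reconstructing 3D human poses from 2D observations or RGB images.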

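First, we propose a non-rigid structure from motion (NRSfM) algorithm that reconstructs 3D structure from 2D observations of non-rigid objects such as the human body. In the proposed framework, named Procrustean Regression, the 3D shapes are regularized by a function of their aligned shapes. The cost function of Procrustean Regression incorporates the constraints related to 3D shape alignment into the cost itself, so it can be optimized with gradient descent. The framework is practical and flexible because it can accommodate a variety of models and assumptions. Extensive experiments show that the proposed method performs comparably to state-of-the-art methods while outperforming existing methods in time and space complexity.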

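Second, we propose a weakly-supervised learning method that reconstructs 3D structure from 2D inputs when only 2D training data is available. The proposed learning method, named Procrustean Regression Network, trains a neural network or convolutional neural network to estimate 3D poses from 2D human poses. By modifying the cost function used in Procrustean Regression to train neural networks, it is the first attempt to apply a cost function from NRSfM to neural network training. It is also the first to use the low-rank function in that cost function for neural network training. For test data, 3D human poses are obtained by a feed-forward pass of the network, which allows much faster 3D pose estimation than 3D reconstruction methods.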

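Finally, we present a supervised learning method that uses neural networks to estimate 3D human poses from 2D inputs. The method uses relational modules to learn the relations between different body parts. A relational feature is extracted for each pair of different parts, and the average of all relational features is used for the final 3D pose estimate. We also present a new training method called relational dropout, which yields a 3D pose estimator that operates robustly when some 2D observations are missing because of occlusion. Experiments demonstrate that the method estimates 3D poses effectively, without a large drop in performance, even when only part of the 2D observations is given.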

Language: eng

URI: https://hdl.handle.net/10371/152572

Files in This Item:

000000155042.pdf 6.10 MB

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Transdisciplinary Studies(융합과학부)
  - Theses (Ph.D. / Sc.D._융합과학부)
