Enhancing Depth Image using Unsupervised Overfit Training of Local Frame Set Registration

Abstract: Accurate depth acquisition using depth-sensing devices is fundamental to various computer vision applications such as 3D object recognition and scene understanding. Recently, commercial RGB-depth (RGB-D) cameras have been widely used as depth sensors owing to their portable size and affordable price. However, depth images from most commercial RGB-D cameras contain heavy noise and undetected regions (i.e., missing values) caused by their lower-grade light sources and sensors. Deep-learning-based methods have recently been proposed to alleviate these problems, but they typically require high-quality supervised depth datasets for training, which are difficult to obtain. This dissertation presents a novel method for generating high-quality depth images to address this issue.

The main idea of the proposed framework is to leverage depth information from nearby view frames to reduce noise and recover missing values in a given depth frame. Given a sequentially scanned RGB-D dataset, the frames within a local spatial region are defined as a local frame set. The local frame set is then aligned to a single depth frame by estimating the relative motions between frames. An unsupervised learning-based registration method, which requires no ground-truth dataset, is employed for this alignment. To improve registration accuracy, the registration parameters are trained on the local frame set with an overfit-training scheme. The final depth image is rendered by averaging the aligned frame set at the pixel level, which reduces noise and recovers missing values.
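
The final rendering step, pixel-level averaging of an already aligned frame set, can be illustrated with a minimal sketch in plain Python. It is not the dissertation's implementation: the function name `fuse_aligned_frames`, the list-of-rows image representation, and the convention that a missing depth value is stored as 0 are all assumptions made for illustration, and the registration step itself is taken as given.

```python
from typing import List, Sequence

DepthImage = List[List[float]]  # rows of depth values (hypothetical representation)

MISSING = 0.0  # assumption: an undetected pixel is stored as depth 0


def fuse_aligned_frames(frames: Sequence[DepthImage]) -> DepthImage:
    """Average depth frames that are already aligned to one reference view.

    Each output pixel is the mean of the valid (non-missing) samples at that
    pixel. A pixel stays missing only if no frame observed it.
    """
    if not frames:
        raise ValueError("need at least one aligned frame")
    height, width = len(frames[0]), len(frames[0][0])
    fused = [[MISSING] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Collect only the frames that actually measured this pixel.
            valid = [f[v][u] for f in frames if f[v][u] != MISSING]
            if valid:
                fused[v][u] = sum(valid) / len(valid)
    return fused


# Two tiny 2x2 "aligned" frames with holes in different places.
frame_a = [[1.0, 0.0], [2.0, 0.0]]
frame_b = [[3.0, 4.0], [0.0, 0.0]]
fused = fuse_aligned_frames([frame_a, frame_b])
```

Averaging over valid samples only is what lets the same operation both suppress noise (where several frames agree) and fill holes (where at least one frame has a measurement); a pixel unseen by every frame remains missing.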

Experimental results show that the proposed method is superior to previously benchmarked depth generation methods based on the local frame set registration strategy. The method was evaluated by recovering a noise-added synthetic depth dataset, and the results verified that it retrieves the original ground truth more accurately than previous methods. Moreover, a depth dataset constructed with the proposed method was used to train a learning-based method, which significantly outperformed state-of-the-art depth enhancement frameworks. The major advantage of this study is that high-quality depth images can be generated using only an RGB-D stream dataset, which makes it possible to construct a new benchmark depth dataset.
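
The synthetic evaluation protocol can be sketched as follows. The abstract does not state the noise model or the error metric, so the Gaussian noise, random pixel dropout, and RMSE used here are assumptions chosen for illustration, as are the helper names `corrupt` and `rmse` and the use of 0 as the missing-value marker.

```python
import math
import random
from typing import List

DepthImage = List[List[float]]

MISSING = 0.0  # assumption: an undetected pixel is stored as depth 0


def corrupt(depth: DepthImage, sigma: float, drop_prob: float,
            rng: random.Random) -> DepthImage:
    """Simulate a low-grade sensor: Gaussian depth noise plus random holes."""
    noisy = []
    for row in depth:
        out = []
        for d in row:
            if d == MISSING or rng.random() < drop_prob:
                out.append(MISSING)  # pixel becomes (or stays) undetected
            else:
                # Clamp so a noisy sample is never mistaken for "missing".
                out.append(max(d + rng.gauss(0.0, sigma), 1e-6))
        noisy.append(out)
    return noisy


def rmse(estimate: DepthImage, truth: DepthImage) -> float:
    """Root-mean-square error over pixels that are valid in both images."""
    squared_error, count = 0.0, 0
    for est_row, true_row in zip(estimate, truth):
        for e, t in zip(est_row, true_row):
            if e != MISSING and t != MISSING:
                squared_error += (e - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("no pixel is valid in both images")
    return math.sqrt(squared_error / count)


ground_truth = [[2.0, 2.0], [3.0, 3.0]]
noisy = corrupt(ground_truth, sigma=0.05, drop_prob=0.2, rng=random.Random(0))
```

A depth enhancement method would then be scored by comparing its output for `noisy` against `ground_truth`, for example with `rmse`, so that a lower value indicates a closer recovery of the original depth.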
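
Acquiring accurate depth information is an important problem in computer vision. Recently, commercial RGB-depth (RGB-D) cameras have been widely used as depth-sensing devices owing to their low price and portable size. However, depth images from commercial RGB-D cameras suffer from degraded quality because of noise and undetected regions caused by low-quality light sources and sensors. Artificial-intelligence-based methods for improving depth image quality have recently attracted attention, but they require high-quality depth datasets to train their networks, so producing high-quality depth images is essential.

This dissertation proposes a method for generating high-quality depth images. The proposed approach uses depth information from neighboring frames to reduce the noise and empty regions of a particular frame in a sequentially acquired RGB-D dataset. Frames within a local region are defined as a local frame set and are aligned to the target frame by estimating their relative poses. For this step, an unsupervised point set registration technique that requires no separate ground-truth dataset is used. To increase registration accuracy, the parameters are overfit-trained within the local frame set. The final depth image is obtained by pixel-wise averaging of the aligned frames, which reduces noise and empty regions.

In an experiment on recovering synthetic depth images with added noise, the proposed method recovered the original ground-truth images better than existing techniques. In addition, applying the constructed dataset to a learning-based method yielded better performance than state-of-the-art depth enhancement methods. With the proposed method, high-quality depth images that can serve as a new benchmark dataset can be generated using only a sequentially acquired RGB-D dataset.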

Language: eng

URI: https://hdl.handle.net/10371/193330

https://dcollection.snu.ac.kr/common/orgView/000000174280

Files in This Item:

000000174280.pdf 33.21 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Ph.D. / Sc.D._컴퓨터공학부)
