PoseDiff: Pose-conditioned Diffusion Model for Unbounded Scene Synthesis from Sparse Inputs

이서영

PoseDiff: Pose-conditioned Diffusion Model for Unbounded Scene Synthesis from Sparse Inputs : A Pose-based Diffusion Model for Synthesizing Unbounded Scenes from a Small Number of Images (translated Korean title)

Authors: 이서영

Advisor: 이준석

Issue Date: 2023

Publisher: Seoul National University Graduate School

Keywords: Novel View Synthesis ; Diffusion Model ; Camera Pose ; Unbounded Scene ; Sparse Inputs

Description: Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Data Science, Department of Data Science, 2023. 8. 이준석.

Abstract: Novel view synthesis has been heavily driven by NeRF-based models, but these models are often limited by their need for dense coverage of input views and by their expensive computation. NeRF models designed for scenarios with only a few sparse input views have difficulty generalizing to complex or unbounded scenes, where scene content can lie at any distance from a camera that may face any direction, and they therefore generate unnatural, low-quality images with blurry or floating artifacts. To compensate for the lack of dense information in sparse-view scenarios and to reduce the computational burden of NeRF-based novel view synthesis, our approach adopts diffusion models. In this paper, we present PoseDiff, which combines the fast and plausible generation ability of diffusion models with the 3D-aware view consistency provided by the pose parameters used in NeRF-based models. Specifically, PoseDiff is a multimodal pose-conditioned diffusion model applicable to novel view synthesis of sparse-view unbounded scenes as well as bounded or forward-facing scenes. PoseDiff renders plausible novel views for given pose parameters while maintaining high-frequency geometric details, in significantly less time than conventional NeRF-based methods.
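Abstract (translated from Korean): Novel view synthesis has recently been led by NeRF-based models, but these models are constrained in that the inputs must densely cover the given scene or that expensive computation is required. When only a few images are available, information about some regions is missing or very scarce, which makes it hard to represent the whole scene. Even existing NeRF models designed to work well in this setting tend to fall short of perfectly reconstructing every kind of scene, because the amount of information is fundamentally insufficient. In particular, their performance drops sharply on scenes that contain complex patterns or many regions not visible in the input images (unbounded scenes), and as a result they generate unnatural, low-quality images with blurry or floating objects. To address both the lack of dense information caused by a few inputs that sparsely cover the scene and the computational burden of NeRF-based models, this thesis proposes PoseDiff, a model that combines the fast and realistic generation ability of diffusion models with the 3D view consistency of the pose parameters used in NeRF-based models. Specifically, PoseDiff is a multimodal pose-conditioned diffusion model that, given sparse inputs, can synthesize novel views not only for unbounded scenes but also for bounded and forward-facing scenes. Given pose parameters, PoseDiff can render images that preserve high-frequency geometric details in significantly less time than existing NeRF-based models.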

Language: eng

URI: https://hdl.handle.net/10371/196720

https://dcollection.snu.ac.kr/common/orgView/000000178363

Files in This Item:

000000178363.pdf 26.56 MB

Appears in Collections:

Graduate School of Data Science
- Theses (Master's Degree, Department of Data Science)
