Quantitative Attribute Manipulation by Navigating Latent Space with Quantifier

Abstract: This dissertation presents methods for quantitatively manipulating object attributes by navigating the latent space of generative adversarial network (GAN) models. The three proposed methods aim, respectively, to estimate attribute quantities, to manipulate attributes quantitatively without tuning the manipulation scale, and to manipulate attributes of 3D objects while preserving multi-view consistency.
The first method is a novel quantifier that estimates attribute quantity from the feature representation of vision-language models. The quantifier can be trained with a small number of data samples, and it estimates the quantity of an attribute in a normalized range of [0, 1] regardless of which attribute is being measured. Experiments with multiple attributes across various categories demonstrate that it is effective and generic.
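As a rough illustration of a quantifier that maps a feature vector to a normalized quantity in [0, 1], the following minimal sketch trains a sigmoid-activated linear head on five toy samples. It is not the dissertation's architecture: the class `LinearQuantifier`, the toy feature vectors, and the squared-error training loop are all assumptions made for illustration, and in practice the features would come from a vision-language image encoder rather than hand-written lists.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class LinearQuantifier:
    """Hypothetical head: feature vector -> attribute quantity in [0, 1]."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
        self.b = 0.0

    def predict(self, feat: list[float]) -> float:
        z = sum(wi * fi for wi, fi in zip(self.w, feat)) + self.b
        return sigmoid(z)

    def fit(self, feats, targets, lr: float = 0.5, epochs: int = 500) -> None:
        """Plain stochastic gradient descent on squared error."""
        for _ in range(epochs):
            for f, t in zip(feats, targets):
                q = self.predict(f)
                # Gradient of 0.5 * (q - t)^2 with respect to the pre-activation.
                grad = (q - t) * q * (1.0 - q)
                self.w = [wi - lr * grad * fi for wi, fi in zip(self.w, f)]
                self.b -= lr * grad


# Toy stand-ins for encoder features (assumption): the first dimension
# tracks the attribute, the second is unrelated noise.
feats = [[-2.0, 0.3], [-1.0, -0.2], [0.0, 0.1], [1.0, -0.3], [2.0, 0.2]]
targets = [0.1, 0.3, 0.5, 0.7, 0.9]

quantifier = LinearQuantifier(dim=2)
quantifier.fit(feats, targets)
predictions = [quantifier.predict(f) for f in feats]
```

Because the output passes through a sigmoid, every estimate lies in the same normalized range whatever attribute the labels describe, which is the property the abstract emphasizes.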
The second method is a user-friendly image manipulation scheme in which the user specifies only a target attribute quantity, normalized to the range [0, 1], for the target image, regardless of the source image. The scheme is built on a latent space navigator that uses the attribute quantifier. The navigator takes as input the source attribute quantity estimated by the quantifier, the target attribute quantity, and the source image, and it manipulates the target feature from which StyleGAN generates the target image. A training method for the navigator, which also uses the quantifier, is proposed. Evaluation on various benchmarks confirms that the proposed scheme outperforms competing methods for quantitative image manipulation, both qualitatively and quantitatively.
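The navigator's interface (source quantity, target quantity, and source representation in; target representation out) can be sketched with a toy example. Everything here is an assumption for illustration: `DIRECTION`, `OFFSET`, `toy_quantifier`, and `toy_navigator` are hypothetical, the quantifier is linear in the latent code, and the closed-form step replaces the learned navigator and the StyleGAN generator described above.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical attribute direction and offset in a 4-dimensional latent space.
DIRECTION = [0.5, 0.0, -0.5, 0.0]
OFFSET = 0.5


def toy_quantifier(latent: list[float]) -> float:
    """Toy stand-in for the quantifier: linear in the latent, clamped to [0, 1]."""
    return min(1.0, max(0.0, dot(latent, DIRECTION) + OFFSET))


def toy_navigator(latent: list[float], q_source: float, q_target: float) -> list[float]:
    """Move the latent so the toy quantifier reads q_target.

    The step size is derived from the quantity gap, so the caller
    supplies only the target quantity and never a manipulation scale.
    """
    step = (q_target - q_source) / dot(DIRECTION, DIRECTION)
    return [w + step * d for w, d in zip(latent, DIRECTION)]


source_latent = [0.2, 1.0, 0.4, -0.3]
q_source = toy_quantifier(source_latent)
target_latent = toy_navigator(source_latent, q_source, q_target=0.8)
q_result = toy_quantifier(target_latent)
```

In this toy setting the quantity is exactly linear along `DIRECTION`, so a single step reaches the target as long as the source estimate is not clamped. A real GAN latent space has no such guarantee, which is presumably why the navigator in the dissertation is trained with the quantifier rather than computed in closed form.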
The third method achieves view consistency during 3D image manipulation in the latent space, which is crucial for applying the manipulation to real objects. View consistency is considered in the design and training of both the quantifier and the navigator to alleviate the inconsistency observed when attributes are estimated across viewpoints. The method is evaluated on various attributes of 3D objects, including human faces, and is shown to be effective.
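One simple way to express view consistency as a training signal is to penalize the spread of the quantifier's estimates over several renderings of the same object. The abstract does not state the loss actually used, so the function `view_consistency_penalty` below is a hypothetical sketch of the general idea, not the dissertation's formulation.

```python
def view_consistency_penalty(estimates: list[float]) -> float:
    """Variance of attribute estimates across views of one object.

    Zero when every view yields the same quantity; grows as the
    per-view estimates disagree. (Hypothetical loss term.)
    """
    mean = sum(estimates) / len(estimates)
    return sum((q - mean) ** 2 for q in estimates) / len(estimates)


# Estimates for the same object rendered from three viewpoints (toy numbers).
consistent = view_consistency_penalty([0.5, 0.5, 0.5])
inconsistent = view_consistency_penalty([0.3, 0.5, 0.7])
```

A term of this kind could be added to the training objective of the quantifier or the navigator so that the same object receives similar quantities from every viewpoint.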
Overall, the proposed methods enable the estimation and manipulation of various attributes, including those that are challenging to quantify using text alone. By leveraging generative models and the feature representation of vision-language models, they open new possibilities for image manipulation in virtual environments, with potential applications in various fields. The proposed image manipulation scheme is user-friendly, and the method for achieving view consistency during 3D image manipulation also enables quantitative manipulation of custom attributes.
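Abstract (translated from the Korean): This dissertation presents methods for quantitatively manipulating object attributes by navigating the latent space of generative adversarial network models. The three proposed methods aim to estimate attribute quantities, to manipulate attributes quantitatively without tuning the manipulation scale, and to precisely manipulate attributes of 3D objects while preserving multi-view consistency.
The first method proposes a new quantifier that estimates attribute quantity using the feature representation of a vision-language model. The quantifier can be trained with a small number of data samples and estimates the attribute quantity in a normalized range from 0 to 1 regardless of the attribute type. The method proved effective in experiments on multiple attributes across various categories and can be used generically for a variety of attributes.
The second method proposes a user-friendly image manipulation scheme in which only a normalized target attribute quantity in the range 0 to 1 is assigned for the target image, regardless of the source image. The proposed method is based on a latent space navigator that uses the attribute quantifier. The source attribute quantity estimated by the quantifier, the target attribute quantity, and the source image are given to the navigator as input, and the navigator manipulates the target feature so that StyleGAN can generate the target image. A training method for the navigator that uses the quantifier is also proposed. Evaluation on various benchmarks confirmed that the proposed scheme outperforms competing methods for quantitative image manipulation, both qualitatively and quantitatively.
The third method proposes a way to achieve view consistency during 3D image manipulation in the latent space, which is important for implementing manipulation on real objects. View consistency is considered in the design and training of both the quantifier and the navigator, alleviating the inconsistency observed when estimating attributes across viewpoints. The proposed method was evaluated with attributes of various 3D objects, including human faces, and was shown to be effective.
Overall, the methods proposed in this dissertation enable the estimation and manipulation of various attributes, including those that are difficult to quantify with text alone. By leveraging generative models and the feature representation of vision-language models, they offer new possibilities for image manipulation in virtual environments and can be applied in various fields. The proposed image manipulation scheme is user-friendly, and the view-consistent 3D image manipulation method also enables quantitative manipulation of user-defined attributes.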

Language: eng

URI: https://hdl.handle.net/10371/196445

https://dcollection.snu.ac.kr/common/orgView/000000177289

Files in This Item:

000000177289.pdf 70.69 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
