Generalized Resampling Model for Practical Image Super-Resolution

Abstract: With the rapid development of display, camera, and communication technologies, both the supply of and the demand for high-resolution images and videos keep increasing. However, high-quality content can be difficult or even impossible to access in practical situations, such as limited network bandwidth, low-light conditions, or playback of outdated videos for which no high-quality original exists. To overcome these limitations, single image super-resolution (SISR or SR) aims to reconstruct a high-resolution image from a given low-resolution input. Recent SR methods based on deep convolutional neural networks (CNNs) can recover high-quality details and textures from low-quality images surprisingly well.

Nevertheless, existing SR algorithms do not always guarantee high-quality outputs in real-world scenarios. The primary limitation is that these methods assume a specific application setting, which makes them hard to apply to more general problems. Specifically, most state-of-the-art SISR models are designed to take synthetically downsampled low-resolution images and enlarge them by a fixed integer scaling factor. They therefore perform poorly in more realistic scenarios, such as taking in-the-wild low-resolution images as inputs or upsampling by arbitrary factors.

In this dissertation, we propose practical methods for applying SR to real-world applications from the perspective of generalized image resampling. First, in Chapter 2, we address the issue that existing SR models are mainly designed to take bicubic-downsampled images rather than arbitrary low-resolution inputs from the real world. This limitation stems from the difficulty of collecting realistic low-resolution and high-resolution training pairs for SR. While a few learning-based methods aim to generate synthetic training pairs, they still depend on the less practical bicubic downsampling formulation. To this end, we propose a novel data-driven framework that constructs an Adaptive Data Loss (ADL) for effective unsupervised learning of a downsampling network. Rather than relying on bicubic downsampling, our method can simulate the latent downsampling models of synthetic and real-world images without paired training examples. Consequently, we obtain a state-of-the-art SR model by training it on low-resolution images generated by our downsampling network.
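The sensitivity to the downsampling model can be illustrated with a small sketch. It is not the ADL framework; it only shows, under simple assumptions, that the same high-resolution image yields different low-resolution images depending on the blur kernel applied before subsampling. The function `downsample` and the kernels `IDENTITY` and `BOX` are hypothetical names introduced for this illustration.

```python
def downsample(hr, kernel, scale):
    """Blur a 2D image (list of rows) with a square kernel, then keep every `scale`-th pixel."""
    h, w = len(hr), len(hr[0])
    k = len(kernel)
    r = k // 2
    lr = []
    for y in range(0, h, scale):
        row = []
        for x in range(0, w, scale):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    # Clamp coordinates at the border (replicate padding).
                    yy = min(max(y + dy - r, 0), h - 1)
                    xx = min(max(x + dx - r, 0), w - 1)
                    acc += kernel[dy][dx] * hr[yy][xx]
            row.append(acc)
        lr.append(row)
    return lr


# Two normalized 3x3 kernels (assumed for illustration):
# identity (plain decimation) and a box blur.
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
BOX = [[1 / 9] * 3 for _ in range(3)]

# A 4x4 checkerboard as a toy high-resolution image.
hr = [[float((x + y) % 2) for x in range(4)] for y in range(4)]

lr_identity = downsample(hr, IDENTITY, 2)  # decimation sees only the dark squares
lr_box = downsample(hr, BOX, 2)            # blurring first mixes in the bright squares
```

An SR model trained only on pairs produced by one of these kernels would meet a different input distribution if real images followed the other, which is the gap a learned downsampling model is meant to close.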

Next, in Chapter 3, we extend conventional SR to produce outputs of various shapes. A practical SR model should resize images by arbitrary scale factors on demand and, ideally, also perform more general image warping. Nevertheless, existing methods mainly focus on fixed integer scaling factors, e.g., ×2 or ×4, which limits their applicability to diverse real-world scenarios. To extend the scope of SR toward general resampling, we propose SRWarp, a learning-based approach for image warping. SRWarp incorporates an adaptive warping layer and multiscale blending to handle the spatially varying nature of geometric image transformations. We also introduce the DIV2KW dataset for training deep image resampling models. Compared to traditional SR methods, SRWarp enables more generalized image resampling for practical applications, including lens distortion correction.
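For reference, the classical interpolation-based form of general resampling can be sketched as backward warping: every output pixel is mapped to a source location by an inverse transform and sampled there. The sketch below is not SRWarp and uses plain bilinear interpolation; `bilinear`, `resample`, and `scale_map` are hypothetical helpers, and the pixel-center alignment convention is an assumption.

```python
import math


def bilinear(img, x, y):
    """Sample a 2D image (list of rows) at real-valued (x, y) with border clamping."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def resample(img, inverse_map, out_h, out_w):
    """Backward warping: each output pixel looks up its source location."""
    return [[bilinear(img, *inverse_map(x, y)) for x in range(out_w)]
            for y in range(out_h)]


def scale_map(s):
    """Inverse of uniform scaling by s, assuming pixel-center alignment."""
    return lambda x, y: ((x + 0.5) / s - 0.5, (y + 0.5) / s - 0.5)


src = [[0.0, 1.0], [2.0, 3.0]]
same = resample(src, scale_map(1.0), 2, 2)  # identity mapping reproduces the input
up = resample(src, scale_map(1.5), 3, 3)    # non-integer scale factor of 1.5
```

Because `resample` accepts any callable that maps output coordinates to source coordinates, the same loop covers arbitrary scale factors, homographies, or a lens distortion model; only `inverse_map` changes.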

Finally, in Chapter 4, we integrate ADL and SRWarp to develop a more generalized image resampling model. Because SRWarp is still trained on synthetic data, its generalization to real inputs is limited, so we leverage the concept of ADL to extend its scope toward real-world applications. Specifically, we construct a fully self-supervised framework, SelfWarp, to fine-tune the arbitrary-shape SR model on diverse synthetic and real-world data. Based on novel self-supervised and idempotent loss terms, our model can effectively preserve image contents and reconstruct fine details without any paired training data. Extensive experiments validate the design choices in SelfWarp and confirm that it can perform diverse warping operations on arbitrary types of low-resolution images.
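The abstract does not define the idempotent loss term. As general background only, an operator is idempotent when applying it twice gives the same result as applying it once, and the deviation from that property can be measured as below. This is a generic illustration on toy operators, not the SelfWarp loss; `idempotence_gap`, `clamp01`, and `halve` are hypothetical names.

```python
def idempotence_gap(op, x):
    """Mean absolute difference between op(op(x)) and op(x) for a list of floats."""
    once = op(x)
    twice = op(once)
    return sum(abs(a - b) for a, b in zip(twice, once)) / len(once)


def clamp01(values):
    """Clamp every value to [0, 1]; clamping again changes nothing."""
    return [min(max(v, 0.0), 1.0) for v in values]


def halve(values):
    """Scale every value by 0.5; a second application changes the result again."""
    return [0.5 * v for v in values]


signal = [-1.0, 0.25, 2.0]
gap_clamp = idempotence_gap(clamp01, signal)  # zero for an idempotent operator
gap_halve = idempotence_gap(halve, signal)    # positive for a non-idempotent one
```

A penalty of this kind needs no ground-truth target, which is why such terms are attractive when paired training data is unavailable.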

As one of the classical problems in computer vision, SR has a variety of applications, such as image quality enhancement, surveillance, and observation. While conventional methods are limited to less practical scenarios, we recast SR as a generalized image resampling problem and propose formulations and methodologies that carry it toward real-world applications. Extensive experiments, with both quantitative and qualitative analysis, demonstrate that the proposed methods effectively solve practical image resampling problems.

Language: eng

URI: https://hdl.handle.net/10371/196432

https://dcollection.snu.ac.kr/common/orgView/000000178282

Files in This Item:

000000178282.pdf 47.23 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
