Neural Network Architecture for Image Restoration of Display-Intrinsic Blurs

고재현

Seoul National University Central Library


Neural Network Architecture for Image Restoration of Display-Intrinsic Blurs (Korean title: Neural Network Architecture for Restoring Images Blurred by Display Characteristics)


Authors: 고재현

Advisor: 윤성로

Issue Date: 2023

Publisher: Seoul National University Graduate School

Keywords: Neural Network ; Image Restoration ; Motion Deblurring ; Under-display Camera

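Description: Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2023. Advisor: 윤성로.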

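Abstract: Display devices suffer from two kinds of blur that degrade display quality, both arising from the structural characteristics of the display. One is the perceptual motion blur caused by hold-type light emission; the other is blur accompanied by color distortion in images captured by the camera placed beneath the display (the under-display camera) in full-screen smart devices. Conventional image restoration techniques based on optimization algorithms built on degradation models have been difficult to commercialize because of their slow computation and limited restoration performance. Recently, deep learning algorithms based on neural networks have become a new alternative to conventional methods in image restoration and processing, and they far surpass those methods in both performance and computational speed. For this reason, deep-learning-based image restoration algorithms are attracting great interest in the field of display image-quality enhancement.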

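The two kinds of blur mentioned above differ in cause and form, so each task requires its own suitably designed neural network architecture. Moreover, the unit network blocks that make up each architecture should be designed so as not to constrain the search space of the learnable parameters, that is, to minimize inductive bias. This dissertation presents a study of new network architectures specialized for different deblurring tasks, together with the design methodology and results for efficient, high-performance unit network blocks that compose these macro architectures.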

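The first study compensates for the "perceptual blur" caused by object motion in video. Active-matrix displays such as OLED, LCD, and micro-LED displays exhibit perceptual motion blur because of their "sample-and-hold" emission characteristic. Hardware approaches such as frame doubling are costly, and existing software compensation methods have very slow inference. We propose a deep-learning-based compensation method that is a cost-efficient software approach with fast inference. It comprises three key components: an optical flow estimation network that captures motion in the video; a simulation algorithm that predicts, from the obtained motion vectors, how a viewer perceives the blur; and a deconvolution network that restores the compensated image from these two pieces of information. The proposed method infers much faster than existing optimization methods and substantially improves restoration performance over existing methods.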
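
The perceptual blur described above is often approximated with a simple model: while the display holds a frame still, the viewer's eye keeps tracking the moving object, so the retina sees that frame smeared along the motion vector, in effect a box filter whose length is the per-frame displacement scaled by the fraction of the frame during which the pixel is lit. The sketch below illustrates that textbook model for one global motion vector. It is not the simulation algorithm of the dissertation; the function name, the `duty` parameter, integer-pixel shifts, and circular borders are assumptions made for brevity.

```python
import numpy as np


def hold_type_blur(frame: np.ndarray, motion: tuple, duty: float = 1.0,
                   samples: int = 16) -> np.ndarray:
    """Approximate what a tracking eye perceives from one held frame (assumed model).

    frame:   (H, W) or (H, W, C) image shown for one frame period.
    motion:  (dx, dy) eye-pursuit velocity in pixels per frame, assumed uniform.
    duty:    fraction of the frame period during which the pixel emits light.
    samples: number of time samples approximating the temporal integral.
    Overall brightness scaling by `duty` is ignored; only the smear is modeled.
    """
    dx, dy = motion
    acc = np.zeros(frame.shape, dtype=np.float64)
    # Midpoint sample times inside the emission window [0, duty).
    for t in (np.arange(samples) + 0.5) / samples * duty:
        # The eye has moved by t*(dx, dy), so the static frame slides the opposite
        # way on the retina. Integer shifts and np.roll's wrap-around are simplifications.
        shift = (-int(round(t * dy)), -int(round(t * dx)))
        acc += np.roll(frame, shift, axis=(0, 1))
    return acc / samples
```

A point of light under horizontal motion turns into a short horizontal streak whose total energy is unchanged, which is the smear a compensation method has to pre-correct.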

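The second study concerns a network architecture that restores the degradation of images captured by under-display cameras. Images captured by an under-display camera are severely degraded by the display in front of the camera. We model this degradation as two processes: 1) attenuation of high-frequency components by diffraction as light passes through the pixel grid pattern, and 2) luminance reduction and color shift as light passes through the multiple thin-film layers of the OLED. We propose a neural network architecture with two branches that restores both degradation mechanisms simultaneously. It includes two affine transform connections specialized for our new degradation model. The first, a one-dimensional affine transform connection, confines the solution space to linear transforms of the input image and thereby prevents the blur that stacked convolutions would otherwise introduce into the restored image; the second, a three-dimensional affine transform connection, injects a constraint that makes the network focus on restoring low-frequency components, including color. Because the proposed architecture has an implicit inductive bias that assigns structure restoration and color restoration to separate branches, it separates the two tasks without any additional loss function or explicit constraint. Trained on three public under-display camera datasets, the proposed network achieved the best restoration performance.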
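
The abstract does not spell out the exact form of the two affine transform connections. One plausible reading, sketched below purely as an assumption, is that the one-dimensional connection applies a predicted per-pixel, per-channel scale and bias to the input, while the three-dimensional connection applies a predicted per-pixel 3x3 color matrix plus bias. In both cases each output pixel stays a linear function of the corresponding input pixel, so fine structure is carried over from the input rather than re-synthesized by a stack of convolutions. The function names and array layouts are hypothetical.

```python
import numpy as np


def affine_connection_1d(x: np.ndarray, scale: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical 1-D affine connection: y = scale * x + bias, element-wise.

    x, scale, bias: (H, W, 3). A network branch (not shown) would predict `scale`
    and `bias`; the output remains a per-pixel linear transform of the input.
    """
    return scale * x + bias


def affine_connection_3d(x: np.ndarray, matrix: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical 3-D affine connection: a per-pixel 3x3 color matrix plus bias.

    x: (H, W, 3); matrix: (H, W, 3, 3); bias: (H, W, 3).
    y[h, w, c] = sum_d matrix[h, w, c, d] * x[h, w, d] + bias[h, w, c]
    """
    return np.einsum("hwcd,hwd->hwc", matrix, x) + bias
```

In a full network, `scale`, `bias`, and `matrix` would come from the respective branches; here they are plain arrays so the connection itself can be inspected. The 3x3 form can mix channels, which is what a color correction needs, whereas the element-wise form cannot.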

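The final study examines the unit network blocks that make up the macro architectures of the two deblurring models above. Network blocks in computer vision are moving from a first generation based on convolutional networks to a second generation based on Transformers and MLP mixers, which have relatively little inductive bias. Gating mechanisms that maximize efficiency by passing only important information to the next layer are also being actively studied. We develop a new MLP-mixer-based gating mechanism for image restoration. It includes two gating units: intra-token gating and cross-token gating. Intra-token gating compares the values within a token and selectively passes on important information, while cross-token gating updates each token based on a broad understanding of context obtained by exchanging information with neighboring tokens. In addition, the "recycling connection" in the cross-token gating unit retrieves information discarded by intra-token gating, which operates with a narrow receptive field, so that it can be reused after considering correlations among tokens over a wide region. Whereas existing gating methods rely on second-order interactions, the proposed gating achieves the effect of third-order interactions, as the Transformer does, while remaining relatively efficient because it does not require the quadratic complexity of Transformer self-attention. Through various blur restoration experiments, we verified that the proposed network block is more effective than the blocks used in existing restoration models.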
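
To make the data flow concrete, here is a minimal NumPy sketch of one way intra-token and cross-token gating with a recycling path could be wired. Everything in it is an assumption for illustration, not the dissertation's block: tokens are per-pixel feature vectors of shape (H, W, C), intra-token gating is a sigmoid gate computed from a hypothetical linear projection `w`, and cross-token gating derives its weights from a 3x3 neighborhood average. The product `values * gate * g` is where a three-factor (third-order) multiplicative interaction appears.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def local_mean(x: np.ndarray, radius: int = 1) -> np.ndarray:
    """Average each token with its (2r+1)x(2r+1) spatial neighbors (circular borders)."""
    acc = np.zeros_like(x)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            acc += np.roll(x, (dy, dx), axis=(0, 1))
    return acc / (2 * radius + 1) ** 2


def intra_token_gate(x: np.ndarray, w: np.ndarray):
    """Gate values within each token; return the kept part and the discarded remainder.

    x: (H, W, C) tokens; w: (C, 2C) hypothetical projection giving values and gate logits.
    """
    values, logits = np.split(x @ w, 2, axis=-1)
    gate = sigmoid(logits)
    return values * gate, values * (1.0 - gate)


def cross_token_gate(kept: np.ndarray, discarded: np.ndarray) -> np.ndarray:
    """Weight tokens by neighborhood context and recycle what intra-token gating dropped."""
    g = sigmoid(local_mean(kept))
    return kept * g + discarded * (1.0 - g)


def gated_block(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Residual block: intra-token gating followed by cross-token gating."""
    return x + cross_token_gate(*intra_token_gate(x, w))
```

The discarded part `values * (1 - gate)` is not thrown away: the cross-token gate decides, from neighboring context, how much of it to reinstate, which is the role the text assigns to the recycling connection.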

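This dissertation presents three research results and verifies that they can be effectively applied to two deblurring tasks for improving display quality. We analyzed the essence of the solution to each task in depth and, in doing so, examined what inductive bias the network needs for each task. We expect this research to serve as a basis for addressing the remaining issues of each task and to offer important inspiration not only to researchers working on display image quality but also to those tackling various image restoration problems.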

Active-matrix displays (AMDs), including liquid crystal displays (LCDs), organic light-emitting diode (OLED) displays, and micro-LED displays (MLDs), suffer from serious image-quality problems caused by two different forms of blur that arise from the inherent characteristics of the display devices: (1) a perceptual blur due to the sample-and-hold emission property of AMDs, and (2) a blur accompanied by color corruption in images captured by under-display cameras (UDCs). Model-based optimization methods for image restoration have been studied extensively to address these blur problems. However, their slow inference and limited performance have been critical obstacles to applying them in commercial displays. Recently, deep learning algorithms have been replacing the existing optimization methods in image restoration, and their performance and computational speed considerably exceed those of conventional methods. To apply deep learning to display deblurring, each network should be designed with a task-specific architecture, because the causes and mechanisms of the two display blurs differ. In addition, the network blocks within a macro architecture should have little inductive bias so that the network can freely explore the search space. This dissertation presents methods and substantial results on three important research topics in image deblurring for displays: macro architectures specific to two different deblurring tasks, and an efficient network block.

The first study addresses the perceptual blur in AMDs. We propose a display motion deblurring network that compensates for motion blur, trained on pairs of images with a synthetic random displacement between them. The proposed network includes three critical components: a) an optical flow network that determines the motion vector of the moving object; b) a perceptual blur algorithm that simulates what a viewer sees, based on the obtained motion vector; and c) a deconvolution network that estimates the compensated image, assessing each estimate by convolving it with the kernel produced by the perceptual blur estimation algorithm. This technique is approximately 87 times faster than an optimization method that produces an equivalent level of compensation, and its restoration performance is greatly improved over the existing technique.
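
For comparison with the network, the conventional approach can be posed as a per-frame optimization: find a compensated frame c whose perceived version B c (B being the hold-type blur) is as close as possible to the target frame, subject to the displayable range 0 <= c <= 1. The sketch below solves that problem with projected gradient descent for purely horizontal motion. It is a generic stand-in, not the specific optimization method benchmarked in the dissertation; the function names, the circular borders, and the iteration count are assumptions.

```python
import numpy as np


def hold_blur_h(img: np.ndarray, length: int) -> np.ndarray:
    """B: perceived image for horizontal pursuit of `length` px/frame (circular borders)."""
    return sum(np.roll(img, -s, axis=1) for s in range(length)) / length


def hold_blur_h_adjoint(img: np.ndarray, length: int) -> np.ndarray:
    """B^T: the same average with the shift direction reversed."""
    return sum(np.roll(img, s, axis=1) for s in range(length)) / length


def precompensate(target: np.ndarray, length: int, iters: int = 200,
                  step: float = 1.0) -> np.ndarray:
    """Projected gradient descent on 0.5 * ||B c - target||^2 subject to 0 <= c <= 1.

    B is an average of shifts, so ||B^T B|| <= 1 and a step <= 1 keeps every
    update from increasing the objective.
    """
    c = target.copy()
    for _ in range(iters):
        residual = hold_blur_h(c, length) - target
        c = np.clip(c - step * hold_blur_h_adjoint(residual, length), 0.0, 1.0)
    return c
```

Each frame needs many blur and adjoint passes before it converges; a trained deconvolution network replaces this loop with a single forward pass, which is the kind of saving behind the speed-up reported above.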

In the second study, we propose a network architecture that restores blurred images from UDCs. Images captured by UDCs are degraded by the screen in front of the camera. We model this degradation in terms of a) diffraction by the pixel grid, which attenuates the high-spatial-frequency components of the image; and b) intensity reduction and color change caused by the multiple thin-film layers of an OLED, which modulate the low-spatial-frequency components of the image. We introduce a deep neural network with two branches, each reversing one type of degradation, which is more effective than performing both restorations in a single forward network. We also propose an affine transform connection to replace the skip connection used in most existing methods for restoring UDC images. Confining the solution space to linear transforms of the input reduces the blurring caused by stacked convolutions. The proposed architecture exhibits an inductive bias that enables each branch to perform a distinct task without any additional loss function or regularization. Trained on three datasets of UDC images, our network outperformed existing methods in terms of both distortion and perceptual image-quality measures.
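
The two-part degradation model can be illustrated with a toy forward operator: a convolution with a point spread function (PSF) stands in for diffraction by the pixel grid, and a 3x3 matrix stands in for the intensity and color change of the thin-film stack. The sketch is a simplification under stated assumptions (a single spatially invariant PSF, circular convolution via the FFT, one global color matrix), not the dissertation's model; `udc_degrade` is a hypothetical name.

```python
import numpy as np


def udc_degrade(img: np.ndarray, psf: np.ndarray, color_matrix: np.ndarray) -> np.ndarray:
    """Toy under-display-camera degradation (hypothetical model).

    img:          (H, W, 3) linear RGB image.
    psf:          (h, w) non-negative kernel summing to 1, with h <= H and w <= W.
    color_matrix: (3, 3) thin-film intensity/color transform applied to every pixel.
    """
    H, W, _ = img.shape
    h, w = psf.shape
    kernel = np.zeros((H, W))
    kernel[:h, :w] = psf
    # Move the PSF center to index (0, 0) so the blur does not translate the image.
    kernel = np.roll(kernel, (-(h // 2), -(w // 2)), axis=(0, 1))
    otf = np.fft.fft2(kernel)  # optical transfer function: the PSF's spectrum
    spectrum = np.fft.fft2(img, axes=(0, 1)) * otf[..., None]
    blurred = np.real(np.fft.ifft2(spectrum, axes=(0, 1)))  # circular convolution
    # Diffraction (above) damages high frequencies; the color matrix changes even the DC level.
    return blurred @ color_matrix.T
```

Because the PSF is normalized, a flat image passes through the blur unchanged and only the color matrix alters it, whereas a fine checkerboard is averaged away. This mirrors the split between high-frequency and low-frequency damage that the two branches are meant to undo.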

The final study develops an efficient network block for image deblurring. Neural network architectures based on the MLP mixer exhibit less inductive bias and are increasingly used in various image processing and low-level vision tasks. In this study, a novel gating mechanism is applied to an MLP-mixer-based architecture for image restoration. In the proposed architecture, embedded tokens undergo channel and token mixing, the primary data flow of the existing MLP mixer, and the token vectors are then refined through the proposed intra-token and cross-token gating. Intra-token gating determines which information is propagated or discarded through interactions among the values within each token. Cross-token gating, by contrast, calculates the propagation weights of local information by comparing it with adjacent tokens, and recycles the information discarded by intra-token gating. Because the two gating multiplications are cascaded, the two gating paths produce third-order interactions, similar to the self-attention of the Transformer. However, the proposed method is more efficient than the Transformer because it does not incur the quadratic cost of self-attention. Moreover, the mechanism enables the model to learn the complex multimodal Gaussian mixture distribution of clean images by combining the two distributions from the different gating paths. Applied to various spatially variant deblurring tasks, the proposed network outperformed the baselines in both restoration performance and computational cost.
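
The efficiency argument can be checked with rough arithmetic. Self-attention over N tokens with C channels computes an N x N score matrix and then mixes the values with it, about 2N²C multiply-accumulates, whereas a gate that looks only at a fixed k x k neighborhood grows linearly in N. The counts below ignore the projections and normalizations that both designs share, and the local-gating cost model (a k x k aggregation plus two element-wise gating products per channel) is an assumption, not a figure from the dissertation.

```python
def self_attention_macs(n_tokens: int, channels: int) -> int:
    """Multiply-accumulates for Q @ K^T (n*n*c) plus scores @ V (n*n*c)."""
    return 2 * n_tokens * n_tokens * channels


def local_gating_macs(n_tokens: int, channels: int, window: int = 3) -> int:
    """Hypothetical cost of local gating: a window x window neighborhood aggregation
    per token and channel, plus one element-wise product for each of two gating stages."""
    return n_tokens * channels * (window * window + 2)
```

For a 256 x 256 feature map (65,536 tokens) with 32 channels, these formulas give roughly 2.7 x 10^11 versus 2.3 x 10^7 multiply-accumulates, and doubling the token count quadruples the first figure but only doubles the second.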

This dissertation presents three methods for restoring images blurred by the optical or structural limitations of displays. We provide substantial experimental results demonstrating the effectiveness of the task-specific inductive biases injected into the macro architectures and of the network block, which employs a multi-gating mechanism. We expect the proposed methods to inspire researchers not only in the display field but also in image processing more broadly.

Language: eng

URI: https://hdl.handle.net/10371/203989

https://dcollection.snu.ac.kr/common/orgView/000000174586

Files in This Item:

000000174586.pdf 384.00 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
