On-chip memory reduction in CNN hardware design for image super-resolution

이동현

영상 super-resolution을 위한 CNN 하드웨어의 on-chip 메모리 최적화 : On-chip memory reduction in CNN hardware design for image super-resolution

Authors: 이동현

Advisor: 이혁재

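Major: College of Engineering, Department of Electrical and Computer Engineering

Issue Date: 2019-02

Publisher: Seoul National University Graduate School

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2019. 2. 이혁재.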

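Abstract: Unlike a CNN for image classification, a convolutional neural network (CNN) for single image super-resolution (SISR) takes a high-resolution image as input and generates feature maps, which are high-resolution intermediate results. Hardware that accelerates a CNN for SISR is mainly used in display devices and has a streaming architecture in which external memory cannot be accessed. Because on-chip memory capacity is limited, this makes implementation difficult. Previous studies reduce on-chip memory either at the cost of degraded performance or by adding a compression module. This thesis proposes methods for reducing the on-chip memory of CNN hardware for SISR, and for designing that hardware, without degrading performance.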

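The CNN hardware is based on the VDSR (very deep neural network for super-resolution) architecture. Conventional CNN hardware processes data in raster-scan order, which causes simultaneous read and write accesses to SRAM. Changing this to a partially-vertical order separates the timing of read and write accesses. The partially-vertical order allows single-port SRAM to replace the dual-port SRAM used by conventional CNN hardware, which halves the on-chip memory. The second method changes the shape of the VDSR filters. The size of the on-chip memory is proportional to the height of the convolution filter. However, the VDSR filter is already the smallest symmetric filter shape. To address this, a context-preserving 1D filter construction method and a context-based vertical filter reduction method are applied, which reduce the SRAM size by a further half.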

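Once the CNN hardware architecture is fixed, CNN training methods that improve SISR performance are proposed separately for natural images and for text images. SRGAN (super-resolution generative adversarial network) uses the loss produced by a discriminator network to make the CNN for SISR output natural images that look real. However, SRGAN produces visual defects caused by over-sharpening. This thesis proposes two methods for removing these defects. The first is a resolution-preserving discriminator network structure, which changes the structure of the discriminator to prevent loss of image detail inside it. The second is a resolution-preserving content loss, which addresses the loss of image detail caused by the structure of the VGG network that produces the content loss.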

A text image is a synthetic image rather than a natural image, and the color combination of its font and background can vary widely. Conventional CNN training generalizes the network by training it on many kinds of images. However, it is impossible to train a CNN on every color combination. This thesis borrows the de-colorization method used in image compression and restricts the training images to black fonts on white backgrounds, so that SISR runs without visual defects even on font and background color combinations that were not seen in training.

Unlike a convolutional neural network (CNN) for image classification, a CNN for single image super-resolution (SISR) receives a high-resolution image and generates feature maps, which are high-resolution intermediate results. Hardware that accelerates a CNN for SISR is mainly used in display devices and has a streaming architecture in which external memory access is impossible. Because on-chip memory capacity is limited, this makes implementation difficult. This thesis proposes two methods for designing CNN hardware for SISR with limited hardware resources.

The CNN hardware is based on the very deep neural network for super-resolution (VDSR) architecture. Processing the convolution layers in a partially-vertical order prevents simultaneous read and write accesses to SRAM. The proposed order lets the CNN hardware use single-port SRAM instead of dual-port SRAM, which halves the on-chip memory area. The second method changes the shape of the filters in VDSR. The size of the on-chip memory is proportional to the height of the convolution filter. However, because the VDSR filter is already the smallest symmetric shape, its height cannot be reduced directly. To solve this problem, a method of constructing context-preserving 1D filters and a context-based method of reducing the vertical filter are proposed. These methods reduce the SRAM size by a further half.
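The relationship between filter height, SRAM port type, and on-chip memory can be pictured with a rough sizing sketch. It assumes the common line-buffer model, in which a filter that is `filter_height` rows tall must keep the previous `filter_height - 1` rows of every input channel on chip, and it assumes a dual-port cell costs twice the area of a single-port cell. The function names, the 8-bit value width, and the 1920 x 64 example dimensions are illustrative assumptions, not figures taken from the thesis.

```python
def line_buffer_bits(width, channels, filter_height, bits_per_value=8):
    """Bits a streaming convolution layer must keep on chip.

    Assumed model: a filter `filter_height` rows tall needs the previous
    `filter_height - 1` rows of every input channel while the current
    row streams in.
    """
    if filter_height < 1:
        raise ValueError("filter_height must be at least 1")
    return (filter_height - 1) * width * channels * bits_per_value


def sram_area_units(bits, dual_port):
    """Relative SRAM area; dual-port cells are assumed to cost 2x single-port."""
    return bits * (2.0 if dual_port else 1.0)


# Illustrative numbers only: one 1920-pixel row and 64 feature maps per layer.
width, channels = 1920, 64
for height in (3, 2, 1):
    bits = line_buffer_bits(width, channels, height)
    print(
        f"filter height {height}: {bits / 8 / 1024:.0f} KiB per layer, "
        f"dual-port area {sram_area_units(bits, True):.0f}, "
        f"single-port area {sram_area_units(bits, False):.0f}"
    )
```

Under these assumptions the two savings multiply: switching port type halves the area for a given capacity, and a shorter filter reduces the capacity itself.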

Two CNN training methods are proposed, one for SISR of natural images and one for SISR of text images. These methods improve SISR performance after the CNN hardware architecture has been fixed. SRGAN (super-resolution generative adversarial network) is trained with the help of a discriminator network to generate realistic natural images. However, SRGAN causes visual defects due to over-sharpening. This thesis proposes two methods to eliminate these defects. First, a resolution-preserving discriminator network structure is proposed, which changes the structure of the discriminator to prevent loss of detailed information inside the network. Second, a resolution-preserving content loss is proposed to address the loss of detailed image information caused by the structure of the VGG19 network that produces the content loss.
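Why network structure can discard detail is easiest to see by tracing spatial resolution through a layer stack. The sketch below uses the standard output-size formula for convolution and pooling layers. Both layer lists are hypothetical stand-ins, one with stride-2 layers and one with stride 1 throughout. They are not the discriminator or VGG19 configurations used in the thesis, which the abstract does not specify.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resolution(size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical stacks for illustration, not the thesis's architectures.
downsampling = [(3, 1, 1), (3, 2, 1)] * 4   # every second layer has stride 2
resolution_preserving = [(3, 1, 1)] * 8     # stride 1 everywhere

print(trace_resolution(96, downsampling))
print(trace_resolution(96, resolution_preserving))
```

In the first stack each stride-2 layer halves the feature-map side, so fine detail has fewer positions to be represented in. The second stack keeps the input resolution at every layer.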

A text image is not a natural image but a synthetic image, and the color combination of its font and background can vary widely. Existing CNN training methods train on various kinds of images to generalize the network. However, it is impossible to train a CNN on every color combination. This thesis uses the de-colorization method used in image compression to restrict the training images to black fonts on white backgrounds. As a result, the CNN performs SISR without visual flaws even on images whose font and background color combinations were not seen in training.
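The abstract does not describe the de-colorization procedure itself, so the following is only one simple way to picture the idea. It assumes an image built from blends of exactly one known font color and one known background color. Each pixel is projected onto the line between those two colors, giving a black-on-white plane that a network trained only on such images could process, and the colors are restored afterwards. The functions `decolorize` and `recolorize` are hypothetical names introduced for this illustration.

```python
def decolorize(pixels, font_rgb, background_rgb):
    """Map RGB pixels to scalars in [0, 1]: 0 = font (black), 1 = background (white).

    Assumes every pixel is a blend of the single font color and the
    single background color.
    """
    axis = [b - f for f, b in zip(font_rgb, background_rgb)]
    norm = sum(a * a for a in axis)
    if norm == 0:
        raise ValueError("font and background colors must differ")
    out = []
    for p in pixels:
        # Project the pixel onto the font-to-background color axis.
        t = sum((c - f) * a for c, f, a in zip(p, font_rgb, axis)) / norm
        out.append(min(1.0, max(0.0, t)))
    return out


def recolorize(gray, font_rgb, background_rgb):
    """Inverse mapping: blend font and background colors by each scalar."""
    return [
        tuple(round(f + t * (b - f)) for f, b in zip(font_rgb, background_rgb))
        for t in gray
    ]


font, background = (200, 30, 30), (20, 20, 120)  # red text on dark blue
row = [background, (110, 25, 75), font]          # background, half blend, font
gray = decolorize(row, font, background)
# A super-resolution network would process `gray` at this point.
print(gray)
print(recolorize(gray, font, background))
```

With this kind of mapping the network never sees the original colors, which is why an unseen color combination would not matter to it.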

Language: kor

URI: https://hdl.handle.net/10371/151952

Files in This Item:

000000154162.pdf 4.49 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Electrical and Computer Engineering
  - Theses (Ph.D. / Sc.D., Electrical and Computer Engineering)
