On-device Speech Recognition with Residual Simple Gated Convolutional Networks

Yuhnjin Lee (이윤진)

Seoul National University Central Library



DC Field	Value	Language
dc.contributor.advisor	Wonyong Sung (성원용)	-
dc.contributor.author	Yuhnjin Lee (이윤진)	-
dc.date.accessioned	2019-10-18T15:40:42Z	-
dc.date.available	2019-10-18T15:40:42Z	-
dc.date.issued	2019-08	-
dc.identifier.other	000000156245	-
dc.identifier.uri	https://hdl.handle.net/10371/161058	-
dc.identifier.uri	http://dcollection.snu.ac.kr/common/orgView/000000156245	ko_KR
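dc.description	Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2019. 8. Wonyong Sung.	-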
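dc.description.abstract	Automatic speech recognition (ASR) has long been an essential feature of smartphones and of the embedded devices we use in daily life. Neural network-based algorithms in particular are widely applied to ASR because of their high accuracy. However, most ASR requests are sent to company servers for processing. This raises privacy and security concerns and adds latency, so there is a growing need for ASR systems that run independently on the user's device. To run neural network-based speech recognition directly on a device, power consumption must be minimized because battery capacity is limited. Many neural network models have been developed. Among them, recurrent neural network (RNN)-based algorithms are widely used for speech recognition because their recurrence lets them capture sequential context, and LSTM-based RNNs are especially well known. However, the cache memory of an embedded device is too small to hold all the parameters of an LSTM RNN. The resulting frequent DRAM accesses slow down the network's computation. In addition, because an RNN feeds previously computed values back into the current computation, multiple time steps cannot be computed in parallel. This also causes many DRAM accesses, which limits how fast an LSTM RNN can run and consumes a lot of power. This thesis presents the results of running a Residual Simple Gated Convolutional Network on an actual embedded device. Its roughly 1M parameters are small enough to fit in cache memory at once. The model's one-dimensional depthwise convolution helps find temporal patterns in the speech signal. Moreover, using a convolutional network instead of an RNN enables parallel computation across time steps, which reduces battery usage and increases the model's computation speed. To further speed up speech recognition on embedded devices, we added inception residual connections to the existing Simple Gated ConvNet in order to reduce the number of layers without loss of accuracy. However, this did not yield a significant speedup.	-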
dc.description.abstract	Many embedded devices today, such as smartphones and Amazon Alexa, use automatic speech recognition (ASR) technology to provide a hands-free interface. Neural network-based algorithms in particular are widely employed in ASR because of their high accuracy and robustness in noisy environments. These algorithms require a large amount of computation for real-time operation, so most of today's ASR systems adopt server-based processing. However, privacy concerns and the need for low latency are increasing demand for on-device ASR. For on-device ASR, power consumption should be minimized to extend operating time. Many neural network models have been developed for high-performance ASR. Among them, recurrent neural network (RNN)-based algorithms are the most commonly used for speech recognition, and long short-term memory (LSTM) RNNs are especially well known. However, executing an LSTM on an embedded device consumes a lot of power because the cache is too small to hold all the network parameters. Frequent DRAM accesses due to cache misses not only slow execution but also increase power consumption. One way to mitigate this problem is to compute multiple output samples at a time, called multi-time step parallelization, which reduces the number of parameter fetches. However, the feedback structure of an LSTM RNN does not allow multi-time step parallel processing. This thesis presents a Residual Simple Gated Convolutional Network (Residual Simple Gated ConvNet) model with only about 1M parameters, a size that the cache memory of many current CPUs can accommodate. The model can therefore run ASR quickly and efficiently without consuming much power. Because it is a convolutional neural network, multi-time step processing can also be applied easily.
To achieve high accuracy with a small number of parameters, the model employs one-dimensional depthwise convolution, which helps find temporal patterns in the speech signal. We also considered inception residual connections to reduce the number of layers needed, but this approach requires further improvement. The developed Residual Simple Gated ConvNet achieved fairly high accuracy with only 1M parameters when trained on the WSJ speech corpus. It uses less than 10% of CPU time when running on ARM-based CPUs for embedded devices.	-
dc.description.tableofcontents	Abstract (Korean) / Chapter 1 Introduction: 1.1 On-device Speech Recognition: Benefits and Problems; 1.2 Components of Speech Recognition; 1.3 Drawbacks of RNN-based Acoustic Models; 1.4 Related Work (1.4.1 Diagonal LSTM RNN; 1.4.2 QRNN; 1.4.3 Gated ConvNet); 1.5 Thesis Outline / Chapter 2 Simple Gated ConvNet: 2.1 Gated ConvNet; 2.2 Simple Gated ConvNet; 2.3 Residual Simple Gated ConvNet; 2.4 Experimental Results / Chapter 3 On-device Simple Gated ConvNet: 3.1 Low-latency Simple Gated ConvNet (3.1.1 Speech Preprocessing; 3.1.2 Latency in the Neural Network); 3.2 Quantization of the Simple Gated ConvNet (3.2.1 Minimizing Network Parameters; 3.2.2 Direct Quantization; 3.2.3 Quantization with Retraining; 3.2.4 Quantization Using TensorFlow Lite); 3.3 Experimental Results / Chapter 4 Conclusion / ABSTRACT (English)	-
dc.language.iso	kor	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Speech recognition	-
dc.subject	Sequence modeling	-
dc.subject	RNN	-
dc.subject	CNN	-
dc.subject	Embedded devices	-
dc.subject.ddc	621.3	-
dc.title	On-device Speech Recognition Using Residual Simple Gated ConvNets	-
dc.title.alternative	On-device Speech Recognition with Residual Simple Gated Convolutional Networks	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	YUHNJIN LEE	-
dc.contributor.department	College of Engineering, Department of Electrical and Computer Engineering	-
dc.description.degree	Master	-
dc.date.awarded	2019-08	-
dc.identifier.uci	I804:11032-000000156245	-
dc.identifier.holdings	000000000040▲000000000041▲000000156245▲	-
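Both abstracts attribute the speed and power advantage to one property. A convolutional model has no feedback path, so several output frames can be computed per parameter fetch (multi-time step processing), whereas an LSTM RNN must finish frame t-1 before it can start frame t. The pure-Python sketch below illustrates this with a causal one-dimensional depthwise convolution. It is an illustration under stated assumptions, not the thesis's layer: it assumes stride 1 and zero padding on the past side only, and it omits gating, residual paths, and channel mixing. The function names are hypothetical. Computing a whole block of frames with each channel's filter loaded once gives the same result as computing frame by frame from a sliding window.

```python
from typing import List

Frame = List[float]  # one feature vector with C channels per time step


def depthwise_conv1d_block(x: List[Frame], w: List[List[float]]) -> List[Frame]:
    """Causal depthwise 1-D convolution over a block of T frames (hypothetical sketch).

    x: T input frames, each with C channels.
    w: C filters with K taps each; channel c is filtered only by w[c] (depthwise).
    y[t][c] = sum_k w[c][k] * x[t - (K - 1) + k][c], treating frames before t = 0 as zeros.
    Assumptions: stride 1, causal (past-only) padding, no gating or residual path.
    """
    T, C, K = len(x), len(w), len(w[0])
    y = [[0.0] * C for _ in range(T)]
    for c in range(C):          # each channel's filter is fetched once...
        taps = w[c]
        for t in range(T):      # ...and reused for every frame in the block
            acc = 0.0
            for k in range(K):
                src = t - (K - 1) + k
                if src >= 0:    # skip the implicit zero padding before the first frame
                    acc += taps[k] * x[src][c]
            y[t][c] = acc
    return y


def depthwise_conv1d_frame(window: List[Frame], w: List[List[float]]) -> Frame:
    """Compute one output frame from the last K input frames (oldest first)."""
    C, K = len(w), len(w[0])
    return [sum(w[c][k] * window[k][c] for k in range(K)) for c in range(C)]


def streaming(x: List[Frame], w: List[List[float]]) -> List[Frame]:
    """Frame-by-frame computation with a sliding window; no output is fed back."""
    C, K = len(w), len(w[0])
    padded = [[0.0] * C for _ in range(K - 1)] + x
    return [depthwise_conv1d_frame(padded[t:t + K], w) for t in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # T = 3 frames, C = 2 channels
w = [[0.5, 1.0], [1.0, -1.0]]              # K = 2 taps per channel
print(depthwise_conv1d_block(x, w))        # [[1.0, -2.0], [3.5, -2.0], [6.5, -2.0]]
print(depthwise_conv1d_block(x, w) == streaming(x, w))  # True
```

Output frame t never depends on an earlier output. The block version can therefore sweep all T frames with one filter before moving to the next channel. An LSTM cannot reorder its loops this way, because its hidden state at frame t needs the state from frame t-1.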
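The abstracts' cache argument depends on the model having about 1M parameters, and the table of contents lists several quantization sections. The storage needed for the weights scales with the number of bits per parameter, as the short calculation below shows. It counts weights only (no activations, code, or runtime metadata) and treats 1M as an exact round number for illustration. It assumes no particular CPU's cache size, since this record does not state one. `param_bytes` is a hypothetical helper.

```python
def param_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone (hypothetical helper; ignores activations and metadata)."""
    return num_params * bits_per_param // 8


NUM_PARAMS = 1_000_000  # "about 1M parameters" from the abstract, rounded for illustration

for bits in (32, 16, 8):
    size = param_bytes(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:>9,} bytes ({size / 2**20:.2f} MiB)")
# 32-bit weights: 4,000,000 bytes (3.81 MiB)
# 16-bit weights: 2,000,000 bytes (1.91 MiB)
#  8-bit weights: 1,000,000 bytes (0.95 MiB)
```

Moving from 32-bit floating point to 8-bit weights shrinks the weight footprint fourfold. This is one reason quantization matters when the goal is to keep all parameters resident in cache.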

Appears in Collections:

College of Engineering/Engineering Practice School (College of Engineering/Graduate School)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)

Files in This Item:

000000156245.pdf 3.80 MB
