Stereo Data Based and Blind Speech Feature Enhancement Techniques for Robust ASR

한창우

Stereo Data Based and Blind Speech Feature Enhancement Techniques for Robust ASR : 강인한 음성인식을 위한 스테레오 데이터 기반 및 블라인드 음성 특징 향상 기법

DC Field	Value	Language
dc.contributor.advisor	김남수	-
dc.contributor.author	한창우	-
dc.date.accessioned	2017-07-13T06:55:10Z	-
dc.date.available	2017-07-13T06:55:10Z	-
dc.date.issued	2012-08	-
dc.identifier.other	000000004758	-
dc.identifier.uri	https://hdl.handle.net/10371/118867	-
dc.description	Thesis (Ph.D.) -- Graduate School, Seoul National University: Dept. of Electrical and Computer Engineering, August 2012. Advisor: 김남수.	-
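dc.description.abstract	The performance of a speech recognition system degrades in the presence of background noise. Even without background noise, performance can be degraded by linear or nonlinear distortions caused by the channel, recording equipment, or acoustic reverberation. This thesis addresses improved feature enhancement approaches for robust speech recognition. One well-known approach to reducing channel distortion is feature mapping, which maps the features of distorted speech close to those of clean speech. The feature mapping rule is usually learned from stereo data, which consists of data recorded simultaneously in the reference and target conditions. This thesis proposes a speech feature sequence mapping algorithm based on the switching linear dynamic system (SLDS). The proposed algorithm enables sequence-to-sequence mapping rather than the conventional vector-to-vector mapping. The thesis also proposes a new semi-blind parameter estimation technique that works without reference feature vectors. This approach was inspired by the hidden Markov model (HMM)-based speech synthesis algorithm. The thesis further addresses feature compensation, which compensates distorted input features before they are passed to an acoustic model for speech recognition trained on clean speech. The proposed feature compensation algorithm is a blind technique that needs no training or adaptation data to estimate the relevant parameters. The thesis proposes a new interacting multiple model (IMM)-based feature compensation technique designed for conditions in which background noise and acoustic reverberation coexist. In this approach, a switching linear dynamic model (SLDM) is built in the log-spectral domain to cope with time-varying additive and convolutive distortions such as background noise and acoustic reverberation. The clean speech and the log frequency response of the acoustic reverberation are jointly set as the state to be estimated, and the speech corruption process is formulated as multiple state space models. The proposed approaches show substantial performance improvements in speech recognition experiments on the Aurora-5 database, a widely used standard for assessing automatic speech recognition performance in reverberant noisy environments.	-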
dc.description.abstract	The performance of an automatic speech recognition (ASR) system deteriorates in the presence of background noise. Even without background noise, performance may be degraded by linear or nonlinear distortions introduced by the channel, recording devices, or reverberation. In this thesis, we discuss advanced stereo data based and blind speech feature enhancement approaches for robust speech recognition. One well-known approach to reducing channel distortion is feature mapping, which maps a distorted speech feature to its clean counterpart. The feature mapping rule is usually trained on a set of stereo data, which consists of simultaneous recordings obtained in both the reference and target conditions. We propose a novel approach to speech feature sequence mapping based on the switching linear dynamic system (SLDS). The proposed algorithm enables sequence-to-sequence mapping in a systematic way, instead of the traditional vector-to-vector mapping. Furthermore, we propose a novel approach to semi-blind parameter estimation that does not require the reference feature vectors. This approach is motivated by the hidden Markov model (HMM)-based speech synthesis algorithm. Additionally, we focus on feature compensation, in which distorted input features are compensated before being decoded with acoustic models trained on clean speech. The proposed feature compensation algorithms are blind techniques, meaning that no training or adaptation data is needed to estimate the relevant parameters. We propose a novel blind approach to feature compensation based on the interacting multiple model (IMM) algorithm, specially designed for joint processing of background noise and acoustic reverberation. To cope with time-varying environmental parameters, this approach establishes a switching linear dynamic model (SLDM) for the additive and convolutive distortions, such as background noise and acoustic reverberation, in the log-spectral domain. We construct multiple state space models of the speech corruption process in which the log-spectra of clean speech and the log frequency response of acoustic reverberation are jointly handled as the state of interest. The proposed approaches show significant improvements on the Aurora-5 speech recognition task, which was developed to investigate ASR performance in reverberant noisy environments.	-
dc.description.tableofcontents	Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 2 Experimental Environments and Baseline System; 2.1 ASR in Hands-Free Scenario; 2.2 Feature Extraction; 2.3 Baseline ASR System for Aurora-5; 3 Previous Feature Enhancement Approaches; 3.1 Previous Stereo Data Based Feature Mapping Approach; 3.1.1 Conventional SPLICE Algorithm; 3.2 Previous Blind Feature Compensation Approach; 3.2.1 Statistical Linear Approximation; 3.2.2 Feature Compensation in a Bayesian Framework Based on Linear Approximation; 3.2.3 Conventional IMM Algorithm; 4 SLDS for Stereo Data Based Speech Feature Mapping; 4.1 Introduction; 4.2 Switching Linear Dynamic System; 4.3 Enhanced Clustering Method; 4.4 SLDS Parameter Estimation; 4.5 Comparison With Other Approaches; 4.5.1 Comparison Between SLDM And SLDS; 4.5.2 SLDS Viewed as Filtering; 4.5.3 Vector-to-Vector Mapping Techniques; 4.6 Multi-frame Based SPLICE; 4.7 Experimental Results; 4.8 Summary; 5 Semi-Blind Estimation of Feature Mapping Parameters; 5.1 Introduction; 5.2 Stereo-Based Feature Mapping; 5.3 Artificial Stereo Data Generation; 5.3.1 Artificial Reference Feature Generation From HMM; 5.3.2 Combination With Feature Compensation Technique; 5.4 Experiments; 5.5 Summary; 6 Blind Approach for Reverberation and Noise Robust Feature Compensation; 6.1 Introduction; 6.2 Relation Between Clean And Reverberant Noisy LMMSCs; 6.3 Feature Compensation in a Bayesian Framework; 6.3.1 A Priori Clean Speech Model; 6.3.2 A Priori Model for RIR; 6.3.3 A Priori Model for Background Noise; 6.3.4 State Transition Formulation; 6.3.5 Function Linearization; 6.4 Feature Compensation Algorithm; 6.4.1 Preprocessing; 6.4.2 Predictive State Estimation; 6.4.3 Iterative Linearization And Kalman Update; 6.4.4 Postprocessing; 6.4.5 Estimation of Clean Feature; 6.5 Experiments With Feature Compensation Techniques; 6.5.1 Experiments With Varying L; 6.5.2 Experiments With Varying K; 6.5.3 Experiments With Different Methods of Clean Feature Estimation; 6.5.4 Comparison With Conventional Techniques; 6.6 Summary; 7 Conclusions; Bibliography; Abstract (Korean)	-
dc.format	application/pdf	-
dc.format.extent	1052678 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	Graduate School, Seoul National University (서울대학교 대학원)	-
dc.subject	feature compensation	-
dc.subject	dereverberation	-
dc.subject	stereo data	-
dc.subject	switching linear dynamic system (SLDS)	-
dc.subject	interacting multiple model (IMM)	-
dc.title	Stereo Data Based and Blind Speech Feature Enhancement Techniques for Robust ASR	-
dc.title.alternative	강인한 음성인식을 위한 스테레오 데이터 기반 및 블라인드 음성 특징 향상 기법	-
dc.type	Thesis	-
dc.contributor.AlternativeAuthor	Chang Woo Han	-
dc.description.degree	Doctor	-
dc.citation.pages	x, 100	-
dc.contributor.affiliation	College of Engineering, Dept. of Electrical and Computer Engineering (공과대학 전기·컴퓨터공학부)	-
dc.date.awarded	2012-08	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)

Files in This Item:

000000004758.pdf 1.00 MB
