Stereo Data Based and Blind Speech Feature Enhancement Techniques for Robust ASR

Chang Woo Han (한창우)

Korean title: 강인한 음성인식을 위한 스테레오 데이터 기반 및 블라인드 음성 특징 향상 기법

Authors: Chang Woo Han (한창우)

Advisor: Nam Soo Kim (김남수)

Major: School of Electrical Engineering and Computer Science, College of Engineering (공과대학 전기·컴퓨터공학부)

Issue Date: 2012-08

Publisher: Graduate School, Seoul National University (서울대학교 대학원)

Keywords: feature compensation ; dereverberation ; stereo data ; switching linear dynamic system (SLDS) ; interacting multiple model (IMM)

Description: Ph.D. thesis, Graduate School, Seoul National University, School of Electrical Engineering and Computer Science, August 2012. Advisor: Nam Soo Kim.

Abstract:
The performance of an automatic speech recognition (ASR) system deteriorates in the presence of background noise. Even without background noise, performance may be degraded by linear or nonlinear distortions caused by the channel, recording devices, or reverberation.
In this thesis, we discuss advanced stereo-data-based and blind speech feature enhancement approaches for robust speech recognition. One well-known approach to reducing channel distortion is feature mapping, which maps distorted speech features to their clean counterparts. The feature mapping rule is usually trained on stereo data, which consist of simultaneous recordings made in both the reference and target conditions. In this thesis, we propose a novel approach to speech feature sequence mapping based on the switching linear dynamic system (SLDS). The proposed algorithm enables systematic sequence-to-sequence mapping instead of the traditional vector-to-vector mapping. Furthermore, we propose a novel semi-blind parameter estimation approach that does not require reference feature vectors. The proposed approach is motivated by the hidden Markov model (HMM)-based speech synthesis algorithm.
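For orientation, the traditional vector-to-vector mapping that the SLDS approach is contrasted with can be sketched as follows. This is a minimal illustration, not the thesis's SLDS algorithm: it assumes a per-dimension affine rule, clean ≈ a·distorted + b, fitted by least squares on frame-aligned stereo pairs, and the function names `fit_affine_mapping` and `apply_mapping` are hypothetical. Each frame is mapped independently, which is exactly the limitation a sequence-to-sequence mapping aims to remove.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_affine_mapping(
    distorted: Sequence[Vector], clean: Sequence[Vector]
) -> List[Tuple[float, float]]:
    """Fit clean[t][d] ~ a_d * distorted[t][d] + b_d for each dimension d.

    distorted[t] and clean[t] must form a stereo pair: the same frame recorded
    in the target (distorted) and reference (clean) conditions.
    """
    if not distorted or len(distorted) != len(clean):
        raise ValueError("stereo data must be non-empty and frame-aligned")
    n = len(distorted)
    params: List[Tuple[float, float]] = []
    for d in range(len(distorted[0])):
        ys = [frame[d] for frame in distorted]
        xs = [frame[d] for frame in clean]
        mean_y = sum(ys) / n
        mean_x = sum(xs) / n
        var_y = sum((y - mean_y) ** 2 for y in ys)
        cov_xy = sum((y - mean_y) * (x - mean_x) for y, x in zip(ys, xs))
        # Fall back to a pure bias correction if this dimension is constant.
        a = cov_xy / var_y if var_y > 0 else 1.0
        b = mean_x - a * mean_y
        params.append((a, b))
    return params


def apply_mapping(params: List[Tuple[float, float]], frame: Vector) -> Vector:
    """Map one distorted frame toward the clean condition, frame by frame."""
    return [a * y + b for (a, b), y in zip(params, frame)]


if __name__ == "__main__":
    clean = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
    # Synthetic "channel": scale by 0.5 and add an offset of 2.
    distorted = [[0.5 * x + 2.0 for x in f] for f in clean]
    params = fit_affine_mapping(distorted, clean)
    print(params)
    print(apply_mapping(params, distorted[1]))
```

On this synthetic channel the fit recovers a = 2 and b = -4 in both dimensions, so the distorted frame is mapped back to its clean counterpart.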
Additionally, we focus on feature compensation, in which the distorted input features are compensated before being decoded with acoustic models trained on clean speech. The proposed feature compensation algorithms are blind techniques, meaning that no training or adaptation data are needed to estimate the relevant parameters. In this thesis, we propose a novel blind feature compensation approach based on the interacting multiple model (IMM) algorithm, specifically designed to handle background noise and acoustic reverberation jointly. To cope with time-varying environmental parameters, this approach establishes a switching linear dynamic model (SLDM) for additive and convolutive distortions, such as background noise and acoustic reverberation, in the log-spectral domain. We construct multiple state-space models of the speech corruption process, in which the log spectrum of clean speech and the log frequency response of the acoustic reverberation are jointly treated as the state of interest.
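To make "additive and convolutive distortions in the log-spectral domain" concrete, the sketch below evaluates the widely used phase-free approximation y = log(exp(x + h) + exp(n)) = x + h + log(1 + exp(n - x - h)) per frequency bin, where x is the clean-speech log spectrum, h a log frequency response standing in for the reverberation, and n the background-noise log spectrum. This relation is an assumption borrowed from the general robust-ASR literature; the thesis's exact observation model may differ, and the function names are hypothetical. The nonlinearity in n - x - h is what a state-space compensation method has to handle when estimating x and h.

```python
import math
from typing import List, Sequence


def log_add_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def corrupt_log_spectrum(
    clean: Sequence[float],
    response: Sequence[float],
    noise: Sequence[float],
) -> List[float]:
    """Distorted log spectrum under the assumed model, bin by bin.

    y = log(exp(x + h) + exp(n)): the convolutive term h adds to the clean
    log spectrum x, and the additive noise n combines in the linear domain.
    """
    return [log_add_exp(x + h, n) for x, h, n in zip(clean, response, noise)]


if __name__ == "__main__":
    x = [0.0, 2.0, -5.0]   # clean-speech log spectrum
    h = [-1.0, 0.0, 0.0]   # log frequency response of the distortion
    n = [-1.0, -10.0, 3.0] # noise log spectrum
    print(corrupt_log_spectrum(x, h, n))
```

The three bins illustrate the regimes: equal speech and noise energy (y = x + h + log 2), speech-dominated (y is close to x + h), and noise-dominated (y is close to n).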
The proposed approaches yield significant improvements on the Aurora-5 speech recognition task, which was developed to investigate ASR performance in reverberant, noisy environments.

Language: English

URI: https://hdl.handle.net/10371/118867

Files in This Item:

000000004758.pdf 1.00 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)