Style-Adaptive Speech Synthesis Utilizing Supplementary Information (Korean title: 보완 정보를 이용한 스타일 적응 음성 합성 기법)


June Sig Sung

College of Engineering, Department of Electrical and Computer Engineering
Seoul National University Graduate School
HMM-based speech synthesis, parameter adaptation, MLLR, factored MLLR, kernel-based MLLR, singing voice synthesis, expressive speech synthesis
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2013. 김남수 (Nam Soo Kim).
Parameter adaptation is needed to reduce the mismatch between training and test conditions caused by speaker variability, channel differences, and environmental effects. In particular, speech expressing various emotional states and singing voice with different musical notes show temporal and spectral characteristics quite different from those of read speech. In practice, however, it is difficult to collect a large amount of data for each speaking (or singing) style. For that reason, a small amount of data obtained from the target condition is usually used to adapt the parameters of speech recognition or speech synthesis systems.
This thesis improves the performance of conventional MLLR adaptation by utilizing supplementary information. Conventional MLLR cannot use pitch and duration information for singing voice, or the level of emotional intensity for expressive speech, so adaptation performance degrades when it is applied to singing voice or expressive speech synthesis. Furthermore, conventional MLLR does not guarantee good performance when the required mapping is nonlinear, because it assumes a linear mapping between the initial model set and the adaptation data.
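For reference, the linear mapping assumed by conventional MLLR mean adaptation is usually written as an affine transform of each Gaussian mean. The symbols below follow common MLLR notation and are assumptions here, since the abstract itself does not define them: $\boldsymbol{\mu}$ is a mean vector of the initial model, $\hat{\boldsymbol{\mu}}$ the adapted mean, and $\mathbf{W}$ the regression matrix estimated from the adaptation data.

```latex
\hat{\boldsymbol{\mu}} = \mathbf{A}\boldsymbol{\mu} + \mathbf{b}
                       = \mathbf{W}\boldsymbol{\xi},
\qquad
\boldsymbol{\xi} = \begin{bmatrix} 1 \\ \boldsymbol{\mu} \end{bmatrix},
\qquad
\mathbf{W} = \begin{bmatrix} \mathbf{b} & \mathbf{A} \end{bmatrix}
```

In this form $\mathbf{W}$ depends neither on supplementary information such as pitch, duration, or emotional intensity, nor on any nonlinear function of $\boldsymbol{\mu}$, which corresponds to the two limitations described above.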
In the first approach, we extend conventional MLLR to the factored MLLR (FMLLR) framework, in which each MLLR parameter is defined as a function of a control parameter vector. We further improve FMLLR by extending its structure from one limited by a diagonal assumption to a generalized form. We also present a training method, based on the expectation-maximization (EM) algorithm, for estimating the general-form FMLLR parameters, and performance evaluation shows the effectiveness of the proposed technique.
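The idea of making each MLLR parameter a function of a control parameter vector can be illustrated with a minimal sketch. The abstract does not give the functional form used in the thesis, so the version below assumes, purely for illustration, that the transform is a weighted sum of basis matrices with weights `[1, v_1, ..., v_K]` taken from the control vector `v` (for example pitch and duration). The names `factored_transform`, `adapt_mean`, and `bases` are hypothetical, and the EM training of the basis matrices is not shown.

```python
import numpy as np


def factored_transform(bases, control):
    """Build W(v) = sum_k phi_k(v) * W_k.

    bases   : array of shape (K + 1, D, D + 1), hypothetical basis matrices W_k
    control : array of shape (K,), control parameter vector v
    Assumption: phi(v) = [1, v_1, ..., v_K]; the thesis may use another form.
    """
    phi = np.concatenate(([1.0], np.asarray(control, dtype=float)))
    # Contract the first axis of `bases` with phi -> matrix of shape (D, D + 1).
    return np.tensordot(phi, bases, axes=1)


def adapt_mean(bases, control, mean):
    """Apply the control-dependent affine transform to one mean vector."""
    W = factored_transform(bases, control)
    xi = np.concatenate(([1.0], np.asarray(mean, dtype=float)))  # extended mean [1, mu]
    return W @ xi


# Toy example with D = 2 and a single control parameter (K = 1).
bases = np.array([
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # W_0: identity transform, zero bias
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # W_1: bias shift proportional to v_1
])
adapted = adapt_mean(bases, control=[2.0], mean=[3.0, 4.0])
```

With `control=[0.0]` the toy transform leaves the mean unchanged, and a nonzero control value shifts it, so a single set of basis matrices covers a range of target conditions.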
To enhance the adaptation performance of conventional MLLR, we propose a nonlinear generalization of MLLR for HMM-based speech synthesis systems. The algorithm performs nonlinear regression between a mean vector of the base model and the corresponding mean vector of the adaptation data using kernel methods. Both exponential and polynomial kernels are considered, and a technique for learning their weights is also presented. One advantage of the proposed algorithm is that, except for the nonlinear mapping of the base model, all remaining operations are linear. The solution can therefore be obtained by solving a system of linear equations, in a similar way to conventional MLLR, which makes implementation easy and keeps the computational load low.
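The structure described above, a nonlinear kernel mapping of the base means followed by a purely linear estimation step, can be sketched as follows. This is a simplified illustration and not the thesis's estimator: it assumes paired base and target mean vectors, replaces the likelihood-based MLLR objective with ordinary least squares, and does not learn the kernel weights. The function names, the `centers` reference points, and the `gamma` and `degree` parameters are all hypothetical.

```python
import numpy as np


def kernel_features(means, centers, gamma=1.0, degree=2):
    """Map base means (N, D) to [1, exponential kernels, polynomial kernels]."""
    sq_dist = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    exp_k = np.exp(-gamma * sq_dist)                # exponential (Gaussian) kernel
    poly_k = (means @ centers.T + 1.0) ** degree    # polynomial kernel
    ones = np.ones((means.shape[0], 1))             # bias term
    return np.hstack([ones, exp_k, poly_k])


def fit_kernel_transform(base_means, target_means, centers, **kernel_args):
    """Estimate W so that target ~= W @ features(base).

    After the nonlinear feature map, this is a linear least-squares problem.
    """
    phi = kernel_features(base_means, centers, **kernel_args)
    solution, *_ = np.linalg.lstsq(phi, target_means, rcond=None)
    return solution.T                               # shape (D, number of features)


def adapt_means(base_means, W, centers, **kernel_args):
    """Apply the estimated transform to base means."""
    return kernel_features(base_means, centers, **kernel_args) @ W.T


# Toy data: a quadratic relation that a single affine transform cannot fit.
x = np.linspace(-2.0, 2.0, 9).reshape(-1, 1)
y = x ** 2
c = np.array([[-1.0], [0.0], [1.0]])
W = fit_kernel_transform(x, y, c)
adapted = adapt_means(x, W, c)
```

Only `kernel_features` is nonlinear. Estimating `W` reduces to a linear system, which reflects the implementation and cost argument made in the abstract.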

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.