Style-Adaptive Speech Synthesis Utilizing Supplementary Information

June Sig Sung

Style-Adaptive Speech Synthesis Utilizing Supplementary Information : 보완 정보를 이용한 스타일 적응 음성 합성 기법

DC Field	Value	Language
dc.contributor.advisor	김남수	-
dc.contributor.author	June Sig Sung	-
dc.date.accessioned	2017-07-13T06:57:21Z	-
dc.date.available	2017-07-13T06:57:21Z	-
dc.date.issued	2013-02	-
dc.identifier.other	000000009677	-
dc.identifier.uri	https://hdl.handle.net/10371/118900	-
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2013. 김남수.	-
dc.description.abstract	Parameter adaptation is needed to reduce the mismatch between training and test conditions caused by speaker variability, channel differences, and environmental effects. In particular, speech expressing various emotional states and singing voice with different musical notes show temporal and spectral characteristics quite different from those of reading-style speech. However, it is difficult in practice to collect a large amount of data for each speaking (or singing) style. For that reason, a small amount of data obtained from the target condition is usually used to adapt the parameters of speech recognition or speech synthesis systems. This thesis improves the performance of conventional maximum likelihood linear regression (MLLR) adaptation by utilizing supplementary information. Conventional MLLR cannot use pitch and duration information for singing voice or the level of emotional intensity for expressive speech, so adaptation performance degrades when it is applied to singing voice or expressive speech synthesis. Furthermore, conventional MLLR does not guarantee good performance when the mapping is nonlinear, because it assumes a linear mapping between the initial model set and the adaptation data. In the first approach, we extend conventional MLLR to the factored MLLR (FMLLR) framework, in which each MLLR parameter is defined as a function of a control parameter vector. We further improve the performance of FMLLR by extending its structure from one limited by a diagonal assumption to a generalized form. We also present a training method for estimating the general form of the FMLLR parameters based on the expectation-maximization (EM) algorithm, and performance evaluation shows the effectiveness of the proposed technique.
To enhance the adaptation performance of conventional MLLR, we also propose a nonlinear generalization of MLLR for HMM-based speech synthesis systems. The algorithm performs nonlinear regression between the mean vector of the base model and the corresponding mean vector of the adaptation data using kernel methods. Both exponential and polynomial kernels are considered, and a technique for learning their weights is also presented. One advantage of the proposed algorithm is that, apart from the nonlinear mapping of the base model, all remaining operations are linear. The solution can therefore be obtained by solving a system of linear equations, in much the same way as for conventional MLLR, which makes the algorithm easy to implement and keeps the computational load low.	-
dc.description.tableofcontents	Abstract
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Scope of thesis
  1.3 Organization of thesis
2 HMM-based speech synthesis system
  2.1 Overview of HMM-based speech synthesis system
  2.2 Multi Space Probability Distribution
  2.3 Decision tree based context clustering
    2.3.1 Decision tree
    2.3.2 Construction of decision tree
  2.4 Parameter generation algorithm
    2.4.1 Maximizing P(O|Q, λ) with respect to O
3 Previous researches
  3.1 Excitation enhancement
    3.1.1 Waveform Interpolation
    3.1.2 Excitation Modeling based on PCA
  3.2 Adaptation for speech synthesis
    3.2.1 MLLR
4 Factored MLLR adaptation
  4.1 Introduction
  4.2 MLLR adaptation
  4.3 MRHSMM combined with MLLR
  4.4 factored MLLR
  4.5 Experiments on speech synthesis
    4.5.1 Experiments on Singing Voice Synthesis
    4.5.2 Experiments on Expressive Speech Synthesis
  4.6 Conclusions
  4.7 appendix
5 Non-linear generalization of MLLR with kernel learning for HMM-based speech synthesis
  5.1 Introduction
  5.2 MLLR
  5.3 Kernel MLLR
  5.4 Experiments
    5.4.1 Objective performance evaluation
  5.5 Conclusion
6 Conclusion
A Statistical Approaches to Excitation Modeling in HMM-based Speech Synthesis
  A.1 Waveform Interpolation
  A.2 Time domain zero-padded WI
  A.3 Low Dimensional Representation of CWs
    A.3.1 Principal Component Analysis
    A.3.2 Non-negative Matrix Factorization
  A.4 Experiments for FDZ
  A.5 Experiments for comparison between FDZ and TDZ
    A.5.1 Performance of low-dimensional representation
    A.5.2 Evaluation of objective measure
    A.5.3 Subjective tests
  A.6 Conclusions
Bibliography	-
dc.format	application/pdf	-
dc.format.extent	3720399 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	HMM-based speech synthesis	-
dc.subject	parameter adaptation	-
dc.subject	MLLR	-
dc.subject	factored MLLR	-
dc.subject	kernel-based MLLR	-
dc.subject	singing voice synthesis	-
dc.subject	expressive speech synthesis	-
dc.subject.ddc	621	-
dc.title	Style-Adaptive Speech Synthesis Utilizing Supplementary Information	-
dc.title.alternative	보완 정보를 이용한 스타일 적응 음성 합성 기법	-
dc.type	Thesis	-
dc.contributor.AlternativeAuthor	성준식	-
dc.description.degree	Doctor	-
dc.citation.pages	118	-
dc.contributor.affiliation	College of Engineering, Department of Electrical and Computer Engineering	-
dc.date.awarded	2013-02	-
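
The abstract notes that, apart from a nonlinear mapping of the base-model mean vectors, the kernel-based adaptation remains linear and is solved as a system of linear equations, much like conventional MLLR. The sketch below illustrates that idea only in a heavily simplified form and is not the thesis's algorithm. The assumptions are: an unweighted least-squares fit stands in for the maximum-likelihood estimate (no covariances or state occupancies), a fixed element-wise polynomial feature map stands in for the exponential and polynomial kernels with learned weights, and the data are synthetic. All function names (`extend`, `poly_map`, `fit_transform`, `rmse`) are hypothetical.

```python
import numpy as np


def extend(means):
    """Prepend a bias term to each mean vector: mu -> [1, mu]."""
    return np.hstack([np.ones((means.shape[0], 1)), means])


def poly_map(means, degree=2):
    """Hypothetical nonlinear map: mu -> [1, mu, mu**2, ..., mu**degree]."""
    ones = np.ones((means.shape[0], 1))
    return np.hstack([ones] + [means ** p for p in range(1, degree + 1)])


def fit_transform(features, targets, ridge=1e-8):
    """Solve the normal equations (F^T F + ridge * I) W = F^T Y for W."""
    gram = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)


def rmse(pred, targets):
    """Root-mean-square error between predicted and target means."""
    return float(np.sqrt(np.mean((pred - targets) ** 2)))


# Synthetic "base model" means and nonlinearly shifted "target style" means.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3))
target = 1.0 + 2.0 * base + 0.5 * base ** 2

# Linear (MLLR-like) fit on [1, mu] versus a fit on the nonlinear feature map.
W_lin = fit_transform(extend(base), target)
W_poly = fit_transform(poly_map(base), target)

linear_rmse = rmse(extend(base) @ W_lin, target)
poly_rmse = rmse(poly_map(base) @ W_poly, target)
print(f"linear RMSE: {linear_rmse:.4f}, polynomial RMSE: {poly_rmse:.2e}")
```

In both cases the only numerical step is one call to `np.linalg.solve`; the nonlinearity enters solely through the feature map applied to the base means, which is the property the abstract highlights.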

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)

Files in This Item:

000000009677.pdf 3.55 MB
