Style-Adaptive Speech Synthesis Utilizing Supplementary Information : 보완 정보를 이용한 스타일 적응 음성 합성 기법

DC Field Value Language
dc.contributor.advisor 김남수 -
dc.contributor.author June Sig Sung -
dc.date.accessioned 2017-07-13T06:57:21Z -
dc.date.available 2017-07-13T06:57:21Z -
dc.date.issued 2013-02 -
dc.identifier.other 000000009677 -
dc.identifier.uri https://hdl.handle.net/10371/118900 -
dc.description Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Electrical and Computer Engineering, 2013. 2. 김남수. -
dc.description.abstract Parameter adaptation is needed to reduce the mismatch between training and test conditions caused by speaker variability, channel differences, and environmental effects. In particular, speech expressing various emotional states and singing voice with different musical notes show temporal and spectral characteristics quite different from those of reading-style speech. However, it is difficult in practice to collect a large amount of data for each speaking (or singing) style. For that reason, a small amount of data obtained from the target condition is usually used to adapt the parameters of speech recognition or speech synthesis systems.
This thesis improves the performance of conventional MLLR adaptation by utilizing supplementary information. Conventional MLLR cannot use pitch and duration information for singing voice, or the level of emotional intensity for expressive speech, so its adaptation performance degrades when it is applied to singing voice or expressive speech synthesis. Furthermore, conventional MLLR does not guarantee good performance when the mapping is nonlinear, because it assumes a linear mapping between the initial model set and the adaptation data.
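For reference, the linear mapping assumed by conventional MLLR is usually written as an affine transform of each Gaussian mean. The notation below follows common usage in the MLLR literature and is an assumption here, since this record does not give the thesis's own symbols.

```latex
\hat{\boldsymbol{\mu}}_m = \mathbf{A}\boldsymbol{\mu}_m + \mathbf{b} = \mathbf{W}\boldsymbol{\xi}_m,
\qquad
\mathbf{W} = [\,\mathbf{b}\ \ \mathbf{A}\,],
\qquad
\boldsymbol{\xi}_m = [\,1\ \ \boldsymbol{\mu}_m^{\top}\,]^{\top}
```

Here \boldsymbol{\mu}_m is the mean of Gaussian m in the initial model set and \hat{\boldsymbol{\mu}}_m is its adapted value; neither pitch, duration, nor emotional intensity appears anywhere in the transform.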
In the first approach, we extend conventional MLLR to the factored MLLR (FMLLR) framework, in which each MLLR parameter is defined as a function of a control parameter vector. We further improve the performance of FMLLR by extending its structure from one limited by a diagonal assumption to a general form. We also present a training method, based on the expectation-maximization (EM) algorithm, for estimating the general form of the FMLLR parameters; performance evaluation demonstrates the effectiveness of the proposed technique.
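As a rough illustration of what "each MLLR parameter is a function of the control parameter vector" can look like, the Python sketch below makes the transform a combination of basis matrices weighted by a control vector v (for example pitch and duration, or emotional intensity). The parameterisation W(v) = sum_k phi_k(v) W_k, the helper names, the basis in `control_basis`, and the plain least-squares fit used in place of EM are all assumptions made for illustration; they are not taken from the thesis.

```python
import numpy as np

def control_basis(v):
    """Hypothetical basis phi(v): a constant term plus the raw control values."""
    return np.concatenate(([1.0], np.atleast_1d(np.asarray(v, dtype=float))))

def extended_mean(mu):
    """Extended mean vector xi = [1, mu], as in the usual MLLR notation."""
    return np.concatenate(([1.0], np.asarray(mu, dtype=float)))

def fit_factored_transform(base_means, controls, targets):
    """Least-squares fit of y ~= sum_k phi_k(v) * W_k @ xi.

    Returns the stacked matrix [W_1^T; ...; W_K^T] of shape (K*(D+1), D).
    """
    # Each row is kron(phi(v), xi), so the model is linear in the unknowns.
    design = np.vstack([
        np.kron(control_basis(v), extended_mean(mu))
        for mu, v in zip(base_means, controls)
    ])
    coef, *_ = np.linalg.lstsq(design, np.asarray(targets, dtype=float), rcond=None)
    return coef

def apply_factored_transform(coef, mu, v):
    """Adapt one mean vector mu under control vector v."""
    return np.kron(control_basis(v), extended_mean(mu)) @ coef
```

With D-dimensional means and a one-dimensional control value, `fit_factored_transform` estimates 2(D+1) rows of coefficients, so it needs at least that many (mean, control, target) triples to be well determined.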
To further enhance the adaptation performance of conventional MLLR, we propose a nonlinear generalization of MLLR for HMM-based speech synthesis systems. The algorithm performs nonlinear regression between a mean vector of the base model and the corresponding mean vector of the adaptation data using kernel methods. Both exponential and polynomial kernels are considered, and a technique for learning their weights is also presented. One advantage of the proposed algorithm is that, apart from the nonlinear mapping of the base model, all of its operations are linear. The solution can therefore be obtained by solving a system of linear equations, in a similar way to conventional MLLR, which makes the algorithm easy to implement and keeps the computational load low.
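The point that only the mapping of the base means is nonlinear, while the estimation itself reduces to a linear solve, can be sketched with ordinary kernel ridge regression between mean vectors. This is a stand-in chosen for illustration, not the thesis's estimator: the function names, the fixed kernel weights `w_exp` and `w_poly` (the thesis learns its weights), and the ridge term are all assumptions.

```python
import numpy as np

def combined_kernel(x, centers, w_exp=0.5, w_poly=0.5, gamma=1.0, degree=2):
    """Weighted sum of an exponential (RBF) kernel and a polynomial kernel.

    x has shape (N, D) and centers has shape (M, D); the result is (N, M).
    """
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    k_exp = np.exp(-gamma * sq_dist)
    k_poly = (1.0 + x @ centers.T) ** degree
    return w_exp * k_exp + w_poly * k_poly

def fit_kernel_regression(base_means, target_means, ridge=1e-6, **kernel_args):
    """Solve (K + ridge * I) alpha = Y for the regression coefficients."""
    gram = combined_kernel(base_means, base_means, **kernel_args)
    # The kernel map is the only nonlinear step; this is a plain linear system.
    return np.linalg.solve(gram + ridge * np.eye(len(base_means)), target_means)

def adapt_means(new_means, base_means, alpha, **kernel_args):
    """Map mean vectors through the fitted nonlinear regression."""
    return combined_kernel(new_means, base_means, **kernel_args) @ alpha
```

In this sketch the cost is dominated by one N-by-N linear solve, where N is the number of base mean vectors paired with adaptation statistics.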
-
dc.description.tableofcontents Abstract i
Contents iii
List of Figures v
List of Tables x
1 Introduction 1
1.1 Background 1
1.2 Scope of thesis 3
1.3 Organization of thesis 5
2 HMM-based speech synthesis system 7
2.1 Overview of HMM-based speech synthesis system 7
2.2 Multi Space Probability Distribution 10
2.3 Decision tree based context clustering 12
2.3.1 Decision tree 13
2.3.2 Construction of decision tree 13
2.4 Parameter generation algorithm 17
2.4.1 Maximizing P(O|Q, λ) with respect to O 18
3 Previous researches 21
3.1 Excitation enhancement 21
3.1.1 Waveform Interpolation 24
3.1.2 Excitation Modeling based on PCA 27
3.2 Adaptation for speech synthesis 29
3.2.1 MLLR 30
4 Factored MLLR adaptation 33
4.1 Introduction 33
4.2 MLLR adaptation 38
4.3 MRHSMM combined with MLLR 40
4.4 factored MLLR 43
4.5 Experiments on speech synthesis 48
4.5.1 Experiments on Singing Voice Synthesis 48
4.5.2 Experiments on Expressive Speech Synthesis 61
4.6 Conclusions 69
4.7 appendix 70
5 Non-linear generalization of MLLR with kernel learning for HMM-based speech synthesis 75
5.1 Introduction 75
5.2 MLLR 77
5.3 Kernel MLLR 78
5.4 Experiments 82
5.4.1 Objective performance evaluation 85
5.5 Conclusion 87
6 Conclusion 93
A Statistical Approaches to Excitation Modeling in HMM-based Speech Synthesis 95
A.1 Waveform Interpolation 96
A.2 Time domain zero-padded WI 98
A.3 Low Dimensional Representation of CWs 101
A.3.1 Principal Component Analysis 101
A.3.2 Non-negative Matrix Factorization 102
A.4 Experiments for FDZ 103
A.5 Experiments for comparison between FDZ and TDZ 107
A.5.1 Performance of low-dimensional representation 108
A.5.2 Evaluation of objective measure 109
A.5.3 Subjective tests 110
A.6 Conclusions 111
Bibliography 113
-
dc.format application/pdf -
dc.format.extent 3720399 bytes -
dc.format.medium application/pdf -
dc.language.iso en -
dc.publisher Seoul National University Graduate School -
dc.subject HMM-based speech synthesis -
dc.subject parameter adaptation -
dc.subject MLLR -
dc.subject factored MLLR -
dc.subject kernel-based MLLR -
dc.subject singing voice synthesis -
dc.subject expressive speech synthesis -
dc.subject.ddc 621 -
dc.title Style-Adaptive Speech Synthesis Utilizing Supplementary Information -
dc.title.alternative 보완 정보를 이용한 스타일 적응 음성 합성 기법 -
dc.type Thesis -
dc.contributor.AlternativeAuthor 성준식 -
dc.description.degree Doctor -
dc.citation.pages 118 -
dc.contributor.affiliation College of Engineering, Department of Electrical and Computer Engineering -
dc.date.awarded 2013-02 -
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.