NMF-based compositional models for audio source separation

DC Field | Value | Language
dc.contributor.advisor | 김남수 | -
dc.contributor.author | 권기수 | -
dc.date.accessioned | 2017-07-13T07:21:33Z | -
dc.date.available | 2017-07-13T07:21:33Z | -
dc.date.issued | 2017-02 | -
dc.identifier.other | 000000142205 | -
dc.identifier.uri | https://hdl.handle.net/10371/119288 | -
dc.description | Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2017. 김남수. | -
dc.description.abstract | Many classes of data can be represented by constructive combinations of parts.
Most signals and data from nature have nonnegative values and can be explained and
reconstructed by constructive models. In constructive models, only additive
combination is allowed, so parts are never subtracted. Compositional
models include dictionary learning, exemplar-based approaches, and nonnegative
matrix factorization (NMF). Compositional models are desirable in many areas,
including image or visual signal processing, text information processing, audio signal
processing, and music information retrieval. In this dissertation, we choose NMF as the
compositional model and apply it to NMF-based target source separation.
Target source separation is the extraction or reconstruction of the target
signal from a mixture signal that consists of the target and interfering signals.
Target source separation can be regarded as a form of blind source separation (BSS). BSS
aims to extract the original unknown source signals with no prior information or
with very limited prior information. Nowadays, however, prior information is
frequently utilized, and various approaches have been proposed for single-channel
source separation.
NMF approximates a nonnegative data matrix V by the product of a nonnegative basis
matrix W and a nonnegative encoding matrix H, i.e., V ≈ WH. Since both W and H
are nonnegative, NMF often leads to a parts-based representation of the data.
Methods based on NMF have shown impressive results in single-channel source
separation. The objective function of NMF is generally defined by the Euclidean distance,
the Kullback-Leibler divergence, or the Itakura-Saito divergence. Many optimization
methods have been proposed and utilized, e.g., the multiplicative update rule, projected
gradient descent, and NeNMF. However, NMF-based audio source separation has
several issues: non-uniqueness of the bases, a high dependence on prior
information, overlapping subspaces between the target and interfering bases,
disregard of the encoding vectors from the training phase, and insufficient analysis
of sparse NMF. In this dissertation, we propose new approaches to resolve these
issues.
In Section 4, we propose a novel speech enhancement method that combines a
statistical model-based enhancement scheme with an NMF-based gain function.
For better performance in time-varying noise environments, both the speech and
noise bases of NMF are adapted simultaneously with the help of the estimated
speech presence probability. In Section 5, we propose a discriminative NMF (DNMF)
algorithm that exploits the reconstruction error for the interfering signals as well
as for the target signal, based on the target bases. In Section 6, we propose an approach to
robust basis estimation in which an incremental strategy is adopted. Based on an
analogy between clustering and NMF analysis, we incrementally estimate the NMF
bases in a manner similar to the modified k-means and Linde-Buzo-Gray algorithms popular
in the data clustering area. In Section 7, the distribution of the encoding vector
is modeled as a multivariate exponential (MVE) PDF with a single scaling factor
for each source. In Section 8, several sparse penalty terms for NMF are analyzed and
compared in terms of signal-to-distortion ratio, sparseness of encoding vectors,
reconstruction error, and entropy of basis vectors. A new objective function that
combines sparse representation with discriminative NMF (DNMF) is also proposed.
-
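As a hedged illustration of the factorization V ≈ WH mentioned in the abstract, the following minimal sketch runs the widely used multiplicative update rule for the Euclidean-distance objective. It is not taken from the dissertation: the function name `nmf_multiplicative`, the parameters `rank`, `n_iter`, `eps`, and `seed`, and the synthetic test matrix are all assumptions made for the example.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate nonnegative V (F x T) as W @ H.

    W is F x rank (bases), H is rank x T (encodings). Uses the standard
    multiplicative updates for the Euclidean objective ||V - WH||_F^2.
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Element-wise ratios of nonnegative terms preserve nonnegativity.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic nonnegative data with an exact rank-4 structure (assumed example).
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 4)) @ data_rng.random((4, 30))

W, H = nmf_multiplicative(V, rank=4)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because every update multiplies a nonnegative matrix by a nonnegative ratio, W and H stay nonnegative without any explicit projection step, which is the additive, parts-only property the abstract attributes to compositional models.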
dc.description.tableofcontents |
1 Introduction 1
1.1 Audio source separation 1
1.2 Speech enhancement 3
1.3 Measurements 4
1.4 Outline of the dissertation 6

2 Compositional model and NMF 9
2.1 Compositional model 9
2.2 NMF 14
2.2.1 Update rules: MuR, PGD 16
2.2.2 Modified NMF 20

3 NMF-based audio source separation and issues 23
3.1 NMF-based audio source separation 23
3.2 Problems of NMF in audio source separation 26
3.2.1 A high dependency to the prior knowledge 26
3.2.2 A overlapped subspace between the target and interfering basis matrices 28
3.2.3 A non-uniqueness of the bases 29
3.2.4 A prior knowledge of the encoding vectors 30
3.2.5 Sparse NMF for the source separation 32

4 Online bases update 33
4.1 Introduction 33
4.2 NMF-based speech enhancement using spectral gain function 36
4.3 Speech enhancement combining statistical model-based and NMF-based methods with the on-line bases update 38
4.3.1 On-line update of speech and noise bases 40
4.3.2 Determining maximum update rates 42
4.4 Experiment result 43

5 Discriminative NMF 47
5.1 Introduction 47
5.2 Discriminative NMF utilizing cross reconstruction error 48
5.2.1 DNMF using the reconstruction error of the other source 49
5.2.2 DNMF using the interference factors 50
5.3 Experiment result 52


6 Incremental approach for bases estimate 57
6.1 Introduction 57
6.2 Incremental approach based on modified k-means clustering and Linde-Buzo-Gray algorithm 59
6.2.1 Based on modified k-means clustering 59
6.2.2 LBG based incremental approach 62
6.3 Experiment result 63
6.3.1 Modified k-means clustering based approach 63
6.3.2 LBG based approach 66

7 Prior model of encoding vectors 77
7.1 Introduction 77
7.2 Prior model of encoding vectors based on multivariate exponential distribution 78
7.3 Experiment result 82

8 Conclusions 87

Bibliography 91
Abstract in Korean 105
-
dc.format | application/pdf | -
dc.format.extent | 2402684 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | Seoul National University Graduate School | -
dc.subject | audio source separation | -
dc.subject | nonnegative matrix factorization (NMF) | -
dc.subject | online | -
dc.subject.ddc | 621 | -
dc.title | NMF-based compositional models for audio source separation | -
dc.type | Thesis | -
dc.description.degree | Doctor | -
dc.citation.pages | xii, 108 | -
dc.contributor.affiliation | College of Engineering, Department of Electrical and Computer Engineering | -
dc.date.awarded | 2017-02 | -
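
The abstract and Section 3.1 of the table of contents refer to NMF-based source separation with target and interfering bases. The sketch below shows one common way such a separation is set up, offered only as an assumption-laden illustration and not as the dissertation's method: the bases are taken as already trained and held fixed, the encodings of the mixture are estimated with the Euclidean multiplicative update, and a Wiener-style soft mask splits the mixture. The names `fit_activations` and `separate`, and the synthetic bases and mixture, are hypothetical.

```python
import numpy as np


def fit_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate nonnegative encodings H for V ≈ W @ H with W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Euclidean multiplicative update for H only; W is not changed.
        H *= WtV / (WtW @ H + eps)
    return H


def separate(V_mix, W_target, W_interf, eps=1e-12):
    """Split a nonnegative mixture spectrogram using fixed bases (a sketch)."""
    W = np.hstack([W_target, W_interf])  # concatenated basis matrix
    H = fit_activations(V_mix, W)
    k = W_target.shape[1]
    V_target = W_target @ H[:k]          # part explained by the target bases
    V_interf = W_interf @ H[k:]          # part explained by the interfering bases
    mask = V_target / (V_target + V_interf + eps)  # soft mask in [0, 1]
    return mask * V_mix, (1.0 - mask) * V_mix


# Synthetic stand-ins for trained bases and a mixture (assumed example data).
rng = np.random.default_rng(2)
W_target = rng.random((16, 3))
W_interf = rng.random((16, 3))
V_mix = W_target @ rng.random((3, 25)) + W_interf @ rng.random((3, 25))

target_est, interf_est = separate(V_mix, W_target, W_interf)
print(target_est.shape, interf_est.shape)
```

With this masking choice the two estimates always sum back to the mixture, so any energy the bases fail to explain is shared between the sources in proportion to the mask instead of being discarded.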
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.