NMF-based compositional models for audio source separation

Authors

권기수

Advisor
김남수
Major
College of Engineering, Department of Electrical and Computer Engineering
Issue Date
2017-02
Publisher
Graduate School, Seoul National University
Keywords
audio source separation; nonnegative matrix factorization (NMF); online
Description
Thesis (Ph.D.) -- Graduate School, Seoul National University: Department of Electrical and Computer Engineering, February 2017. Advisor: 김남수.
Abstract
Many classes of data can be represented by constructive combinations of parts.
Most signals and data from nature are nonnegative and can be explained and
reconstructed by constructive models, which allow only additive combinations of
parts and never subtraction. Such compositional models include dictionary
learning, exemplar-based approaches, and nonnegative matrix factorization (NMF).
Compositional models are useful in many areas, including image and visual signal
processing, text information processing, audio signal processing, and music
information retrieval. In this dissertation, we adopt NMF as the compositional
model and apply it to target source separation.
Target source separation is the extraction or reconstruction of the target
signals from mixture signals that consist of the target and interfering signals.
Target source separation can be viewed as blind source separation (BSS), which
aims to extract the original, unknown source signals with no or very limited
prior information. In practice, however, prior information is now frequently
utilized, and various such approaches have been proposed for single-channel
source separation.
NMF approximates a nonnegative data matrix V by a product of a nonnegative basis
matrix W and a nonnegative encoding matrix H, i.e., V ≈ WH. Since both W and H
are nonnegative, NMF often leads to a part-based representation of the data.
Methods based on NMF have shown impressive results in single-channel source
separation. The objective function of NMF is generally based on the Euclidean
distance, the Kullback-Leibler divergence, or the Itakura-Saito divergence, and
many optimization methods have been proposed for it, e.g., the multiplicative
update rules, projected gradient descent, and NeNMF. However, NMF-based audio
source separation still has several issues: the non-uniqueness of the bases, a
high dependence on prior information, the overlap between the subspaces spanned
by the target and interfering bases, the disregard of the encoding vectors
obtained in the training phase, and the insufficient analysis of sparse NMF. In
this dissertation, we propose new approaches to resolve these issues.
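
As a concrete reference point, the following is a minimal sketch of the standard
multiplicative updates for KL-divergence NMF (the Lee-Seung rules mentioned
above); the function name, random initialization, and fixed iteration count are
illustrative choices, not taken from the dissertation.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize a nonnegative matrix V (F x T) as V ~ W @ H using the
    Lee-Seung multiplicative updates for the KL divergence."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # nonnegative basis matrix
    H = rng.random((rank, T)) + eps   # nonnegative encoding matrix
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)   # update encodings
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)   # update bases
    return W, H
```

For separation, W is typically the concatenation of bases trained on isolated
target and interference data, and only H is updated on the mixture.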
In Section 4, we propose a novel speech enhancement method that combines a
statistical model-based enhancement scheme with an NMF-based gain function.
For better performance in time-varying noise environments, both the speech and
noise bases of the NMF are adapted simultaneously with the help of the estimated
speech presence probability.
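
The abstract does not give the exact form of the NMF-based gain, so the sketch
below uses a common Wiener-style construction as an assumption: fixed,
pre-trained speech and noise bases are fitted to the mixture spectrogram, and
the speech share of the reconstruction serves as the gain. The combination with
the statistical model-based gain and the SPP-driven basis adaptation are not
shown; all names here are hypothetical.

```python
import numpy as np

def nmf_wiener_gain(V_mix, W_speech, W_noise, n_iter=100, eps=1e-10):
    """Hypothetical Wiener-style NMF gain for speech enhancement."""
    W = np.hstack([W_speech, W_noise])      # fixed, pre-trained bases
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V_mix.shape[1])) + eps
    ones = np.ones_like(V_mix)
    for _ in range(n_iter):                 # fit encodings only
        WH = W @ H + eps
        H *= (W.T @ (V_mix / WH)) / (W.T @ ones + eps)
    Ks = W_speech.shape[1]
    S_hat = W_speech @ H[:Ks]               # speech reconstruction
    N_hat = W_noise @ H[Ks:]                # noise reconstruction
    return S_hat / (S_hat + N_hat + eps)    # spectral gain in (0, 1)
```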
In Section 5, we propose a discriminative NMF (DNMF) algorithm that exploits the
reconstruction error for the interfering signals as well as for the target
signal, based on the target bases.
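
The abstract describes the DNMF objective only at this one-sentence level; one
plausible shape consistent with it, written here purely as an assumption, trades
a good reconstruction of the target against a poor reconstruction of the
interference using the same target bases:

```latex
% Assumed shape of a discriminative objective: the target bases W_t should
% reconstruct the target spectrogram well and the interference poorly.
\min_{W_t,\,H_t,\,H_i}\;
  D\!\left(V_{\mathrm{tgt}} \,\middle\|\, W_t H_t\right)
  \;-\; \lambda\, D\!\left(V_{\mathrm{int}} \,\middle\|\, W_t H_i\right),
\qquad \lambda > 0,
```

where D is one of the divergences above and the weight balances the two
reconstruction errors.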
In Section 6, we propose an approach to robust basis estimation that adopts an
incremental strategy. Based on an analogy between clustering and NMF analysis,
we incrementally estimate the NMF bases in a manner similar to the modified
k-means and Linde-Buzo-Gray (LBG) algorithms popular in the data clustering
area.
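
The following is a minimal sketch of what such an incremental, LBG-flavored
basis estimation could look like; the splitting perturbation and the refit
schedule are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def incremental_bases(V, target_rank, n_iter=100, eps=1e-10):
    """Grow an NMF basis LBG-style: start from one vector, repeatedly
    split each basis into two perturbed copies, then refit with the
    KL multiplicative updates."""
    W = V.mean(axis=1, keepdims=True) + eps                  # single initial basis
    while W.shape[1] < target_rank:
        W = np.hstack([W * 0.95, W * 1.05])[:, :target_rank]  # split step
        H = np.full((W.shape[1], V.shape[1]), 1.0 / W.shape[1])
        ones = np.ones_like(V)
        for _ in range(n_iter):                               # refit W and H
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W
```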
In Section 7, the distribution of the encoding vector is modeled as a
multivariate exponential PDF (MVE) with a single scaling factor for each source.
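
For reference, a standard multivariate exponential density with a single scale
per source has the form below; the exact parameterization used in the
dissertation may differ.

```latex
% Multivariate exponential density over a K-dimensional encoding vector h,
% with one scaling factor \lambda shared by all coefficients of a source.
p(\mathbf{h} \mid \lambda) \;=\; \prod_{k=1}^{K} \lambda\, e^{-\lambda h_k},
\qquad h_k \ge 0,\; \lambda > 0.
```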
In Section 8, several sparse penalty terms for NMF are analyzed and compared in
terms of the signal-to-distortion ratio, the sparseness of the encoding vectors,
the reconstruction error, and the entropy of the basis vectors. A new objective
function that combines sparse representation with DNMF is also proposed.
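
As one textbook example of the kind of penalty compared in Section 8, the sketch
below adds an L1 term mu * ||H||_1 to KL-NMF, which only changes the denominator
of the encoding update; the particular penalties analyzed in the dissertation
may differ.

```python
import numpy as np

def sparse_nmf_kl(V, rank, mu=0.1, n_iter=200, eps=1e-10, seed=0):
    """KL-NMF with an L1 sparsity penalty mu * ||H||_1 on the encodings.
    W columns are renormalized each step so the penalty stays meaningful."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + mu + eps)  # L1 shifts denominator
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
        scale = W.sum(axis=0, keepdims=True) + eps
        W /= scale                    # unit-sum basis columns
        H *= scale.T                  # keep the product W @ H unchanged
    return W, H
```

Larger mu drives more encoding coefficients toward zero at the cost of a higher
reconstruction error, the trade-off that the SDR, sparseness, and entropy
measures above quantify.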
Language
English
URI
https://hdl.handle.net/10371/119288