End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform

Cited 2 times in Web of Science; cited 4 times in Scopus
Authors
Lee, Hyeonseung; Kim, Hyung Yong; Kang, Woo Hyun; Kim, Jeunghun; Kim, Nam Soo

Issue Date
2019-09
Publisher
ISCA (International Speech Communication Association)
Citation
Interspeech 2019, pp. 4285-4289
Abstract
This paper describes a novel waveform-level end-to-end model for multi-channel speech enhancement. The model first extracts sample-level speech embeddings using a channel-wise convolutional neural network (CNN) and compensates the time delays between channels based on these embeddings, yielding time-aligned multi-channel signals. The aligned signals are then given as input to a multi-channel enhancement extension of Wave-U-Net, which directly outputs the estimated clean speech waveform. The whole model is trained to simultaneously minimize a modified mean squared error (MSE), a signal-to-distortion ratio (SDR) cost, and the senone cross-entropy of a back-end acoustic model. Evaluated on the CHiME-4 simulated set, the proposed system outperformed a state-of-the-art generalized eigenvalue (GEV) beamformer in terms of perceptual evaluation of speech quality (PESQ) and SDR, and showed competitive results in short-time objective intelligibility (STOI). Word error rates (WERs) of the system's output on the simulated sets were comparable to those of a bidirectional long short-term memory (BLSTM) GEV beamformer. However, the system showed relatively high WERs on the real sets, achieving a relative error rate reduction (RERR) of 14.3% over the noisy signal on the real evaluation set.
ISSN
2308-457X
URI
https://hdl.handle.net/10371/186227
DOI
https://doi.org/10.21437/Interspeech.2019-2397