Channel-Attention Dense U-Net for Multichannel Speech Enhancement
- URL: http://arxiv.org/abs/2001.11542v1
- Date: Thu, 30 Jan 2020 19:56:52 GMT
- Title: Channel-Attention Dense U-Net for Multichannel Speech Enhancement
- Authors: Bahareh Tolooshams, Ritwik Giri, Andrew H. Song, Umut Isik, Arvindh
Krishnaswamy
- Abstract summary: We introduce a channel-attention mechanism inside the deep architecture to mimic beamforming.
We demonstrate the superior performance of the network against the state-of-the-art approaches on the CHiME-3 dataset.
- Score: 21.94418736688929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised deep learning has gained significant attention for speech
enhancement recently. The state-of-the-art deep learning methods perform the
task by learning a ratio/binary mask that is applied to the mixture in the
time-frequency domain to produce the clean speech. Despite the great
performance in the single-channel setting, these frameworks lag in performance
in the multichannel setting as the majority of these methods a) fail to exploit
the available spatial information fully, and b) still treat the deep
architecture as a black box which may not be well-suited for multichannel audio
processing. This paper addresses these drawbacks, a) by utilizing complex ratio
masking instead of masking on the magnitude of the spectrogram, and more
importantly, b) by introducing a channel-attention mechanism inside the deep
architecture to mimic beamforming. We propose Channel-Attention Dense U-Net, in
which we apply the channel-attention unit recursively on feature maps at every
layer of the network, enabling the network to perform non-linear beamforming.
We demonstrate the superior performance of the network against the
state-of-the-art approaches on the CHiME-3 dataset.
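The first change the abstract describes is masking the complex spectrogram rather than its magnitude. As a minimal sketch (not the paper's exact formulation), a complex ratio mask M = Mr + jMi is applied to the mixture STFT Y = Yr + jYi by complex multiplication, so the estimate's phase as well as its magnitude can differ from the mixture's; all shapes and names below are illustrative assumptions.

```python
# Hedged sketch of complex ratio masking: the network predicts a real and an
# imaginary mask component per time-frequency bin, and the clean-speech
# estimate is the complex product mask * mixture.
import torch


def apply_complex_ratio_mask(mask_ri: torch.Tensor, mix_ri: torch.Tensor) -> torch.Tensor:
    """mask_ri, mix_ri: tensors of shape (..., 2) holding real/imag parts."""
    mr, mi = mask_ri.unbind(-1)   # predicted mask, real and imaginary parts
    yr, yi = mix_ri.unbind(-1)    # mixture STFT, real and imaginary parts
    # (mr + j*mi) * (yr + j*yi)
    return torch.stack((mr * yr - mi * yi, mr * yi + mi * yr), dim=-1)


# Example: one utterance, 257 frequency bins, 100 frames
mask = torch.randn(1, 257, 100, 2)
mix = torch.randn(1, 257, 100, 2)
est = apply_complex_ratio_mask(mask, mix)   # same shape as the mixture
```

Because the mask has real and imaginary parts, it can correct the mixture phase, which magnitude-only masking cannot.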
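The second change is the channel-attention unit: attention weights are computed across the microphone/channel axis of the feature maps, so re-weighting and summing channels resembles filter-and-sum beamforming, and applying the unit at every layer is what the abstract calls non-linear beamforming. The sketch below is only an assumption-laden illustration of such an attention-over-channels block; the tensor layout, projection sizes, and single-head scaled dot-product form are choices made here, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttention(nn.Module):
    """Attention over the microphone axis C of features shaped (B, C, F, T, D)."""

    def __init__(self, feat_dim: int, attn_dim: int = 32):
        super().__init__()
        self.query = nn.Linear(feat_dim, attn_dim)
        self.key = nn.Linear(feat_dim, attn_dim)
        self.value = nn.Linear(feat_dim, feat_dim)  # keep the feature size so the
        self.scale = attn_dim ** -0.5               # unit can be applied at every layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, mic channels, frequency, time, feature dim)
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Similarity between every pair of microphone channels at each TF bin.
        scores = torch.einsum("bcfta,bdfta->bftcd", q, k) * self.scale
        weights = F.softmax(scores, dim=-1)          # normalize over channels d
        # Weighted sum over channels: a learned, data-dependent analogue of
        # filter-and-sum beamforming.
        return torch.einsum("bftcd,bdfta->bcfta", weights, v)


# Example: batch of 2, 6 microphones (as in CHiME-3), 257 bins, 100 frames
feats = torch.randn(2, 6, 257, 100, 16)
att = ChannelAttention(feat_dim=16)
out = att(feats)                                     # shape (2, 6, 257, 100, 16)
```

Since the output has the same layout as the input, such a block can be stacked recursively inside a dense U-Net, in the spirit of the architecture the abstract describes.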
Related papers
- TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising [94.09442506816724]
Blind-spot networks (BSN) have been prevalent network architectures in self-supervised image denoising (SSID).
We present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators that meet the blind-spot requirement.
For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution.
For channel self-attention, we observe that it may leak the blind-spot information when the channel number is greater than the spatial size in the deep layers of multi-scale architectures.
arXiv Detail & Related papers (2024-04-11T15:39:10Z) - Joint Channel Estimation and Feedback with Masked Token Transformers in
Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z) - Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM
Neural Networks [3.730592618611028]
We use LSTMs to enhance spatial clustering based time-frequency masks.
We achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance of multichannel spatial clustering.
We evaluate the intelligibility of the output of each system using word error rate from a Kaldi automatic speech recognizer.
arXiv Detail & Related papers (2020-12-02T22:29:29Z) - Channel-wise Knowledge Distillation for Dense Prediction [73.99057249472735]
We propose to align features channel-wise between the student and teacher networks.
We consistently achieve superior performance on three benchmarks with various network structures.
arXiv Detail & Related papers (2020-11-26T12:00:38Z) - Deep Denoising Neural Network Assisted Compressive Channel Estimation
for mmWave Intelligent Reflecting Surfaces [99.34306447202546]
This paper proposes a deep denoising neural network assisted compressive channel estimation for mmWave IRS systems.
We first introduce a hybrid passive/active IRS architecture, where very few receive chains are employed to estimate the uplink user-to-IRS channels.
The complete channel matrix can be reconstructed from the limited measurements based on compressive sensing.
arXiv Detail & Related papers (2020-06-03T12:18:57Z) - Neural Speech Separation Using Spatially Distributed Microphones [19.242927805448154]
This paper proposes a neural network based speech separation method using spatially distributed microphones.
Unlike in traditional microphone array settings, neither the number of microphones nor their spatial arrangement is known in advance.
Speech recognition experimental results show that the proposed method significantly outperforms baseline multi-channel speech separation systems.
arXiv Detail & Related papers (2020-04-28T17:16:31Z) - ADRN: Attention-based Deep Residual Network for Hyperspectral Image
Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z) - Depthwise Non-local Module for Fast Salient Object Detection Using a
Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.