FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement
- URL: http://arxiv.org/abs/2203.12188v1
- Date: Wed, 23 Mar 2022 04:33:09 GMT
- Title: FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement
- Authors: Jun Chen, Zilin Wang, Deyi Tuo, Zhiyong Wu, Shiyin Kang, Helen Meng
- Abstract summary: We propose an extended single-channel real-time speech enhancement framework called FullSubNet+.
Experimental results on the DNS Challenge dataset show the superior performance of our FullSubNet+.
- Score: 43.477179521051355
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The previously proposed FullSubNet achieved outstanding performance in the
Deep Noise Suppression (DNS) Challenge and attracted much attention. However, it
still suffers from issues such as input-output mismatch and coarse processing of
frequency bands. In this paper, we propose an extended single-channel real-time
speech enhancement framework called FullSubNet+ with the following significant
improvements. First, we design a lightweight multi-scale time-sensitive channel
attention (MulCA) module which adopts multi-scale convolution and a channel
attention mechanism to help the network focus on the more discriminative frequency
bands for noise reduction. Then, to make full use of the phase information in
noisy speech, our model takes the magnitude, real, and imaginary spectrograms as
inputs. Moreover, by replacing the long short-term memory (LSTM) layers in the
original full-band model with stacked temporal convolutional network (TCN) blocks,
we design a more efficient full-band module called the full-band extractor.
Experimental results on the DNS Challenge dataset show the superior performance of
FullSubNet+, which reaches state-of-the-art (SOTA) performance and outperforms
other existing speech enhancement approaches.
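The MulCA module is only described at a high level in the abstract. As a rough illustration, the PyTorch sketch below applies parallel 1-D convolutions with different kernel sizes along the time axis, pools each branch into per-band statistics, and maps those statistics to sigmoid weights over the frequency bands; the kernel sizes, pooling, and fusion MLP are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MulCA(nn.Module):
    """Hypothetical multi-scale time-sensitive channel attention.

    Treats each frequency bin of a [batch, freq, time] spectrogram as a
    "channel" and re-weights it from multi-scale temporal statistics.
    """

    def __init__(self, num_freqs: int, kernel_sizes=(3, 5, 10)):
        super().__init__()
        # One depthwise temporal convolution per scale (assumed design).
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(num_freqs, num_freqs, k, padding=k // 2, groups=num_freqs),
                nn.ReLU(),
            )
            for k in kernel_sizes
        )
        # Small bottleneck MLP mapping pooled statistics to per-band weights.
        self.fc = nn.Sequential(
            nn.Linear(num_freqs * len(kernel_sizes), num_freqs // 4),
            nn.ReLU(),
            nn.Linear(num_freqs // 4, num_freqs),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, freq, time]
        stats = [branch(x).mean(dim=-1) for branch in self.branches]  # [B, F] per scale
        weights = self.fc(torch.cat(stats, dim=-1))                   # [B, F]
        return x * weights.unsqueeze(-1)                              # re-weighted bands


if __name__ == "__main__":
    spec = torch.randn(2, 257, 100)      # e.g. 257 frequency bins, 100 frames
    print(MulCA(257)(spec).shape)        # torch.Size([2, 257, 100])
```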
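Likewise, the abstract only states that stacked TCN blocks replace the LSTM layers of the original full-band model. The following sketch shows one plausible shape for such a full-band extractor, built from causal dilated 1-D convolutions with residual connections; the channel counts, dilation pattern, and block count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Causal dilated 1-D convolutional block with a residual connection."""

    def __init__(self, channels: int, hidden: int, kernel_size: int, dilation: int):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left padding keeps the block causal
        self.net = nn.Sequential(
            nn.Conv1d(channels, hidden, 1),
            nn.PReLU(),
            nn.Conv1d(hidden, hidden, kernel_size, dilation=dilation),
            nn.PReLU(),
            nn.Conv1d(hidden, channels, 1),
        )

    def forward(self, x):
        y = self.net(nn.functional.pad(x, (self.pad, 0)))
        return x + y  # residual


class FullBandExtractor(nn.Module):
    """Assumed stand-in for the LSTM layers of the original full-band model."""

    def __init__(self, channels=257, hidden=512, num_blocks=6, kernel_size=3):
        super().__init__()
        self.blocks = nn.Sequential(
            *[TCNBlock(channels, hidden, kernel_size, 2 ** i) for i in range(num_blocks)]
        )

    def forward(self, x):          # x: [batch, freq, time]
        return self.blocks(x)
```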
Related papers
- LMFCA-Net: A Lightweight Model for Multi-Channel Speech Enhancement with Efficient Narrow-Band and Cross-Band Attention [4.489833733302935]
This paper presents a lightweight multi-channel speech enhancement network with decoupled fully-connected attention (LMFCA-Net).
The proposed LMFCA-Net introduces time-axis decoupled fully-connected attention (T-FCA) and frequency-axis decoupled fully-connected attention (F-FCA) mechanisms to effectively capture long-range narrow-band and cross-band information without recurrent units.
arXiv Detail & Related papers (2025-02-17T05:42:03Z)
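The T-FCA and F-FCA mechanisms above are only named in the summary. One hedged reading is that a fully connected (linear) layer mixes information along the time axis and another along the frequency axis, giving long-range narrow-band and cross-band context without recurrence; the sketch below encodes that reading and is an assumption about LMFCA-Net, not its published implementation.

```python
import torch
import torch.nn as nn

class DecoupledFCAttention(nn.Module):
    """Assumed reading of T-FCA / F-FCA: one fully connected mixing layer per axis."""

    def __init__(self, num_freqs: int, num_frames: int):
        super().__init__()
        self.t_fca = nn.Linear(num_frames, num_frames)  # mixes along the time axis
        self.f_fca = nn.Linear(num_freqs, num_freqs)    # mixes along the frequency axis

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, channels, freq, time]
        x = x + self.t_fca(x)                                        # narrow-band context
        x = x + self.f_fca(x.transpose(-1, -2)).transpose(-1, -2)    # cross-band context
        return x


if __name__ == "__main__":
    feats = torch.randn(2, 16, 257, 100)
    print(DecoupledFCAttention(257, 100)(feats).shape)  # torch.Size([2, 16, 257, 100])
```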
- Deep Active Speech Cancellation with Multi-Band Mamba Network [62.73250985838971]
We present a novel deep learning network for Active Speech Cancellation (ASC).
The proposed Multi-Band Mamba architecture segments input audio into distinct frequency bands, enabling precise anti-signal generation.
Experimental results demonstrate substantial performance gains, achieving up to 7.2 dB improvement in ANC scenarios and 6.2 dB in ASC.
arXiv Detail & Related papers (2025-02-03T09:22:26Z)
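The summary above says only that the Multi-Band Mamba architecture segments the input into frequency bands before generating the anti-signal. The sketch below illustrates that band-splitting step with a plain convolutional stand-in for the per-band model (the Mamba blocks themselves are not sketched); the band count and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiBandProcessor(nn.Module):
    """Illustrative frequency-band splitting; the per-band model is a placeholder."""

    def __init__(self, num_freqs=256, num_bands=4, hidden=64):
        super().__init__()
        assert num_freqs % num_bands == 0
        self.band_size = num_freqs // num_bands
        # One small temporal model per band (a stand-in for a Mamba block).
        self.band_models = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(self.band_size, hidden, 3, padding=1),
                nn.ReLU(),
                nn.Conv1d(hidden, self.band_size, 3, padding=1),
            )
            for _ in range(num_bands)
        )

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: [batch, freq, time]; split along frequency, process, re-assemble.
        bands = spec.split(self.band_size, dim=1)
        return torch.cat([m(b) for m, b in zip(self.band_models, bands)], dim=1)
```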
- Cascaded Temporal Updating Network for Efficient Video Super-Resolution [47.63267159007611]
Key components in recurrent-based VSR networks significantly impact model efficiency.
We propose a cascaded temporal updating network (CTUN) for efficient VSR.
CTUN achieves a favorable trade-off between efficiency and performance compared to existing methods.
arXiv Detail & Related papers (2024-08-26T12:59:32Z)
- VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders [14.222389985736422]
VNet is a GAN-based neural vocoder network that incorporates full-band spectral information.
We demonstrate that the VNet model is capable of generating high-fidelity speech.
arXiv Detail & Related papers (2024-08-13T14:00:02Z)
- TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising [94.09442506816724]
Blind-spot networks (BSN) have been prevalent network architectures in self-supervised image denoising (SSID).
We present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators that meet the blind-spot requirement.
For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution.
For channel self-attention, we observe that it may leak blind-spot information when the channel number is greater than the spatial size in the deep layers of multi-scale architectures.
arXiv Detail & Related papers (2024-04-11T15:39:10Z)
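The masked spatial self-attention described above restricts the attention matrix so that forbidden positions (e.g. the centre pixel of a blind-spot network) are never attended to, mimicking a dilated convolution's receptive field. The function below is a generic sketch of that masking step, not TBSN's exact operator; the example mask is an assumption.

```python
import torch
import torch.nn.functional as F

def masked_spatial_attention(q, k, v, allowed):
    """Self-attention over spatial positions with a hard mask on the attention matrix.

    q, k, v:  [batch, positions, dim]
    allowed:  [positions, positions] boolean; False entries are excluded, which is
              how a blind-spot-style receptive-field restriction can be imposed.
    """
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5   # [B, P, P]
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


if __name__ == "__main__":
    b, p, d = 1, 16, 8
    q, k, v = (torch.randn(b, p, d) for _ in range(3))
    allowed = ~torch.eye(p, dtype=torch.bool)   # e.g. never attend to the pixel itself
    print(masked_spatial_attention(q, k, v, allowed).shape)  # torch.Size([1, 16, 8])
```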
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can reduce the number of floating point operations of off-the-shelf audio neural networks by more than half.
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
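A non-parametric pooling front-end of the kind described above can be shown concretely: the time axis of the input feature map is downsampled before it reaches the backbone, so every later layer processes proportionally fewer frames. The pooling factor, the choice of average pooling, and the toy backbone are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimplePoolingFrontEnd(nn.Module):
    """Non-parametric temporal pooling applied before an audio backbone."""

    def __init__(self, backbone: nn.Module, pool_factor: int = 2):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size=pool_factor)  # halves the frame rate for factor 2
        self.backbone = backbone

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: [batch, mel_bins, frames] -> pooled along the frame axis
        return self.backbone(self.pool(feats))


if __name__ == "__main__":
    backbone = nn.Sequential(nn.Conv1d(64, 32, 3, padding=1), nn.AdaptiveAvgPool1d(1),
                             nn.Flatten(), nn.Linear(32, 10))
    model = SimplePoolingFrontEnd(backbone, pool_factor=2)
    print(model(torch.randn(4, 64, 500)).shape)  # torch.Size([4, 10])
```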
- Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning [41.44950556040058]
We propose to utilize multi-frequency information and design two novel and effective attention modules.
The proposed attention modules can effectively capture more speaker information from multiple frequency components on the basis of DCT.
Experimental results demonstrate that our proposed SFSC and MFSC attention modules can efficiently generate more discriminative speaker representations.
arXiv Detail & Related papers (2022-07-10T21:19:36Z)
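The SFSC/MFSC modules above are said to build channel attention from multiple DCT frequency components, but no detail is given here. The sketch below shows one generic way to derive channel weights from a few DCT-II components of each channel's temporal feature map; it is an assumption about the modules rather than their published form.

```python
import math
import torch
import torch.nn as nn

class DCTChannelAttention(nn.Module):
    """Channel attention driven by a few DCT-II components per channel (illustrative)."""

    def __init__(self, channels: int, num_frames: int, freqs=(0, 1, 2, 3), reduction=4):
        super().__init__()
        t = torch.arange(num_frames, dtype=torch.float32)
        # DCT-II basis vectors for the selected frequency indices: [len(freqs), T]
        basis = torch.stack([torch.cos(math.pi / num_frames * (t + 0.5) * k) for k in freqs])
        self.register_buffer("basis", basis)
        self.fc = nn.Sequential(
            nn.Linear(channels * len(freqs), channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, channels, frames]; project each channel onto the DCT basis.
        desc = torch.einsum("bct,ft->bcf", x, self.basis)   # [B, C, len(freqs)]
        weights = self.fc(desc.flatten(1))                   # [B, C]
        return x * weights.unsqueeze(-1)
```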
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate on designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF).
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z)
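Lottery-ticket pruning with iterative fine-tuning, as named above, usually alternates magnitude pruning with fine-tuning or a rewind of the surviving weights. The loop below is a generic sketch of that procedure using PyTorch's pruning utilities, not the paper's exact LTH-IF recipe; the training function, pruning fraction, and rewinding policy are placeholders.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def lth_iterative_pruning(model: nn.Module, train_fn, rounds: int = 5, frac: float = 0.2):
    """Alternate magnitude pruning with rewinding/fine-tuning (generic LTH-style loop)."""
    init_state = copy.deepcopy(model.state_dict())            # weights to rewind to
    for _ in range(rounds):
        train_fn(model)                                        # placeholder train / fine-tune step
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d)):
                prune.l1_unstructured(module, name="weight", amount=frac)
        with torch.no_grad():                                  # rewind surviving weights
            for name, module in model.named_modules():
                key = f"{name}.weight"
                if hasattr(module, "weight_orig") and key in init_state:
                    module.weight_orig.copy_(init_state[key])
    return model
```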
- Dynamic Slimmable Denoising Network [64.77565006158895]
Dynamic slimmable denoising network (DDS-Net) is a general method to achieve good denoising quality with less computational complexity.
DDS-Net is empowered with the ability of dynamic inference by a dynamic gate.
Our experiments demonstrate that DDS-Net consistently outperforms the state-of-the-art individually trained static denoising networks.
arXiv Detail & Related papers (2021-10-17T22:45:33Z)
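The dynamic gate mentioned above decides, per input, how wide the network should run. The sketch below shows one generic form of such a gate: it picks a channel width from a small set and slices a convolution accordingly. It illustrates the idea of dynamic slimming and is not DDS-Net's actual gate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSlimmableConv(nn.Module):
    """Per-input choice among channel widths, made by a tiny gating network (illustrative)."""

    def __init__(self, in_ch=32, out_ch=64, widths=(16, 32, 64)):
        super().__init__()
        self.widths = widths
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.gate = nn.Sequential(nn.Linear(in_ch, 16), nn.ReLU(), nn.Linear(16, len(widths)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Decide a width from global image statistics (one decision per batch, for simplicity).
        stats = x.mean(dim=(0, 2, 3))                    # [in_ch]
        width = self.widths[int(self.gate(stats).argmax())]
        # Run only the first `width` output channels of the convolution.
        w, b = self.conv.weight[:width], self.conv.bias[:width]
        return F.conv2d(x, w, b, padding=1)


if __name__ == "__main__":
    layer = GatedSlimmableConv()
    print(layer(torch.randn(2, 32, 64, 64)).shape)       # [2, chosen_width, 64, 64]
```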
- PRVNet: A Novel Partially-Regularized Variational Autoencoders for Massive MIMO CSI Feedback [15.972209500908642]
In a multiple-input multiple-output frequency-division duplexing (MIMO-FDD) system, the user equipment (UE) sends the downlink channel state information (CSI) to the base station to report link status.
In this paper, we introduce PRVNet, a neural network architecture inspired by variational autoencoders (VAE) to compress the CSI matrix before sending it back to the base station.
arXiv Detail & Related papers (2020-11-09T04:07:45Z)
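A VAE-based CSI compressor of the kind described above encodes the CSI matrix into a short latent code at the UE and reconstructs it at the base station. The minimal encoder/decoder below illustrates that pipeline under assumed dimensions; the partial-regularization scheme that gives PRVNet its name is not shown.

```python
import torch
import torch.nn as nn

class CSIVae(nn.Module):
    """Minimal VAE that compresses a flattened CSI matrix into a short feedback code."""

    def __init__(self, csi_dim=2 * 32 * 32, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(csi_dim, 512), nn.ReLU())
        self.to_mu = nn.Linear(512, code_dim)
        self.to_logvar = nn.Linear(512, code_dim)
        self.decoder = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(), nn.Linear(512, csi_dim))

    def forward(self, csi: torch.Tensor):
        h = self.encoder(csi.flatten(1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recon = self.decoder(z).view_as(csi)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
        return recon, kl


if __name__ == "__main__":
    csi = torch.randn(8, 2, 32, 32)   # real/imag parts of an assumed 32x32 angular-delay CSI
    recon, kl = CSIVae()(csi)
    print(recon.shape, float(kl))
```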
- Channel-Attention Dense U-Net for Multichannel Speech Enhancement [21.94418736688929]
We introduce a channel-attention mechanism inside the deep architecture to mimic beamforming.
We demonstrate the superior performance of the network against the state-of-the-art approaches on the CHiME-3 dataset.
arXiv Detail & Related papers (2020-01-30T19:56:52Z)