Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive
Feature Learning in Speech Enhancement
- URL: http://arxiv.org/abs/2306.05861v1
- Date: Fri, 9 Jun 2023 12:52:01 GMT
- Title: Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive
Feature Learning in Speech Enhancement
- Authors: Junyu Wang
- Abstract summary: This paper proposes a time-frequency (T-F) domain speech enhancement network (DPCFCS-Net)
It incorporates improved densely connected blocks, dual-path modules, convolution-augmented transformers (conformers), channel attention, and spatial attention.
Compared with previous models, our proposed model has a more efficient encoder-decoder and can learn comprehensive features.
- Score: 0.2538209532048866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current speech enhancement (SE) research has largely neglected channel
attention and spatial attention, and encoder-decoder architecture-based
networks have not adequately considered how to provide efficient inputs to the
intermediate enhancement layer. To address these issues, this paper proposes a
time-frequency (T-F) domain SE network (DPCFCS-Net) that incorporates improved
densely connected blocks, dual-path modules, convolution-augmented transformers
(conformers), channel attention, and spatial attention. Compared with previous
models, our proposed model has a more efficient encoder-decoder and can learn
comprehensive features. Experimental results on the VCTK+DEMAND dataset
demonstrate that our method outperforms existing techniques in SE performance.
Furthermore, the improved densely connected block and two dimensions attention
module developed in this work are highly adaptable and easily integrated into
existing networks.
Related papers
- Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double Dynamics [15.847713094328286]
Integrated sensing and communication (ISAC) technology plays a crucial role in vehicular networks.
Double dynamics present significant challenges for real-time ISAC precoding design.
We propose a synesthesia of machine (SoM)-enhanced precoding paradigm.
arXiv Detail & Related papers (2024-08-24T10:35:10Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - An Efficient Speech Separation Network Based on Recurrent Fusion Dilated
Convolution and Channel Attention [0.2538209532048866]
We present an efficient speech separation neural network, ARFDCN, which combines dilated convolutions, multi-scale fusion (MSF), and channel attention.
Experimental results indicate that the model achieves a decent balance between performance and computational efficiency.
arXiv Detail & Related papers (2023-06-09T13:30:27Z) - Joint Channel Estimation and Feedback with Masked Token Transformers in
Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z) - End-to-end Transformer for Compressed Video Quality Enhancement [21.967066471073462]
We propose a transformer-based compressed video quality enhancement (TVQE) method, consisting of Swin-AutoEncoder based Spatio-Temporal feature Fusion (SSTF) module and Channel-wise Attention based Quality Enhancement (CAQE) module.
Our proposed method outperforms existing ones in terms of both inference speed and GPU consumption.
arXiv Detail & Related papers (2022-10-25T08:12:05Z) - EMC2A-Net: An Efficient Multibranch Cross-channel Attention Network for
SAR Target Classification [10.479559839534033]
This paper proposed two residual blocks, namely EMC2A blocks with multiscale receptive fields(RFs), based on a multibranch structure and then designed an efficient isotopic architecture deep CNN (DCNN), EMC2A-Net.
EMC2A blocks utilize parallel dilated convolution with different dilation rates, which can effectively capture multiscale context features without significantly increasing the computational burden.
This paper proposed a multiscale feature cross-channel attention module, namely the EMC2A module, adopting a local multiscale feature interaction strategy without dimensionality reduction.
arXiv Detail & Related papers (2022-08-03T04:31:52Z) - Activating More Pixels in Image Super-Resolution Transformer [53.87533738125943]
Transformer-based methods have shown impressive performance in low-level vision tasks, such as image super-resolution.
We propose a novel Hybrid Attention Transformer (HAT) to activate more input pixels for better reconstruction.
Our overall method significantly outperforms the state-of-the-art methods by more than 1dB.
arXiv Detail & Related papers (2022-05-09T17:36:58Z) - Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing
Vertical and Horizontal Convolutions [58.71117402626524]
We present a novel double-branch encoder architecture for medical image segmentation.
Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels.
The experiments validate the effectiveness of our model on four datasets.
arXiv Detail & Related papers (2021-07-24T02:58:32Z) - EfficientFCN: Holistically-guided Decoding for Semantic Segmentation [49.27021844132522]
State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN)
We propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution.
Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost.
arXiv Detail & Related papers (2020-08-24T14:48:23Z) - Suppress and Balance: A Simple Gated Network for Salient Object
Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.