A Mask Free Neural Network for Monaural Speech Enhancement
- URL: http://arxiv.org/abs/2306.04286v1
- Date: Wed, 7 Jun 2023 09:39:07 GMT
- Title: A Mask Free Neural Network for Monaural Speech Enhancement
- Authors: Liang Liu, Haixin Guan, Jinlong Ma, Wei Dai, Guangyong Wang, Shaowei
Ding
- Abstract summary: We propose MFNet, a direct and simple network that can map not only speech but also reverse noise.
Our experimental results demonstrate that our network, using a mapping method, outperforms masking methods.
- Score: 5.773867150765472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In speech enhancement, the lack of clear structural characteristics in the
target speech phase requires the use of conservative and cumbersome network
frameworks. It seems difficult to achieve competitive performance using direct
methods and simple network architectures. However, we propose the MFNet, a
direct and simple network that can not only map speech but also map reverse
noise. This network is constructed by stacking global local former blocks
(GLFBs), which combine the advantages of Mobileblock for global processing and
the Metaformer architecture for local interaction. Our experimental results
demonstrate that our network, using a mapping method, outperforms masking
methods, and that direct mapping of reverse noise is the optimal solution in
strong-noise environments. In a horizontal comparison on the 2020 Deep Noise Suppression
(DNS) challenge test set without reverberation, to the best of our knowledge,
MFNet is the current state-of-the-art (SOTA) mapping model.
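The distinction the abstract draws between masking, speech mapping, and reverse-noise mapping can be illustrated with a minimal sketch. It is not the MFNet implementation: the toy spectrograms, the function names, and the oracle "network outputs" are all hypothetical, and reading "reverse noise" as the negated noise component (added back to the noisy input) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy complex spectrograms (frequency bins x frames), standing in for STFT output.
shape = (4, 5)
speech = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noisy = speech + noise


def enhance_by_mask(noisy_spec, mask):
    """Masking: the network predicts a mask that is multiplied with the input."""
    return mask * noisy_spec


def enhance_by_speech_mapping(speech_estimate):
    """Speech mapping: the network output is used directly as the clean spectrum."""
    return speech_estimate


def enhance_by_reverse_noise(noisy_spec, reverse_noise_estimate):
    """Reverse-noise mapping (assumed form): the network predicts -noise,
    which is added to the noisy input to cancel the noise."""
    return noisy_spec + reverse_noise_estimate


# Oracle targets that a perfectly trained network would output in each setting.
ideal_mask = speech / noisy      # complex ratio mask
ideal_speech = speech            # direct speech target
ideal_reverse_noise = -noise     # assumed reverse-noise target

masked = enhance_by_mask(noisy, ideal_mask)
mapped = enhance_by_speech_mapping(ideal_speech)
cancelled = enhance_by_reverse_noise(noisy, ideal_reverse_noise)
```

With oracle targets all three routes recover the same clean spectrum; they differ only in what the network is asked to predict, which is the design choice the abstract compares.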
Related papers
- TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks.
We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions.
A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z)
- Complementary Random Masking for RGB-Thermal Semantic Segmentation [63.93784265195356]
RGB-thermal semantic segmentation is a potential solution to achieve reliable semantic scene understanding in adverse weather and lighting conditions.
This paper proposes 1) a complementary random masking strategy of RGB-T images and 2) self-distillation loss between clean and masked input modalities.
We achieve state-of-the-art performance over three RGB-T semantic segmentation benchmarks.
arXiv Detail & Related papers (2023-03-30T13:57:21Z)
- Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement [0.0]
This paper proposes a novel monaural speech enhancement system consisting of a Feature Extraction Block (FEB), a Compensation Enhancement Block (ComEB), and a Mask Block (MB).
Experiments are conducted on the Librispeech dataset and results show that the proposed model obtains better performance than recent models in terms of ESTOI and PESQ scores.
arXiv Detail & Related papers (2022-10-26T06:42:19Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- A Unified Architecture of Semantic Segmentation and Hierarchical Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that on forward pass the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z)
- Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training [10.883458728718047]
Single-channel speech separation is required for multi-speaker speech recognition.
Recent deep learning-based approaches have focused on the time-domain audio separation network (TasNet).
We introduce attention augmented DPRNN (AttnAugDPRNN) which directly approximates the clean sources from the mixture for speech separation.
arXiv Detail & Related papers (2021-10-20T14:42:50Z)
- Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation [70.97625552643493]
This paper addresses the task of segmenting class-agnostic objects in a semi-supervised setting.
We propose a novel graph neural network (TG-Net) which captures the local contexts by utilizing all proposals.
arXiv Detail & Related papers (2020-12-10T07:57:44Z)
- Rethinking FUN: Frequency-Domain Utilization Networks [21.10493050675827]
We present FUN, a family of novel Frequency-domain Utilization Networks.
These networks utilize the inherent efficiency of the frequency-domain by working directly in that domain.
We show that working in frequency domain allows for dynamic compression of the input at inference time without any explicit change to the architecture.
arXiv Detail & Related papers (2020-12-06T19:16:37Z)
- Contextual Interference Reduction by Selective Fine-Tuning of Neural Networks [1.0152838128195465]
We study the role of the context on interfering with a disentangled foreground target object representation.
We work on a framework that benefits from the bottom-up and top-down processing paradigms.
arXiv Detail & Related papers (2020-11-21T20:11:12Z)
- Channel-Attention Dense U-Net for Multichannel Speech Enhancement [21.94418736688929]
We introduce a channel-attention mechanism inside the deep architecture to mimic beamforming.
We demonstrate the superior performance of the network against the state-of-the-art approaches on the CHiME-3 dataset.
arXiv Detail & Related papers (2020-01-30T19:56:52Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.