Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement
- URL: http://arxiv.org/abs/2210.14509v2
- Date: Thu, 27 Oct 2022 04:47:45 GMT
- Title: Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement
- Authors: Jianqiao Cui, Stefan Bleeck
- Abstract summary: This paper proposes a novel monaural speech enhancement system, consisting of a Feature Extraction Block (FEB), a Compensation Enhancement Block (ComEB) and a Mask Block (MB).
Experiments are conducted on the Librispeech dataset, and the results show that the proposed model obtains better performance than recent models in terms of ESTOI and PESQ scores.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning algorithms are increasingly used for speech enhancement
(SE). In supervised methods, both global and local information is required for
accurate spectral mapping, and a key restriction is often the poor capture of
important contextual information. To leverage long-term context for target
speakers and to compensate for distortions in the cleaned speech, this paper
adopts a sequence-to-sequence (S2S) mapping structure and proposes a novel
monaural speech enhancement system consisting of a Feature Extraction Block
(FEB), a Compensation Enhancement Block (ComEB) and a Mask Block (MB). In the
FEB, a U-net block extracts abstract features from complex-valued spectra, with
one path suppressing the background noise in the magnitude domain using masking
methods. The MB takes the magnitude features from the FEB and compensates for
the lost complex-domain features with those produced by the ComEB to restore
the final cleaned speech. Experiments are conducted on the Librispeech dataset,
and the results show that the proposed model obtains better performance than
recent models in terms of ESTOI and PESQ scores.
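The abstract describes two complementary paths: a magnitude-domain masking path that suppresses background noise, and a complex-domain branch (ComEB) whose features the MB uses to compensate for what masking loses. The sketch below illustrates that general idea on a toy frame of STFT bins. It is not the authors' implementation: the [0, 1] mask bound, the additive combination, and the function names `apply_magnitude_mask` and `compensate` are hypothetical, and the learned networks are replaced by fixed placeholder values.

```python
import cmath
from typing import List


def apply_magnitude_mask(noisy: List[complex], mask: List[float]) -> List[complex]:
    """Scale each noisy STFT bin's magnitude by a mask value, keeping the noisy phase.

    Assumption: the mask is real-valued and bounded to [0, 1], as in common
    ratio-mask formulations; values outside that range are clipped.
    """
    if len(noisy) != len(mask):
        raise ValueError("noisy spectrum and mask must have the same length")
    masked = []
    for y, m in zip(noisy, mask):
        m = min(max(m, 0.0), 1.0)  # clip the mask value to [0, 1]
        # Rebuild the bin from the masked magnitude and the unchanged noisy phase.
        masked.append(cmath.rect(m * abs(y), cmath.phase(y)))
    return masked


def compensate(masked: List[complex], correction: List[complex]) -> List[complex]:
    """Hypothetical compensation step: add a complex-valued correction per bin.

    The additive form is an illustrative assumption standing in for how a
    complex-domain branch could restore information lost by magnitude masking.
    """
    if len(masked) != len(correction):
        raise ValueError("masked spectrum and correction must have the same length")
    return [s + c for s, c in zip(masked, correction)]


# Toy frame with two STFT bins. In the described system these values would
# come from learned networks (U-net, ComEB); here they are fixed placeholders.
noisy_frame = [3 + 4j, 1 + 0j]
mask = [0.5, 1.2]  # the second value is clipped to 1.0
correction = [0.1j, -0.5 + 0j]

masked_frame = apply_magnitude_mask(noisy_frame, mask)
enhanced_frame = compensate(masked_frame, correction)
print(enhanced_frame)
```

Keeping the noisy phase in the masking path is what motivates a separate complex-domain branch: a real, non-negative magnitude mask cannot alter phase, so any phase error has to be corrected elsewhere.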
Related papers
- Boosting Open-Vocabulary Object Detection by Handling Background Samples [9.07525578809556] (2024-10-11T09:15:50Z)
We propose a novel approach to address the limitations of CLIP in handling background samples.
We introduce Partial Object Suppression (POS) to address the issue of misclassifying partial regions as foreground.
Our proposed model is capable of achieving performance enhancements across various open-vocabulary detectors.
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178] (2024-08-05T08:35:59Z)
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM).
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766] (2024-07-23T15:07:52Z)
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions.
PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
- A Mask Free Neural Network for Monaural Speech Enhancement [5.773867150765472] (2023-06-07T09:39:07Z)
We propose the MFNet, a direct and simple network that can not only map speech but also map reverse noise.
Our experimental results demonstrate that our network, which uses a mapping method, outperforms masking methods.
- NLIP: Noise-robust Language-Image Pre-training [95.13287735264937] (2022-12-14T08:19:30Z)
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
- Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models [57.20432226304683] (2021-07-20T11:42:26Z)
Non-autoregressive (NAR) modeling has gained more and more attention in speech processing.
We propose a novel end-to-end streaming NAR speech recognition system.
We show that the proposed method improves online ASR recognition in low latency conditions.
- Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks [6.82469220191368] (2020-08-17T12:18:34Z)
We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement.
MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks.
- Sparse Mixture of Local Experts for Efficient Speech Enhancement [19.645016575334786] (2020-05-16T23:23:22Z)
We investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
By splitting up the speech denoising task into non-overlapping subproblems, we are able to improve denoising performance while also reducing computational complexity.
Our findings demonstrate that a fine-tuned ensemble network is able to exceed the speech denoising capabilities of a generalist network.
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228] (2020-04-06T06:34:01Z)
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195] (2020-03-04T08:36:27Z)
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
This list is automatically generated from the titles and abstracts of the papers in this site.