Deep Active Speech Cancellation with Multi-Band Mamba Network
- URL: http://arxiv.org/abs/2502.01185v1
- Date: Mon, 03 Feb 2025 09:22:26 GMT
- Title: Deep Active Speech Cancellation with Multi-Band Mamba Network
- Authors: Yehuda Mishaly, Lior Wolf, Eliya Nachmani
- Abstract summary: We present a novel deep learning network for Active Speech Cancellation (ASC).
The proposed Multi-Band Mamba architecture segments input audio into distinct frequency bands, enabling precise anti-signal generation.
Experimental results demonstrate substantial performance gains, achieving up to 7.2dB improvement in ANC scenarios and 6.2dB in ASC.
- Score: 62.73250985838971
- Abstract: We present a novel deep learning network for Active Speech Cancellation (ASC), advancing beyond Active Noise Cancellation (ANC) methods by effectively canceling both noise and speech signals. The proposed Multi-Band Mamba architecture segments input audio into distinct frequency bands, enabling precise anti-signal generation and improved phase alignment across frequencies. Additionally, we introduce an optimization-driven loss function that provides near-optimal supervisory signals for anti-signal generation. Experimental results demonstrate substantial performance gains, achieving up to 7.2dB improvement in ANC scenarios and 6.2dB in ASC, significantly outperforming existing methods. Audio samples are available at https://mishalydev.github.io/DeepASC-Demo
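The two ideas named in the abstract, splitting audio into frequency bands and measuring cancellation in dB, can be illustrated with a minimal NumPy sketch. This is not the paper's method: the learned Multi-Band Mamba network is replaced here by a fixed scaled inversion of the input, the bands are equal-width FFT bin ranges chosen purely for illustration, and the function names `split_bands` and `attenuation_db` are hypothetical.

```python
import numpy as np


def split_bands(x, n_bands):
    """Split a real signal into n_bands signals using equal-width FFT bin masks.

    Assumption: equal-width bands are an illustrative choice, not the
    band layout used in the paper.
    """
    spectrum = np.fft.rfft(x)
    # Bin edges that partition the one-sided spectrum into n_bands ranges.
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        # Back to the time domain; each band keeps only its own bins.
        bands.append(np.fft.irfft(masked, n=x.size))
    return np.stack(bands)


def attenuation_db(primary, anti):
    """Energy reduction in dB after adding the anti-signal to the primary signal."""
    residual = primary + anti
    return 10.0 * np.log10(np.sum(primary**2) / np.sum(residual**2))


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)          # stand-in for a short audio frame
bands = split_bands(x, n_bands=4)

# Stand-in for a learned per-band anti-signal: invert each band at 90% gain.
anti = -0.9 * bands.sum(axis=0)
print(f"attenuation: {attenuation_db(x, anti):.1f} dB")
```

Because the masks partition every FFT bin exactly once, the bands sum back to the original frame, so an imperfect anti-signal that leaves 10% of the amplitude yields about 20 dB of attenuation. A real system would additionally have to account for the acoustic path and latency between the speaker and the error microphone, which this sketch ignores.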
Related papers
- Unsupervised CP-UNet Framework for Denoising DAS Data with Decay Noise [13.466125373185399]
Distributed acoustic sensor (DAS) technology leverages optical fiber cables to detect acoustic signals.
DAS exhibits a lower signal-to-noise ratio (S/N) compared to geophones.
This reduced S/N can negatively impact data analyses containing inversion and interpretation.
arXiv Detail & Related papers (2025-02-19T03:09:49Z)
- DenoMAE: A Multimodal Autoencoder for Denoising Modulation Signals [21.25974800554959]
DenoMAE is a novel framework for denoising modulation signals during pretraining.
It incorporates multiple input modalities, including noise, to enhance cross-modal learning.
It achieves state-of-the-art accuracy in automatic modulation classification tasks.
arXiv Detail & Related papers (2025-01-20T15:23:16Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- AMC-Net: An Effective Network for Automatic Modulation Classification [22.871024969842335]
We propose a novel AMC-Net that improves recognition by denoising the input signal in the frequency domain while performing multi-scale and effective feature extraction.
Experiments on two representative datasets demonstrate that our model performs better in efficiency and effectiveness than the most current methods.
arXiv Detail & Related papers (2023-04-02T04:26:30Z)
- Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation [23.758202121043805]
We propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.
Experimental results show that our approach achieves the state-of-the-art on large-scale Libri2Mix- and Libri3Mix-noisy datasets.
arXiv Detail & Related papers (2023-02-22T03:54:50Z)
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can achieve a reduction in more than half of the number of floating point operations for off-the-shelf audio neural networks.
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
- Digital Beamforming Robust to Time-Varying Carrier Frequency Offset [21.18926642388997]
We present novel beamforming algorithms that are robust to signal corruptions arising from a time-variant carrier frequency offset.
We propose two atomic-norm-minimization (ANM)-based methods to design a weight vector that can be used to cancel interference when there is an unknown time-varying frequency drift in the pilot and interferer signals.
arXiv Detail & Related papers (2021-03-08T18:08:56Z)
- CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application [63.2243126704342]
This study presents a deep learning-based speech signal-processing mobile application known as CITISEN.
CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC).
Compared with the noisy speech signals, the enhanced speech signals achieved improvements of about 6% and 33%.
arXiv Detail & Related papers (2020-08-21T02:04:12Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of binding this metric for enhancing stability in training with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)