A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
- URL: http://arxiv.org/abs/2409.12444v1
- Date: Thu, 19 Sep 2024 03:52:50 GMT
- Title: A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
- Authors: Jingyuan Wang, Jie Zhang, Shihao Chen, Miao Sun,
- Abstract summary: Binaural speech enhancement aims to improve the speech quality and intelligibility of noisy signals received by hearing devices.
Existing methods often suffer from the compromise between noise reduction (NR) capacity and spatial cues ( SCP) accuracy and preservation.
We present a learning-based lightweight complex convolutional network (LBCCN) which excels in NR by filtering low-frequency bands and keeping the rest.
- Score: 19.384404014248762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binaural speech enhancement (BSE) aims to jointly improve the speech quality and intelligibility of noisy signals received by hearing devices and preserve the spatial cues of the target for natural listening. Existing methods often suffer from the compromise between noise reduction (NR) capacity and spatial cues preservation (SCP) accuracy and a high computational demand in complex acoustic scenes. In this work, we present a learning-based lightweight binaural complex convolutional network (LBCCN), which excels in NR by filtering low-frequency bands and keeping the rest. Additionally, our approach explicitly incorporates the estimation of interchannel relative acoustic transfer function to ensure the spatial cues fidelity and speech clarity. Results show that the proposed LBCCN can achieve a comparable NR performance to state-of-the-art methods under various noise conditions, but with a much lower computational cost and a better SCP. The reproducible code and audio examples are available at https://github.com/jywanng/LBCCN.
Related papers
- Neural Acoustic Context Field: Rendering Realistic Room Impulse Response
With Neural Fields [61.07542274267568]
This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene.
Driven by the unique properties of RIR, we design a temporal correlation module and multi-scale energy decay criterion.
Experimental results show that NACF outperforms existing field-based methods by a notable margin.
arXiv Detail & Related papers (2023-09-27T19:50:50Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids previous arbitrarily tuning from a mini-batch of samples.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Deep Neural Mel-Subband Beamformer for In-car Speech Separation [44.58289679847228]
We propose a DL-based melband-based beamformer to perform speech separation in a car environment.
As opposed to conventional subband approaches, our framework uses a melband based sub selection strategy.
We find that our proposed framework achieves better separation performance over all SB and FB approaches.
arXiv Detail & Related papers (2022-11-22T21:11:26Z) - Zero-shot Blind Image Denoising via Implicit Neural Representations [77.79032012459243]
We propose an alternative denoising strategy that leverages the architectural inductive bias of implicit neural representations (INRs)
We show that our method outperforms existing zero-shot denoising methods under an extensive set of low-noise or real-noise scenarios.
arXiv Detail & Related papers (2022-04-05T12:46:36Z) - A Novel Speech Intelligibility Enhancement Model based on
CanonicalCorrelation and Deep Learning [12.913738983870621]
We present a canonical correlation based short-time objective intelligibility (CC-STOI) cost function to train a fully convolutional neural network (FCN) model.
We show that our CC-STOI based speech enhancement framework outperforms state-of-the-art DL models trained with conventional distance-based and STOI-based loss functions.
arXiv Detail & Related papers (2022-02-11T16:48:41Z) - Towards Lightweight Controllable Audio Synthesis with Conditional
Implicit Neural Representations [10.484851004093919]
Implicit neural representations (INRs) are neural networks used to approximate low-dimensional functions.
In this work we shed light on the potential of Conditional Implicit Neural Representations (CINRs) as lightweight backbones in generative frameworks for audio synthesis.
arXiv Detail & Related papers (2021-11-14T13:36:18Z) - Improving Noise Robustness of Contrastive Speech Representation Learning
with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z) - Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from mutli-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z) - Using deep learning to understand and mitigate the qubit noise
environment [0.0]
We propose to address the challenge of extracting accurate noise spectra from time-dynamics measurements on qubits.
We demonstrate a neural network based methodology that allows for extraction of the noise spectrum associated with any qubit surrounded by an arbitrary bath.
Our results can be applied to a wide range of qubit platforms and provide a framework for improving qubit performance.
arXiv Detail & Related papers (2020-05-03T17:13:14Z) - Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.