DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
- URL: http://arxiv.org/abs/2501.10525v2
- Date: Thu, 23 Jan 2025 14:44:51 GMT
- Title: DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
- Authors: Iosif Tsangko, Andreas Triantafyllopoulos, Michael Müller, Hendrik Schröter, Björn W. Schuller
- Abstract summary: The DeepFilterNet (DFN) architecture was recently proposed as a deep learning model suited for hearing aid devices.
We introduce these principles to the DFN model, thus proposing the DFingerNet (DFiN) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.
- Score: 41.294460006431564
- Abstract: The DeepFilterNet (DFN) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a 'one-size-fits-all' approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper its generalisability. To mitigate this, recent work has shown that in-context adaptation can improve performance by conditioning the denoising process on additional information extracted from background recordings. Processing these recordings can be offloaded outside the hearing aid, thus improving performance while adding minimal computational overhead. We introduce these principles to the DFN model, thus proposing the DFingerNet (DFiN) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.
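The abstract does not spell out the conditioning mechanism. As a rough illustration of the in-context adaptation idea, the sketch below pools a background-noise recording into an embedding and uses it to modulate a denoiser's features FiLM-style; all module names, dimensions, and the fusion scheme are assumptions for illustration, not the DFingerNet implementation.

```python
# Minimal sketch of in-context noise conditioning (illustrative only; module
# names, dimensions, and FiLM-style fusion are assumptions, not DFiN's design).
import torch
import torch.nn as nn

class NoiseEncoder(nn.Module):
    """Pools a background recording into a fixed-size noise embedding.
    In the offloading scenario, this module could run outside the hearing aid."""
    def __init__(self, n_mels=64, emb_dim=128):
        super().__init__()
        self.net = nn.GRU(n_mels, emb_dim, batch_first=True)

    def forward(self, noise_mels):            # (B, T, n_mels)
        _, h = self.net(noise_mels)           # h: (1, B, emb_dim)
        return h.squeeze(0)                   # (B, emb_dim)

class ConditionedDenoiser(nn.Module):
    """Scales and shifts intermediate features using the noise embedding."""
    def __init__(self, n_mels=64, emb_dim=128, hidden=256):
        super().__init__()
        self.enc = nn.Linear(n_mels, hidden)
        self.film = nn.Linear(emb_dim, 2 * hidden)   # produces (gamma, beta)
        self.dec = nn.Linear(hidden, n_mels)

    def forward(self, noisy_mels, noise_emb):        # (B, T, n_mels), (B, emb_dim)
        h = torch.relu(self.enc(noisy_mels))
        gamma, beta = self.film(noise_emb).chunk(2, dim=-1)
        h = gamma.unsqueeze(1) * h + beta.unsqueeze(1)   # broadcast over time
        return self.dec(h)                               # predicted mask/spectrum

noise_emb = NoiseEncoder()(torch.randn(2, 100, 64))
enhanced = ConditionedDenoiser()(torch.randn(2, 300, 64), noise_emb)
```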
Related papers
- Enhance Vision-Language Alignment with Noise [59.2608298578913]
We investigate whether a frozen model can be fine-tuned by customized noise.
We propose Positive-incentive Noise (PiNI), which can fine-tune CLIP by injecting noise into both the visual and text encoders (a rough sketch follows this entry).
arXiv Detail & Related papers (2024-12-14T12:58:15Z)
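As a loose illustration of the noise-injection idea in the entry above, the sketch below adds Gaussian noise with a learnable scale to the features of a frozen encoder; the noise distribution, insertion point, and parameterization are assumptions, not PiNI's actual design.

```python
# Hypothetical noise-injection adapter for a frozen encoder (illustrative only).
import torch
import torch.nn as nn

class NoisyAdapter(nn.Module):
    """Adds Gaussian noise with a learnable per-dimension scale to frozen features."""
    def __init__(self, dim, init_scale=0.1):
        super().__init__()
        self.log_scale = nn.Parameter(torch.full((dim,), init_scale).log())

    def forward(self, feats):
        if self.training:                     # inject noise only during fine-tuning
            feats = feats + self.log_scale.exp() * torch.randn_like(feats)
        return feats

adapter = NoisyAdapter(dim=512)
noisy_feats = adapter(torch.randn(8, 512))    # e.g. frozen encoder output
```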
- CheapNET: Improving Light-weight speech enhancement network by projected loss function [0.8192907805418583]
We introduce a novel projection loss function, diverging from MSE, to enhance noise suppression (an illustrative projection-style loss is sketched after this entry).
For echo cancellation, the function enables direct predictions on LAEC pre-processed outputs.
Our noise suppression model achieves near state-of-the-art results with only 3.1M parameters and 0.4 GFLOP/s of computational load.
arXiv Detail & Related papers (2023-11-27T16:03:42Z)
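CheapNET's exact projection loss is not reproduced above. As one common way to build a loss around projecting the estimate onto the target direction, the SI-SDR-style sketch below is shown for illustration only and is not necessarily the paper's formulation.

```python
# Generic projection-style objective (SI-SDR-like; not CheapNET's exact loss).
import torch

def projection_loss(est, ref, eps=1e-8):
    """Decompose the estimate into its projection onto the reference and an
    orthogonal residual, then maximize their power ratio (negative SI-SDR)."""
    dot = torch.sum(est * ref, dim=-1, keepdim=True)
    energy = torch.sum(ref * ref, dim=-1, keepdim=True) + eps
    target = dot / energy * ref          # component aligned with the reference
    residual = est - target              # everything the projection leaves out
    ratio = target.pow(2).sum(dim=-1) / (residual.pow(2).sum(dim=-1) + eps)
    return -10.0 * torch.log10(ratio + eps).mean()

loss = projection_loss(torch.randn(4, 16000), torch.randn(4, 16000))
```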
- Unsupervised speech enhancement with deep dynamical generative speech and noise models [26.051535142743166]
This work builds on previous work on unsupervised speech enhancement that uses a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.
We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) that depends either on the DVAE latent variables, on the noisy observations, or on both (the NMF baseline is sketched after this entry).
arXiv Detail & Related papers (2023-06-13T14:52:35Z)
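For context on the NMF noise model that the entry above replaces with a DDGM, here is a minimal NMF fit of a nonnegative noise spectrogram with standard multiplicative updates; the rank and iteration count are arbitrary illustrative choices.

```python
# Minimal NMF noise model: V ~= W @ H with nonnegative factors.
import torch

def nmf(V, rank=8, n_iter=100, eps=1e-8):
    """Factor a nonnegative (freq x time) spectrogram with multiplicative
    updates for the Frobenius objective; updates preserve nonnegativity."""
    W = torch.rand(V.shape[0], rank)
    H = torch.rand(rank, V.shape[1])
    for _ in range(n_iter):
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
        W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(torch.rand(257, 400))   # noise PSD model: (257, 8) @ (8, 400)
```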
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs), which use simple non-parametric pooling operations to reduce redundant information.
SimPFs can reduce the number of floating-point operations by more than half for off-the-shelf audio neural networks (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
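In the spirit of the SimPF entry above, here is a hypothetical non-parametric pooling front-end: average-pooling the time axis of a spectrogram before the backbone, which shrinks downstream FLOPs roughly in proportion to the pooling factor. The pooling type and factor are assumptions, not the paper's configuration.

```python
# Non-parametric pooling front-end sketch (pooling choice is an assumption).
import torch
import torch.nn as nn

class PoolingFrontEnd(nn.Module):
    def __init__(self, kernel=2):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel)     # no learnable parameters

    def forward(self, mels):                 # (B, n_mels, T)
        return self.pool(mels)               # (B, n_mels, T // kernel)

backbone_input = PoolingFrontEnd()(torch.randn(4, 64, 1000))  # (4, 64, 500)
```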
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate designing a compact audio-visual wake word spotting (WWS) system that utilizes visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF), sketched below.
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z)
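A schematic of lottery-ticket-style iterative magnitude pruning with fine-tuning between rounds, loosely following the LTH-IF entry above; the original lottery ticket procedure also rewinds surviving weights to their initial values, which is omitted here, and the pruning amounts are placeholders.

```python
# Iterative magnitude pruning sketch (amounts and loop structure are assumptions).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def lth_iterative_prune(model, rounds=5, amount_per_round=0.2, finetune_fn=None):
    """Prune 20% of the smallest-magnitude surviving weights per round,
    fine-tuning in between (weight rewinding omitted for brevity)."""
    for _ in range(rounds):
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                prune.l1_unstructured(module, name="weight", amount=amount_per_round)
        if finetune_fn is not None:
            finetune_fn(model)   # placeholder: caller supplies the training loop
    return model

pruned = lth_iterative_prune(nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 10)))
```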
- CDLNet: Noise-Adaptive Convolutional Dictionary Learning Network for Blind Denoising and Demosaicing [4.975707665155918]
Unrolled optimization networks present an interpretable alternative to constructing deep neural networks.
We propose an unrolled convolutional dictionary learning network (CDLNet) and demonstrate its competitive performance on denoising and joint denoising and demosaicing (JDD); a toy unrolling is sketched below.
Specifically, we show that the proposed model outperforms state-of-the-art fully convolutional denoising and JDD models when scaled to a similar parameter count.
arXiv Detail & Related papers (2021-12-02T01:23:21Z)
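To make the "unrolled optimization" idea behind CDLNet concrete, here is a toy unrolling of ISTA for sparse coding with a learned dictionary; it is fully connected rather than convolutional, and the layer count, step size, and threshold are illustrative assumptions, not the paper's architecture.

```python
# Toy unrolled ISTA for sparse coding (illustrative; CDLNet is convolutional).
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    def __init__(self, signal_dim, code_dim, n_layers=10):
        super().__init__()
        self.D = nn.Parameter(torch.randn(signal_dim, code_dim) * 0.1)  # dictionary
        self.step = nn.Parameter(torch.tensor(0.1))
        self.thresh = nn.Parameter(torch.tensor(0.05))  # noise-adaptive in CDLNet

    def forward(self, y):                               # y: (B, signal_dim)
        z = torch.zeros(y.shape[0], self.D.shape[1], device=y.device)
        for _ in range(10):
            grad = (z @ self.D.T - y) @ self.D          # grad of 0.5 * ||Dz - y||^2
            z = z - self.step * grad
            z = torch.sign(z) * torch.relu(z.abs() - self.thresh)  # soft threshold
        return z @ self.D.T                             # reconstruction

denoised = UnrolledISTA(signal_dim=64, code_dim=128)(torch.randn(4, 64))
```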
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called the Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in the time domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- CDLNet: Robust and Interpretable Denoising Through Deep Convolutional Dictionary Learning [6.6234935958112295]
Unrolled optimization networks offer an interpretable alternative to constructing deep neural networks.
We show that the proposed model outperforms state-of-the-art denoising models when scaled to a similar parameter count.
arXiv Detail & Related papers (2021-03-05T01:15:59Z)
- Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder [30.318947721658862]
We propose to include noise information in the training phase by using a noise-aware encoder trained on noisy-clean speech pairs.
We show that our proposed noise-aware VAE outperforms the standard VAE in terms of overall distortion without increasing the number of model parameters (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-02-17T11:40:42Z)
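A minimal sketch of the noise-aware idea from the entry above: the VAE encoder consumes noisy features while the reconstruction targets the paired clean features; the architecture, feature dimension, and unweighted KL term are assumptions, not the paper's exact model.

```python
# Noise-aware VAE sketch: encode noisy features, reconstruct clean targets.
import torch
import torch.nn as nn

class NoiseAwareVAE(nn.Module):
    def __init__(self, feat_dim=257, latent_dim=32, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, feat_dim))

    def forward(self, noisy):
        h = self.enc(noisy)                      # encode the NOISY features
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(recon, clean, mu, logvar):
    rec = torch.nn.functional.mse_loss(recon, clean)   # reconstruct CLEAN features
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

model = NoiseAwareVAE()
recon, mu, logvar = model(torch.randn(8, 257))
loss = vae_loss(recon, torch.randn(8, 257), mu, logvar)  # paired clean targets
```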
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved strong performance in controlled conditions.
Speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.