DeepFilterNet: A Low Complexity Speech Enhancement Framework for
Full-Band Audio based on Deep Filtering
- URL: http://arxiv.org/abs/2110.05588v1
- Date: Mon, 11 Oct 2021 20:03:52 GMT
- Authors: Hendrik Schröter, Alberto N. Escalante-B., Tobias Rosenkranz,
Andreas Maier
- Abstract summary: We propose DeepFilterNet, a two-stage speech enhancement framework utilizing deep filtering.
First, we enhance the spectral envelope using ERB-scaled gains that model human frequency perception.
The second stage employs deep filtering to enhance the periodic components of speech.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Complex-valued processing has brought deep learning-based speech enhancement
and signal extraction to a new level. Typically, the process is based on a
time-frequency (TF) mask which is applied to a noisy spectrogram, and complex
masks (CM) are usually preferred over real-valued masks due to their ability to
modify the phase. Recent work proposed to use a complex filter instead of a
point-wise multiplication with a mask. This allows incorporating information
from previous and future time steps, exploiting local correlations within each
frequency band. In this work, we propose DeepFilterNet, a two-stage speech
enhancement framework utilizing deep filtering. First, we enhance the spectral
envelope using ERB-scaled gains that model human frequency perception. The
second stage employs deep filtering to enhance the periodic components of
speech. In addition to taking advantage of perceptual properties of speech, we
enforce network sparsity via separable convolutions and extensive grouping in
linear and recurrent layers to design a low-complexity architecture. We further
show that our two-stage deep filtering approach outperforms complex masks over
a variety of frequency resolutions and latencies, and demonstrates convincing
performance compared to other state-of-the-art models.
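The core operation behind deep filtering can be sketched as follows. Whereas a complex mask multiplies each TF bin by a single coefficient, a deep filter combines several neighboring frames per frequency band. This is a minimal illustrative sketch, not the actual DeepFilterNet implementation; the function name, array shapes, and the causal-only variant are assumptions made for clarity.

```python
import numpy as np

def apply_deep_filter(noisy_spec, filter_coefs):
    """Apply a per-frequency-band complex filter over past time frames.

    noisy_spec:   complex STFT, shape (T, F)
    filter_coefs: complex coefficients, shape (T, N, F), where N is the
                  filter order (in practice predicted by a network)

    A point-wise complex mask is the special case N = 1.
    """
    T, F = noisy_spec.shape
    _, order, _ = filter_coefs.shape
    enhanced = np.zeros((T, F), dtype=complex)
    for t in range(T):
        for i in range(order):
            if t - i >= 0:  # causal sketch: only current and past frames
                enhanced[t] += filter_coefs[t, i] * noisy_spec[t - i]
    return enhanced
```

With filter order 1 this reduces exactly to point-wise complex masking; larger orders let the filter exploit local correlations across neighboring frames within each frequency band, which is what the abstract credits for the gains over complex masks.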
Related papers
- FilterNet: Harnessing Frequency Filters for Time Series Forecasting
FilterNet is built upon learnable frequency filters that extract key informative temporal patterns by selectively passing or attenuating certain components of time series signals.
Equipped with the two filters, FilterNet can approximately surrogate the linear and attention mappings widely adopted in the time series literature.
arXiv Detail & Related papers (2024-11-03T16:20:41Z)
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement
We present a real-time speech enhancement demo using DeepFilterNet.
Our model matches state-of-the-art speech enhancement benchmarks while achieving a real-time factor of 0.19 on a single-threaded notebook CPU.
arXiv Detail & Related papers (2023-05-14T19:09:35Z)
- Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation
We present a scheme for extending deep neural network-based multiplicative maskers to deep subband filters for speech restoration in the time-frequency domain.
The resulting method can be generically applied to any deep neural network providing masks in the time-frequency domain.
arXiv Detail & Related papers (2023-03-01T14:10:21Z)
- Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement
This paper proposes a novel monaural speech enhancement system consisting of a Feature Extraction Block (FEB), a Compensation Enhancement Block (ComEB), and a Mask Block (MB).
Experiments conducted on the Librispeech dataset show that the proposed model outperforms recent models in terms of ESTOI and PESQ scores.
arXiv Detail & Related papers (2022-10-26T06:42:19Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate our network from fixed frequency transforms and achieve better performance with data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- PINs: Progressive Implicit Networks for Multi-Scale Neural Representations
We propose a progressive positional encoding that exposes a hierarchical structure to incremental sets of frequency encodings.
Our model accurately reconstructs scenes with wide frequency bands and learns a scene representation at progressive levels of detail.
Experiments on several 2D and 3D datasets show improvements in reconstruction accuracy, representational capacity, and training speed compared to baselines.
arXiv Detail & Related papers (2022-02-09T20:33:37Z)
- Learning Versatile Convolution Filters for Efficient Visual Recognition
This paper introduces versatile filters for constructing efficient convolutional neural networks.
We conduct a theoretical analysis of network complexity and introduce an efficient convolution scheme.
Experimental results on benchmark datasets and neural networks demonstrate that our versatile filters achieve accuracy comparable to the original filters.
arXiv Detail & Related papers (2021-09-20T06:07:14Z)
- Unsharp Mask Guided Filtering
The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering.
We propose a new and simplified formulation of the guided filter inspired by unsharp masking.
Our formulation enjoys a filtering prior from a low-pass filter and enables explicit structure transfer by estimating a single coefficient.
arXiv Detail & Related papers (2021-06-02T19:15:34Z)
- Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks
We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement.
MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks.
arXiv Detail & Related papers (2020-08-17T12:18:34Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter that directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.