Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement
- URL: http://arxiv.org/abs/2206.13310v1
- Date: Mon, 27 Jun 2022 13:54:14 GMT
- Authors: Kristina Tesch, Timo Gerkmann
- Abstract summary: In a traditional setting, linear spatial filtering (beamforming) and single-channel post-filtering are commonly performed separately.
There is a trend towards employing deep neural networks (DNNs) to learn a joint spatial and tempo-spectral non-linear filter.
- Score: 21.422488450492434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The key advantage of using multiple microphones for speech enhancement is
that spatial filtering can be used to complement the tempo-spectral processing.
In a traditional setting, linear spatial filtering (beamforming) and
single-channel post-filtering are commonly performed separately. In contrast,
there is a trend towards employing deep neural networks (DNNs) to learn a joint
spatial and tempo-spectral non-linear filter, which means that the restriction
of a linear processing model and that of a separate processing of spatial and
tempo-spectral information can potentially be overcome. However, the internal
mechanisms that lead to good performance of such data-driven filters for
multi-channel speech enhancement are not well understood. Therefore, in this
work, we analyse the properties of a non-linear spatial filter realized by a
DNN as well as its interdependency with temporal and spectral processing by
carefully controlling the information sources (spatial, spectral, and temporal)
available to the network. We confirm the superiority of a non-linear spatial
processing model, which outperforms an oracle linear spatial filter in a
challenging speaker extraction scenario for a low number of microphones by 0.24
POLQA score. Our analyses reveal that in particular spectral information should
be processed jointly with spatial information as this increases the spatial
selectivity of the filter. Our systematic evaluation then leads to a simple
network architecture that outperforms state-of-the-art network architectures
on a speaker extraction task by 0.22 POLQA score and by 0.32 POLQA score on the
CHiME3 data.
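For orientation, the "traditional setting" mentioned in the abstract (a linear spatial filter followed by a separate single-channel post-filter) can be sketched as below. This is only an illustrative sketch and not the authors' method or baseline: the choice of an MVDR beamformer, the spectral-subtraction-style gain, and the names `mvdr_weights`, `beamform`, and `wiener_postfilter` are all assumptions made for the example. It operates on a single frequency bin with complex STFT coefficients.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin.

    noise_cov: (channels, channels) noise covariance matrix (assumed known).
    steering:  (channels,) steering vector towards the target (assumed known).
    """
    rinv_d = np.linalg.solve(noise_cov, steering)
    return rinv_d / (steering.conj() @ rinv_d)

def beamform(weights, mics):
    """Linear spatial filter y[t] = w^H x[t]; mics has shape (channels, frames)."""
    return weights.conj() @ mics

def wiener_postfilter(beamformed, noise_power):
    """Separate single-channel stage: a real-valued gain in [0, 1] per frame.

    noise_power is the (assumed known) residual noise power after beamforming.
    """
    power = np.abs(beamformed) ** 2
    gain = np.maximum(1.0 - noise_power / np.maximum(power, 1e-12), 0.0)
    return gain * beamformed
```

In this two-stage pipeline the spatial step is a fixed linear combination of the microphone signals and the post-filter only ever sees its single-channel output. The abstract contrasts this with a DNN that learns one joint, non-linear filter over spatial and tempo-spectral information.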
Related papers
- An Ensemble Score Filter for Tracking High-Dimensional Nonlinear Dynamical Systems [10.997994515823798]
We propose an ensemble score filter (EnSF) for solving high-dimensional nonlinear filtering problems.
Unlike existing diffusion models that train neural networks to approximate the score function, we develop a training-free score estimation.
EnSF provides surprising performance, compared with the state-of-the-art Local Ensemble Transform Kalman Filter method.
arXiv Detail & Related papers (2023-09-02T16:48:02Z)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters [21.672683390080106]
In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture.
We propose a deep neural network based spatially selective filter (SSF) that can be spatially steered to extract the speaker of interest.
arXiv Detail & Related papers (2023-04-24T11:44:00Z)
- Spatially Selective Deep Non-linear Filters for Speaker Extraction [21.422488450492434]
We develop a deep joint spatial-spectral non-linear filter that can be steered in an arbitrary target direction.
We show that this scheme is more effective than the baseline approach and increases the flexibility of the filter at no performance cost.
arXiv Detail & Related papers (2022-11-04T12:54:06Z)
- On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement [18.133635752982105]
Using deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement potentially has two key advantages.
Non-linear spatial filtering makes it possible to overcome potential restrictions originating from a linear processing model.
Joint processing of spatial and tempo-spectral information makes it possible to exploit interdependencies between different sources of information.
arXiv Detail & Related papers (2022-06-22T15:42:44Z)
- Computational Doob's h-transforms for Online Filtering of Discretely Observed Diffusions [65.74069050283998]
We propose a computational framework to approximate Doob's $h$-transforms.
The proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters.
arXiv Detail & Related papers (2022-06-07T15:03:05Z)
- Three-Way Deep Neural Network for Radio Frequency Map Generation and Source Localization [67.93423427193055]
Monitoring wireless spectrum over spatial, temporal, and frequency domains will become a critical feature in beyond-5G and 6G communication technologies.
In this paper, we present a Generative Adversarial Network (GAN) machine learning model to interpolate irregularly distributed measurements across the spatial domain.
arXiv Detail & Related papers (2021-11-23T22:25:10Z)
- Learning Versatile Convolution Filters for Efficient Visual Recognition [125.34595948003745]
This paper introduces versatile filters to construct efficient convolutional neural networks.
We conduct theoretical analysis on network complexity and an efficient convolution scheme is introduced.
Experimental results on benchmark datasets and neural networks demonstrate that our versatile filters are able to achieve accuracy comparable to that of the original filters.
arXiv Detail & Related papers (2021-09-20T06:07:14Z)
- Sparsistent filtering of comovement networks from high-dimensional data [0.0]
We introduce a new technique to filter large dimensional networks out of dynamical behavior of the constituent nodes.
As opposed to the well-known network filters that rely on preserving key topological properties of the realized network, our method treats the spectrum as the fundamental object and preserves spectral properties.
arXiv Detail & Related papers (2021-01-22T15:44:41Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from the multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)