On the Role of Spatial, Spectral, and Temporal Processing for DNN-based
Non-linear Multi-channel Speech Enhancement
- URL: http://arxiv.org/abs/2206.11181v1
- Date: Wed, 22 Jun 2022 15:42:44 GMT
- Title: On the Role of Spatial, Spectral, and Temporal Processing for DNN-based
Non-linear Multi-channel Speech Enhancement
- Authors: Kristina Tesch, Nils-Hendrik Mohrmann, Timo Gerkmann
- Abstract summary: Using deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement has two potential key advantages.
Non-linear spatial filtering makes it possible to overcome restrictions originating from a linear processing model.
Joint processing of spatial and tempo-spectral information makes it possible to exploit interdependencies between different sources of information.
- Score: 18.133635752982105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Employing deep neural networks (DNNs) to directly learn filters for
multi-channel speech enhancement has two potential key advantages over a
traditional approach combining a linear spatial filter with an independent
tempo-spectral post-filter: 1) non-linear spatial filtering makes it possible
to overcome restrictions originating from a linear processing model, and 2)
joint processing of spatial and tempo-spectral information makes it possible to
exploit interdependencies between different sources of information. A variety
of DNN-based non-linear filters have been proposed recently, for which good
enhancement performance is reported. However, little is known about their
internal mechanisms, which turns network architecture design into a game of
chance. Therefore, in this paper, we perform experiments to better understand
the internal processing of spatial, spectral, and temporal information by
DNN-based non-linear filters. On the one hand, our experiments in a difficult
speech extraction scenario confirm the importance of non-linear spatial
filtering, which outperforms an oracle linear spatial filter by 0.24 POLQA
score. On the other hand, we demonstrate that joint processing results in a
large performance gap of 0.4 POLQA score between network architectures
exploiting spectral versus temporal information in addition to spatial information.
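To make the contrast concrete, below is a minimal NumPy sketch (not from the paper) of the traditional two-stage pipeline the abstract argues against: a linear spatial filter, here an MVDR beamformer, followed by an independent single-channel post-filter, here a simple Wiener-style gain. The noise-only leading frames, the flat steering vector, and the crude SNR estimate are illustrative assumptions. The DNN-based approaches studied in the paper instead learn a single non-linear mapping from the multi-channel STFT to the enhanced signal, covering both stages jointly.

```python
# Illustrative sketch only: linear spatial filter (MVDR) + independent post-filter.
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR beamformer per frequency bin: w_f = R_n^{-1} d / (d^H R_n^{-1} d)."""
    # noise_cov: (F, M, M), steering: (F, M)
    rn_inv_d = np.linalg.solve(noise_cov, steering[..., None])[..., 0]   # (F, M)
    denom = np.einsum('fm,fm->f', steering.conj(), rn_inv_d)
    return rn_inv_d / denom[:, None]

def enhance(Y, steering, noise_frames=10):
    """Y: multi-channel STFT (M, F, T), steering: (F, M) -> enhanced STFT (F, T)."""
    M = Y.shape[0]
    # Stage 1: linear spatial filter (MVDR), with the noise covariance estimated
    # from assumed noise-only leading frames plus diagonal loading.
    N = Y[:, :, :noise_frames]
    noise_cov = np.einsum('mft,nft->fmn', N, N.conj()) / noise_frames
    noise_cov = noise_cov + 1e-6 * np.eye(M)[None]
    w = mvdr_weights(noise_cov, steering)                                 # (F, M)
    beamformed = np.einsum('fm,mft->ft', w.conj(), Y)
    # Stage 2: independent single-channel post-filter (Wiener-style gain from a
    # crude SNR estimate based on the beamformed noise segment).
    noise_psd = np.mean(np.abs(np.einsum('fm,mft->ft', w.conj(), N)) ** 2, axis=1)
    snr = np.maximum(np.abs(beamformed) ** 2 / (noise_psd[:, None] + 1e-12) - 1.0, 1e-3)
    return (snr / (1.0 + snr)) * beamformed

# Toy usage with random data and a flat (broadside) steering vector.
rng = np.random.default_rng(0)
M, F, T = 4, 257, 120
Y = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
enhanced = enhance(Y, steering=np.ones((F, M), dtype=complex))            # (F, T)
```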
Related papers
- An Ensemble Score Filter for Tracking High-Dimensional Nonlinear Dynamical Systems [10.997994515823798]
We propose an ensemble score filter (EnSF) for solving high-dimensional nonlinear filtering problems.
Unlike existing diffusion models that train neural networks to approximate the score function, we develop a training-free score estimation.
EnSF provides surprisingly strong performance compared with the state-of-the-art Local Ensemble Transform Kalman Filter method.
arXiv Detail & Related papers (2023-09-02T16:48:02Z)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters [21.672683390080106]
In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture.
We propose a deep neural network based spatially selective filter (SSF) that can be spatially steered to extract the speaker of interest; a minimal conditioning sketch follows this list.
arXiv Detail & Related papers (2023-04-24T11:44:00Z)
- Spatially Selective Deep Non-linear Filters for Speaker Extraction [21.422488450492434]
We develop a deep joint spatial-spectral non-linear filter that can be steered in an arbitrary target direction.
We show that this scheme is more effective than the baseline approach and increases the flexibility of the filter at no performance cost.
arXiv Detail & Related papers (2022-11-04T12:54:06Z)
- Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement [21.422488450492434]
In a traditional setting, linear spatial filtering (beamforming) and single-channel post-filtering are commonly performed separately.
There is a trend towards employing deep neural networks (DNNs) to learn a joint spatial and tempo-spectral non-linear filter.
arXiv Detail & Related papers (2022-06-27T13:54:14Z)
- Space-Time Graph Neural Networks [104.55175325870195]
We introduce the space-time graph neural network (ST-GNN) to jointly process the underlying space-time topology of time-varying network data.
Our analysis shows that small variations in the network topology and time evolution of a system do not significantly affect the performance of ST-GNNs.
arXiv Detail & Related papers (2021-10-06T16:08:44Z)
- Learning Versatile Convolution Filters for Efficient Visual Recognition [125.34595948003745]
This paper introduces versatile filters to construct efficient convolutional neural networks.
We conduct a theoretical analysis of network complexity and introduce an efficient convolution scheme.
Experimental results on benchmark datasets and neural networks demonstrate that our versatile filters achieve accuracy comparable to that of the original filters.
arXiv Detail & Related papers (2021-09-20T06:07:14Z)
- DNN-Based Topology Optimisation: Spatial Invariance and Neural Tangent Kernel [7.106986689736828]
We study the SIMP method with a density field generated by a fully-connected neural network, taking the coordinates as inputs.
We show that the use of DNNs leads to a filtering effect similar to traditional filtering techniques for SIMP, with a filter described by the Neural Tangent Kernel (NTK).
arXiv Detail & Related papers (2021-06-10T12:49:55Z)
- Adaptive Latent Space Tuning for Non-Stationary Distributions [62.997667081978825]
We present a method for adaptive tuning of the low-dimensional latent space of deep encoder-decoder style CNNs.
We demonstrate our approach for predicting the properties of a time-varying charged particle beam in a particle accelerator.
arXiv Detail & Related papers (2021-05-08T03:50:45Z)
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z)
- Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
arXiv Detail & Related papers (2020-05-06T07:41:22Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
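Several of the entries above describe spatially selective DNN filters that can be steered toward a target direction. The sketch below, referenced in the SSF entry, is a minimal PyTorch illustration of one common way to realize such conditioning: inter-channel phase features are concatenated with an encoding of the target direction and passed to a recurrent mask estimator. All feature choices, layer sizes, and the masking-based output are illustrative assumptions, not the architecture of the cited papers.

```python
# Illustrative sketch only: direction-conditioned recurrent mask estimation.
import torch
import torch.nn as nn

class DirectionConditionedMaskNet(nn.Module):
    def __init__(self, n_freq=257, n_mics=2, hidden=256):
        super().__init__()
        # Reference-mic magnitude + cos/sin of inter-channel phase differences,
        # concatenated with a 2-D (cos, sin) encoding of the target direction.
        in_dim = n_freq + (n_mics - 1) * 2 * n_freq + 2
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, stft, target_angle):
        # stft: complex tensor (batch, mics, freq, time); target_angle: (batch,) in radians
        B, _, F, T = stft.shape
        ref_mag = stft[:, 0].abs()                               # (B, F, T)
        ipd = torch.angle(stft[:, 1:] * stft[:, :1].conj())      # (B, M-1, F, T)
        spatial = torch.cat([ipd.cos(), ipd.sin()], dim=1)       # (B, 2(M-1), F, T)
        feats = torch.cat([ref_mag, spatial.reshape(B, -1, T)], dim=1)
        direction = torch.stack([target_angle.cos(), target_angle.sin()], dim=-1)
        direction = direction[:, None, :].expand(B, T, 2)        # repeat over frames
        x = torch.cat([feats.transpose(1, 2), direction], dim=-1)  # (B, T, in_dim)
        mask = torch.sigmoid(self.out(self.rnn(x)[0]))           # (B, T, F)
        return mask.transpose(1, 2) * stft[:, 0]                 # masked reference channel

# Toy usage with random complex STFT data and an assumed target direction.
net = DirectionConditionedMaskNet(n_freq=257, n_mics=2)
mixture = torch.randn(1, 2, 257, 100, dtype=torch.complex64)
estimate = net(mixture, target_angle=torch.tensor([0.7]))        # (1, 257, 100), complex
```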
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.