Spatially Selective Deep Non-linear Filters for Speaker Extraction
- URL: http://arxiv.org/abs/2211.02420v2
- Date: Thu, 23 Mar 2023 08:31:34 GMT
- Title: Spatially Selective Deep Non-linear Filters for Speaker Extraction
- Authors: Kristina Tesch, Timo Gerkmann
- Abstract summary: We develop a deep joint spatial-spectral non-linear filter that can be steered in an arbitrary target direction.
We show that this scheme is more effective than the baseline approach and increases the flexibility of the filter at no performance cost.
- Score: 21.422488450492434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a scenario with multiple persons talking simultaneously, the spatial
characteristics of the signals are the most distinct feature for extracting the
target signal. In this work, we develop a deep joint spatial-spectral
non-linear filter that can be steered in an arbitrary target direction. For
this we propose a simple and effective conditioning mechanism, which sets the
initial state of the filter's recurrent layers based on the target direction.
We show that this scheme is more effective than the baseline approach and
increases the flexibility of the filter at no performance cost. The resulting
spatially selective non-linear filters can also be used for speech separation
of an arbitrary number of speakers and enable very accurate multi-speaker
localization as we demonstrate in this paper.
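The conditioning mechanism described in the abstract (setting the initial state of the filter's recurrent layers from the target direction) can be sketched as follows. This is a minimal numpy illustration, not the authors' trained model: the hidden size, the (cos, sin) direction encoding, and the random stand-in weights are all assumptions for the sketch.

```python
import numpy as np

def direction_embedding(azimuth_rad, hidden_size, rng):
    # Encode the target direction as (cos, sin) and map it to an initial
    # hidden state. Random weights stand in for the trained parameters
    # that the real filter would learn end-to-end.
    feat = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    W = rng.standard_normal((hidden_size, 2)) * 0.1
    return np.tanh(W @ feat)

def rnn_step(h, x, W_h, W_x):
    # One step of a plain (Elman-style) recurrent layer.
    return np.tanh(W_h @ h + W_x @ x)

rng = np.random.default_rng(0)
hidden, feat_dim = 8, 4
h0 = direction_embedding(np.deg2rad(45.0), hidden, rng)  # steer to 45 degrees
W_h = rng.standard_normal((hidden, hidden)) * 0.1
W_x = rng.standard_normal((hidden, feat_dim)) * 0.1

h = h0
for _ in range(3):                      # a few frames of dummy input features
    x = rng.standard_normal(feat_dim)
    h = rnn_step(h, x, W_h, W_x)
print(h.shape)  # (8,)
```

Changing only the azimuth changes `h0`, so the same recurrent filter can be steered to an arbitrary direction without any architectural change, which is the flexibility the abstract refers to.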
Related papers
- Phononic materials with effectively scale-separated hierarchical features using interpretable machine learning [57.91994916297646]
Architected hierarchical phononic materials have shown promising tunability of elastodynamic waves and vibrations over multiple frequency ranges.
In this article, hierarchical unit-cells are obtained, where features at each length scale result in a band gap within a targeted frequency range.
Our approach offers a flexible and efficient method for the exploration of new regions in the hierarchical design space.
arXiv Detail & Related papers (2024-08-15T21:35:06Z) - Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios [11.811571392419324]
Speech enhancement algorithms typically assume a stationary sound source, a common mismatch with reality that limits their performance in real-world scenarios.
This paper focuses on attention-driven spatial filtering techniques designed for dynamic settings.
arXiv Detail & Related papers (2023-12-17T16:12:35Z) - Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters [21.672683390080106]
In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture.
We propose a deep neural network based spatially selective filter (SSF) that can be spatially steered to extract the speaker of interest.
arXiv Detail & Related papers (2023-04-24T11:44:00Z) - Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement [21.422488450492434]
In a traditional setting, linear spatial filtering (beamforming) and single-channel post-filtering are commonly performed separately.
There is a trend towards employing deep neural networks (DNNs) to learn a joint spatial and tempo-spectral non-linear filter.
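For contrast with the joint DNN approach, the traditional linear spatial filtering step can be illustrated with a delay-and-sum beamformer. This is a toy sketch: the known integer sample delays and the simulated sine source are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(1000)
clean = np.sin(2 * np.pi * t / 50)             # simulated periodic source
delays = [0, 3, 7, 12]                         # assumed known propagation delays
mics = [np.roll(clean, d) + 0.5 * rng.standard_normal(1000) for d in delays]

# Delay-and-sum: undo each channel's delay, then average. The source adds
# coherently while the independent noise averages out.
aligned = [np.roll(ch, -d) for ch, d in zip(mics, delays)]
out = np.mean(aligned, axis=0)

err_single = np.mean((mics[0] - clean) ** 2)   # noise power at one mic
err_beam = np.mean((out - clean) ** 2)         # noise power after beamforming
print(err_beam < err_single)  # True
```

With four channels the noise power drops by roughly a factor of four; a non-linear DNN filter can go beyond this linear averaging, which is the point of the papers above.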
arXiv Detail & Related papers (2022-06-27T13:54:14Z) - On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement [18.133635752982105]
Using deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement has two potential key advantages.
Non-linear spatial filtering makes it possible to overcome restrictions imposed by a linear processing model.
Joint processing of spatial and tempo-spectral information makes it possible to exploit interdependencies between different sources of information.
arXiv Detail & Related papers (2022-06-22T15:42:44Z) - Computational Doob's h-transforms for Online Filtering of Discretely Observed Diffusions [65.74069050283998]
We propose a computational framework to approximate Doob's $h$-transforms.
The proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters.
arXiv Detail & Related papers (2022-06-07T15:03:05Z) - Combinations of Adaptive Filters [38.0505909175152]
The combination of adaptive filters exploits the divide-and-conquer principle.
In particular, the problem of combining the outputs of several learning algorithms has been studied in the computational learning field.
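The divide-and-conquer idea can be sketched with the classic convex combination of two LMS filters: a fast, noisy one and a slow, precise one, mixed through a sigmoid-parameterized weight. The step sizes, filter order, and the system being identified are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, order = 2000, 4
w_true = np.array([0.5, -0.3, 0.2, 0.1])       # unknown system to identify
x = rng.standard_normal(n + order)
d = np.array([w_true @ x[i:i + order] for i in range(n)])
d += 0.01 * rng.standard_normal(n)             # observation noise

w_fast = np.zeros(order); mu_fast = 0.1        # fast but noisy component filter
w_slow = np.zeros(order); mu_slow = 0.01       # slow but precise component filter
a = 0.0; mu_a = 10.0                           # mixing weight in sigmoid domain

for i in range(n):
    u = x[i:i + order]
    y1, y2 = w_fast @ u, w_slow @ u
    lam = 1.0 / (1.0 + np.exp(-a))             # convex mixing coefficient
    y = lam * y1 + (1.0 - lam) * y2
    e, e1, e2 = d[i] - y, d[i] - y1, d[i] - y2
    w_fast += mu_fast * e1 * u                 # each filter adapts on its own error
    w_slow += mu_slow * e2 * u
    a += mu_a * e * (y1 - y2) * lam * (1.0 - lam)  # gradient step on the mixture

print(np.round(w_slow, 2))
```

The combination inherits the fast filter's quick convergence early on and the slow filter's low steady-state error later, without hand-tuning a single step size.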
arXiv Detail & Related papers (2021-12-22T22:21:43Z) - Learning Versatile Convolution Filters for Efficient Visual Recognition [125.34595948003745]
This paper introduces versatile filters to construct efficient convolutional neural networks.
We conduct theoretical analysis on network complexity and an efficient convolution scheme is introduced.
Experimental results on benchmark datasets and neural networks demonstrate that our versatile filters achieve accuracy comparable to that of the original filters.
arXiv Detail & Related papers (2021-09-20T06:07:14Z) - Unsharp Mask Guided Filtering [53.14430987860308]
The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering.
We propose a new and simplified formulation of the guided filter inspired by unsharp masking.
Our formulation enjoys a filtering prior from a low-pass filter and enables explicit structure transfer by estimating a single coefficient.
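The classical unsharp-masking idea behind this formulation can be illustrated in 1-D. The box blur, the `amount` parameter, and the step-edge test signal are illustrative choices, not the paper's guided-filter formulation itself.

```python
import numpy as np

def box_blur(signal, k=5):
    # Simple moving-average low-pass filter.
    kernel = np.ones(k) / k
    return np.convolve(signal, kernel, mode="same")

def unsharp_mask(signal, amount=1.0, k=5):
    low = signal_low = box_blur(signal, k)
    detail = signal - signal_low       # high-frequency residual
    return low + (1.0 + amount) * detail

x = np.zeros(50)
x[25:] = 1.0                           # a step edge
y = unsharp_mask(x, amount=0.8)        # edge is amplified (over/undershoot)
```

Boosting the residual around the low-pass output is what sharpens the edge; the guided-filter version replaces the single global `amount` with a coefficient estimated from a guidance image.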
arXiv Detail & Related papers (2021-06-02T19:15:34Z) - Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
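The norm-based baseline that this work builds on can be sketched in a few lines; the filter shapes and the 50% keep ratio are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
filters = rng.standard_normal((16, 3, 3, 3))   # (num_filters, C, H, W)

# Rank filters by L1 weight norm, as in norm-based pruning baselines,
# then keep the strongest half.
norms = np.abs(filters).reshape(16, -1).sum(axis=1)
keep = np.sort(np.argsort(norms)[::-1][:8])    # indices of the 8 largest norms
pruned = filters[keep]
print(pruned.shape)  # (8, 3, 3, 3)
```

The paper's contribution replaces this static ranking with a dynamically controlled sparsity-inducing regularization that accounts for dependencies between filters.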
arXiv Detail & Related papers (2020-05-06T07:41:22Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)