Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
- URL: http://arxiv.org/abs/2507.02791v2
- Date: Sat, 05 Jul 2025 07:54:28 GMT
- Title: Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance
- Authors: Jakob Kienegger, Alina Mannanova, Huajian Fang, Timo Gerkmann,
- Abstract summary: We present a novel strategy utilizing a low-complexity tracking algorithm in the form of a particle filter instead.<n>We show how the autoregressive interplay between both algorithms drastically improves tracking accuracy and leads to strong enhancement performance.
- Score: 14.16697537117357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works on deep non-linear spatially selective filters demonstrate exceptional enhancement performance with computationally lightweight architectures for stationary speakers of known directions. However, to maintain this performance in dynamic scenarios, resource-intensive data-driven tracking algorithms become necessary to provide precise spatial guidance conditioned on the initial direction of a target speaker. As this additional computational overhead hinders application in resource-constrained scenarios such as real-time speech enhancement, we present a novel strategy utilizing a low-complexity tracking algorithm in the form of a particle filter instead. Assuming a causal, sequential processing style, we introduce temporal feedback to leverage the enhanced speech signal of the spatially selective filter to compensate for the limited modeling capabilities of the particle filter. Evaluation on a synthetic dataset illustrates how the autoregressive interplay between both algorithms drastically improves tracking accuracy and leads to strong enhancement performance. A listening test with real-world recordings complements these findings by indicating a clear trend towards our proposed self-steering pipeline as preferred choice over comparable methods.
Related papers
- Steering Deep Non-Linear Spatially Selective Filters for Weakly Guided Extraction of Moving Speakers in Dynamic Scenarios [15.736484513462973]
spatially dynamic scenarios are considerably more challenging due to time-varying spatial features and arising ambiguities.<n>We propose a weakly guided extraction method solely depending on the target's initial position to cope with spatial dynamic scenarios.<n>By incorporating our own deep tracking algorithm and developing a joint training strategy on a synthetic dataset, we demonstrate the proficiency of our approach in resolving spatial ambiguities.
arXiv Detail & Related papers (2025-05-20T15:43:55Z) - Event Signal Filtering via Probability Flux Estimation [58.31652473933809]
Events offer a novel paradigm for capturing scene dynamics via asynchronous sensing, but their inherent randomness often leads to degraded signal quality.<n>Event signal filtering is thus essential for enhancing fidelity by reducing this internal randomness and ensuring consistent outputs across diverse acquisition conditions.<n>This paper introduces a generative, online filtering framework called Event Density Flow Filter (EDFilter)<n>Experiments validate EDFilter's performance across tasks like event filtering, super-resolution, and direct event-based blob tracking.
arXiv Detail & Related papers (2025-04-10T07:03:08Z) - State Estimation Using Particle Filtering in Adaptive Machine Learning Methods: Integrating Q-Learning and NEAT Algorithms with Noisy Radar Measurements [0.8528368686417979]
We propose an integrated framework that unifies particle filtering with Q-learning and NEAT to explicitly address the challenge of noisy measurements.<n> Experiments on grid-based navigation and a simulated car environment highlight consistent gains in training stability, final performance, and success rates over baselines lacking advanced filtering.
arXiv Detail & Related papers (2025-04-10T02:20:45Z) - Attention-Driven Multichannel Speech Enhancement in Moving Sound Source
Scenarios [11.811571392419324]
Speech enhancement algorithms typically assume a stationary sound source, a common mismatch with reality that limits their performance in real-world scenarios.
This paper focuses on attention-driven spatial filtering techniques designed for dynamic settings.
arXiv Detail & Related papers (2023-12-17T16:12:35Z) - Spatially Selective Deep Non-linear Filters for Speaker Extraction [21.422488450492434]
We develop a deep joint spatial-spectral non-linear filter that can be steered in an arbitrary target direction.
We show that this scheme is more effective than the baseline approach and increases the flexibility of the filter at no performance cost.
arXiv Detail & Related papers (2022-11-04T12:54:06Z) - Insights into Deep Non-linear Filters for Improved Multi-channel Speech
Enhancement [21.422488450492434]
In a traditional setting, linear spatial filtering (beamforming) and single-channel post-filtering are commonly performed separately.
There is a trend towards employing deep neural networks (DNNs) to learn a joint spatial and tempo-spectral non-linear filter.
arXiv Detail & Related papers (2022-06-27T13:54:14Z) - Filter-enhanced MLP is All You Need for Sequential Recommendation [89.0974365344997]
In online platforms, logged user behavior data is inevitable to contain noise.
We borrow the idea of filtering algorithms from signal processing that attenuates the noise in the frequency domain.
We propose textbfFMLP-Rec, an all-MLP model with learnable filters for sequential recommendation task.
arXiv Detail & Related papers (2022-02-28T05:49:35Z) - Adaptive Low-Pass Filtering using Sliding Window Gaussian Processes [71.23286211775084]
We propose an adaptive low-pass filter based on Gaussian process regression.
We show that the estimation error of the proposed method is uniformly bounded.
arXiv Detail & Related papers (2021-11-05T17:06:59Z) - Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z) - Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
arXiv Detail & Related papers (2020-04-04T14:16:27Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End
Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.