Multi-channel Speech Separation Using Spatially Selective Deep
Non-linear Filters
- URL: http://arxiv.org/abs/2304.12023v2
- Date: Tue, 21 Nov 2023 14:59:37 GMT
- Title: Multi-channel Speech Separation Using Spatially Selective Deep
Non-linear Filters
- Authors: Kristina Tesch and Timo Gerkmann
- Abstract summary: In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture.
We propose a deep neural network based spatially selective filter (SSF) that can be spatially steered to extract the speaker of interest.
- Score: 21.672683390080106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a multi-channel separation task with multiple speakers, we aim to recover
all individual speech signals from the mixture. In contrast to single-channel
approaches, which rely on the different spectro-temporal characteristics of the
speech signals, multi-channel approaches should additionally utilize the
different spatial locations of the sources for a more powerful separation
especially when the number of sources increases. To enhance the spatial
processing in a multi-channel source separation scenario, in this work, we
propose a deep neural network (DNN) based spatially selective filter (SSF) that
can be spatially steered to extract the speaker of interest by initializing a
recurrent neural network layer with the target direction. We compare the
proposed SSF with a common end-to-end direct separation (DS) approach trained
using utterance-wise permutation invariant training (PIT), which only
implicitly learns to perform spatial filtering. We show that the SSF has a
clear advantage over a DS approach with the same underlying network
architecture when there are more than two speakers in the mixture, which can be
attributed to a better use of the spatial information. Furthermore, we find
that the SSF generalizes much better to additional noise sources that were not
seen during training and to scenarios with speakers positioned at a similar
angle.
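The two mechanisms the abstract compares can be illustrated with a minimal sketch: a hypothetical sin/cos direction encoding used to initialize a recurrent layer's hidden state (the paper's actual SSF encoding may differ), and the standard utterance-wise PIT objective used to train the DS baseline. Function names and sizes here are illustrative, not taken from the paper:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def doa_init_state(theta_deg, hidden_size=16):
    """Hypothetical steering mechanism: encode the target direction of
    arrival (DOA) as sin/cos features and tile them to form the initial
    hidden state of a recurrent layer, so the filter extracts the
    speaker from that direction."""
    rad = np.deg2rad(theta_deg)
    return np.tile([np.sin(rad), np.cos(rad)], hidden_size // 2)

def pit_mse(estimates, targets):
    """Utterance-wise permutation invariant training (PIT) objective:
    evaluate the MSE under every speaker permutation and keep the
    minimum, so a DS network may output speakers in any order."""
    best = (np.inf, None)
    for perm in itertools.permutations(range(len(targets))):
        loss = np.mean([(estimates[p] - targets[i]) ** 2
                        for i, p in enumerate(perm)])
        if loss < best[0]:
            best = (loss, perm)
    return best

# PIT picks the swapped assignment when the outputs come out reordered.
t1, t2 = rng.standard_normal(100), rng.standard_normal(100)
loss, perm = pit_mse([t2, t1], [t1, t2])
print(perm)  # (1, 0)
```

In contrast to PIT, the SSF resolves the output ordering ambiguity externally: the target direction passed to the initial state determines which speaker is extracted.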
Related papers
- On Neural Architectures for Deep Learning-based Source Separation of
Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation [55.533789120204055]
We propose an end-to-end beamforming network for direction-guided speech separation given only the mixture signal.
Specifically, we design a multi-channel input and multiple outputs architecture to predict the direction-of-arrival based embeddings and beamforming weights for each source.
arXiv Detail & Related papers (2022-12-07T01:52:40Z)
- Spatially Selective Deep Non-linear Filters for Speaker Extraction [21.422488450492434]
We develop a deep joint spatial-spectral non-linear filter that can be steered in an arbitrary target direction.
We show that this scheme is more effective than the baseline approach and increases the flexibility of the filter at no performance cost.
arXiv Detail & Related papers (2022-11-04T12:54:06Z)
- Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain [131.74762114632404]
The model is trained end-to-end and performs spatial processing implicitly.
We evaluate the proposed model on a real-world dataset and show that the model matches the performance of an oracle beamformer.
arXiv Detail & Related papers (2022-06-30T17:13:01Z)
- Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement [21.422488450492434]
In a traditional setting, linear spatial filtering (beamforming) and single-channel post-filtering are commonly performed separately.
There is a trend towards employing deep neural networks (DNNs) to learn a joint spatial and tempo-spectral non-linear filter.
arXiv Detail & Related papers (2022-06-27T13:54:14Z)
- On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement [18.133635752982105]
Using deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement has two potential key advantages.
Non-linear spatial filtering makes it possible to overcome restrictions originating from a linear processing model.
Joint processing of spatial and tempo-spectral information allows the network to exploit interdependencies between different sources of information.
arXiv Detail & Related papers (2022-06-22T15:42:44Z)
- Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain [34.23260020137834]
We propose the Stepwise-Refining Speech Separation Network (SRSSN), which follows a coarse-to-fine separation framework.
It first learns a 1-order latent domain to define an encoding space and thereby performs a rough separation in the coarse phase.
It then learns a new latent domain along each basis function of the existing latent domain to obtain a high-order latent domain in the refining phase.
arXiv Detail & Related papers (2021-10-10T13:21:16Z)
- Sparse Multi-Family Deep Scattering Network [14.932318540666543]
We propose a novel architecture exploiting the interpretability of the Deep Scattering Network (DSN).
The SMF-DSN enhances the DSN by (i) increasing the diversity of the scattering coefficients and (ii) improving its robustness with respect to non-stationary noise.
arXiv Detail & Related papers (2020-12-14T16:06:14Z)
- Deep Learning Based Antenna Selection for Channel Extrapolation in FDD Massive MIMO [54.54508321463112]
In massive multiple-input multiple-output (MIMO) systems, the large number of antennas poses a great challenge for acquiring accurate channel state information.
We use neural networks (NNs) to capture the inherent connection between the uplink and downlink channel data sets and extrapolate the downlink channels from a subset of the uplink channel state information.
We study the antenna subset selection problem in order to achieve the best channel extrapolation and reduce the data size of the NNs.
arXiv Detail & Related papers (2020-09-03T13:38:52Z)
- Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features [60.20150317299749]
Multi-channel deep clustering (MDC) has achieved good performance for speech separation.
We propose a deep attention fusion method to dynamically control the weights of the spectral and spatial features and combine them deeply.
Experimental results show that the proposed method outperforms the MDC baseline and even surpasses the ideal binary mask (IBM).
arXiv Detail & Related papers (2020-02-05T03:49:39Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.