Multi-Microphone Speaker Separation by Spatial Regions
- URL: http://arxiv.org/abs/2303.07143v1
- Date: Mon, 13 Mar 2023 14:11:34 GMT
- Title: Multi-Microphone Speaker Separation by Spatial Regions
- Authors: Julian Wechsler, Srikanth Raj Chetupalli, Wolfgang Mack, Emanuël A. P. Habets
- Abstract summary: We consider the task of region-based source separation of reverberant multi-microphone recordings.
We propose a data-driven approach using a modified version of a state-of-the-art network.
We show that both training methods (an enforced region-to-output mapping and permutation invariant training) result in a fixed mapping of regions to network outputs and achieve comparable performance, and that the networks exploit spatial information.
- Score: 9.156939957189504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   We consider the task of region-based source separation of reverberant
multi-microphone recordings. We assume pre-defined spatial regions with a
single active source per region. The objective is to estimate the signals from
the individual spatial regions as captured by a reference microphone while
retaining a correspondence between signals and spatial regions. We propose a
data-driven approach using a modified version of a state-of-the-art network,
where different layers model spatial and spectro-temporal information. The
network is trained to enforce a fixed mapping of regions to network outputs.
Using speech from LibriMix, we construct a data set specifically designed to
contain the region information. Additionally, we train the network with
permutation invariant training. We show that both training methods result in a
fixed mapping of regions to network outputs, achieve comparable performance,
and that the networks exploit spatial information. The proposed network
outperforms a baseline network by 1.5 dB in scale-invariant
signal-to-distortion ratio.
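
The two training objectives and the evaluation metric named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the loss function, so the use of negative scale-invariant signal-to-distortion ratio (SI-SDR) as the training loss is an assumption, and the function names `si_sdr`, `fixed_mapping_loss`, and `pit_loss` are hypothetical. Signals are plain lists of floats to keep the example dependency-free.

```python
import itertools
import math


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    n = len(target)
    mean_t = sum(target) / n
    mean_e = sum(estimate) / n
    t = [x - mean_t for x in target]      # zero-mean target
    e = [x - mean_e for x in estimate]    # zero-mean estimate
    # Optimal scaling of the target that best explains the estimate.
    alpha = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    proj_energy = sum((alpha * b) ** 2 for b in t)
    noise_energy = sum((a - alpha * b) ** 2 for a, b in zip(e, t))
    return 10.0 * math.log10((proj_energy + eps) / (noise_energy + eps))


def fixed_mapping_loss(estimates, targets):
    """Assumed loss: output i is always compared with the signal of region i."""
    scores = [si_sdr(e, t) for e, t in zip(estimates, targets)]
    return -sum(scores) / len(scores)


def pit_loss(estimates, targets):
    """Permutation invariant training: take the best output-to-target assignment.

    Returns (loss, perm), where perm[i] is the output index matched to target i.
    """
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(len(targets))):
        scores = [si_sdr(estimates[p], targets[i]) for i, p in enumerate(perm)]
        loss = -sum(scores) / len(scores)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

With the fixed mapping, a network that places the right signal on the wrong output is penalized, which ties each output to one region. Permutation invariant training does not penalize such a swap; the abstract reports that the trained networks nevertheless converge to a fixed region-to-output mapping.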
 
      
Related papers
- Beamformed 360° Sound Maps: U-Net-Driven Acoustic Source Segmentation and Localization [0.10485739694839666]
 We introduce a U-Net model for 360° acoustic source localization formulated as a spherical semantic segmentation task.
Our dataset includes real-world open-field recordings of a DJI Air 3 drone, synchronized with 360° video and flight logs across multiple dates and locations.
 arXiv  Detail & Related papers  (2025-08-01T04:23:18Z)
- RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1‱ Spatial Sampling [60.267226205350596]
 Radio map estimation aims to generate a dense representation of electromagnetic spectrum quantities.
We propose RadioFormer, a novel multiple-granularity transformer to handle the constraints posed by spatial sparse observations.
We show that RadioFormer outperforms state-of-the-art methods in radio map estimation while maintaining the lowest computational cost.
 arXiv  Detail & Related papers  (2025-04-27T08:44:41Z)
- Feature Aggregation in Joint Sound Classification and Localization Neural Networks [0.0]
 Current state-of-the-art sound source localization deep learning networks lack feature aggregation within their architecture.
We adapt feature aggregation techniques from computer vision neural networks to signal detection neural networks.
 arXiv  Detail & Related papers  (2023-10-29T16:37:14Z)
- Constructing Indoor Region-based Radio Map without Location Labels [18.34037687586167]
 This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels.
The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once.
The proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline.
 arXiv  Detail & Related papers  (2023-08-31T14:27:36Z)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters [21.672683390080106]
 In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture.
We propose a deep neural network based spatially selective filter (SSF) that can be spatially steered to extract the speaker of interest.
 arXiv  Detail & Related papers  (2023-04-24T11:44:00Z)
- SLAN: Self-Locator Aided Network for Cross-Modal Understanding [89.20623874655352]
 We propose Self-Locator Aided Network (SLAN) for cross-modal understanding tasks.
SLAN consists of a region filter and a region adaptor to localize regions of interest conditioned on different texts.
It achieves fairly competitive results on five cross-modal understanding tasks.
 arXiv  Detail & Related papers  (2022-11-28T11:42:23Z)
- Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain [131.74762114632404]
 The model is trained end-to-end and performs spatial processing implicitly.
We evaluate the proposed model on a real-world dataset and show that the model matches the performance of an oracle beamformer.
 arXiv  Detail & Related papers  (2022-06-30T17:13:01Z)
- Three-Way Deep Neural Network for Radio Frequency Map Generation and Source Localization [67.93423427193055]
 Monitoring wireless spectrum over spatial, temporal, and frequency domains will become a critical feature in beyond-5G and 6G communication technologies.
In this paper, we present a Generative Adversarial Network (GAN) machine learning model to interpolate irregularly distributed measurements across the spatial domain.
 arXiv  Detail & Related papers  (2021-11-23T22:25:10Z)
- Learning Signal-Agnostic Manifolds of Neural Fields [50.066449953522685]
 We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains.
We show that by walking across the underlying manifold of GEM, we may generate new samples in our signal domains.
 arXiv  Detail & Related papers  (2021-11-11T18:57:40Z)
- Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain [103.3388198420822]
 Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
 arXiv  Detail & Related papers  (2021-02-23T09:59:31Z)
- Spatial Attention Pyramid Network for Unsupervised Domain Adaptation [66.75008386980869]
 Unsupervised domain adaptation is critical in various computer vision tasks.
We design a new spatial attention pyramid network for unsupervised domain adaptation.
Our method performs favorably against the state-of-the-art methods by a large margin.
 arXiv  Detail & Related papers  (2020-03-29T09:03:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     