Beamformed 360° Sound Maps: U-Net-Driven Acoustic Source Segmentation and Localization
- URL: http://arxiv.org/abs/2508.00307v1
- Date: Fri, 01 Aug 2025 04:23:18 GMT
- Title: Beamformed 360° Sound Maps: U-Net-Driven Acoustic Source Segmentation and Localization
- Authors: Belman Jahir Rodriguez, Sergio F. Chevtchenko, Marcelo Herrera Martinez, Yeshwant Bethy, Saeed Afshar,
- Abstract summary: We introduce a U-Net model for 360° acoustic source localization formulated as a spherical semantic segmentation task. Our dataset includes real-world open-field recordings of a DJI Air 3 drone, synchronized with 360° video and flight logs across multiple dates and locations.
- Score: 0.10485739694839666
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a U-Net model for 360° acoustic source localization formulated as a spherical semantic segmentation task. Rather than regressing discrete direction-of-arrival (DoA) angles, our model segments beamformed audio maps (azimuth and elevation) into regions of active sound presence. Using delay-and-sum (DAS) beamforming on a custom 24-microphone array, we generate signals aligned with drone GPS telemetry to create binary supervision masks. A modified U-Net, trained on frequency-domain representations of these maps, learns to identify spatially distributed source regions while addressing class imbalance via the Tversky loss. Because the network operates on beamformed energy maps, the approach is inherently array-independent and can adapt to different microphone configurations without retraining from scratch. The segmentation outputs are post-processed by computing centroids over activated regions, enabling robust DoA estimates. Our dataset includes real-world open-field recordings of a DJI Air 3 drone, synchronized with 360° video and flight logs across multiple dates and locations. Experimental results show that the U-Net generalizes across environments with improved angular precision, offering a new paradigm for dense spatial audio understanding beyond traditional Sound Source Localization (SSL).
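The abstract pins down two pieces of the pipeline precisely enough to sketch: the Tversky loss used against class imbalance, and the centroid post-processing that turns a segmented azimuth-elevation map into DoA estimates. The Python sketch below is illustrative only, not the authors' code; the alpha/beta weights, grid names, and tensor shapes are assumptions.

```python
import numpy as np
import torch
from scipy import ndimage


def tversky_loss(probs, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Soft Tversky loss for binary masks of shape (B, 1, H, W).

    alpha weights false positives, beta weights false negatives; the values
    here are illustrative, the paper does not report its setting.
    """
    tp = (probs * target).sum(dim=(1, 2, 3))
    fp = (probs * (1.0 - target)).sum(dim=(1, 2, 3))
    fn = ((1.0 - probs) * target).sum(dim=(1, 2, 3))
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return (1.0 - index).mean()


def centroid_doa(mask, az_grid_deg, el_grid_deg):
    """Map a predicted binary map of shape (n_el, n_az) to per-region DoAs.

    Connected active regions are labelled and their centroids converted back
    to azimuth/elevation; azimuth wrap-around at 0/360 degrees is ignored
    here for brevity.
    """
    labels, n_regions = ndimage.label(mask)
    centroids = ndimage.center_of_mass(mask, labels, range(1, n_regions + 1))
    doas = []
    for el_idx, az_idx in centroids:
        az = np.interp(az_idx, np.arange(len(az_grid_deg)), az_grid_deg)
        el = np.interp(el_idx, np.arange(len(el_grid_deg)), el_grid_deg)
        doas.append((az, el))
    return doas
```

With alpha = beta = 0.5 the Tversky index reduces to the Dice score; weighting beta above alpha penalizes missed source pixels more heavily, which is the usual reason for choosing this loss when active regions cover only a small fraction of the map.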
Related papers
- Localizing Audio-Visual Deepfakes via Hierarchical Boundary Modeling [50.8215545241128]
We propose a Hierarchical Boundary Modeling Network (HBMNet), which includes three modules: an Audio-Visual Feature Encoder, a Coarse Proposal Generator, and a Fine-grained Probabilities Generator. From the modality perspective, we enhance audio-visual encoding and fusion, reinforced by frame-level supervision. Experiments show that encoding and fusion primarily improve precision, while frame-level supervision mainly improves recall.
arXiv Detail & Related papers (2025-08-04T02:41:09Z) - RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1‱ Spatial Sampling [60.267226205350596]
Radio map estimation aims to generate a dense representation of electromagnetic spectrum quantities. We propose RadioFormer, a novel multiple-granularity transformer that handles the constraints posed by spatially sparse observations. We show that RadioFormer outperforms state-of-the-art methods in radio map estimation while maintaining the lowest computational cost.
arXiv Detail & Related papers (2025-04-27T08:44:41Z) - Constructing Indoor Region-based Radio Map without Location Labels [18.34037687586167]
This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels.
The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once.
The proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline.
arXiv Detail & Related papers (2023-08-31T14:27:36Z) - Multi-Microphone Speaker Separation by Spatial Regions [9.156939957189504]
We consider the task of region-based source separation of reverberant multi-microphone recordings.
We propose a data-driven approach using a modified version of a state-of-the-art network.
We show that both training methods result in a fixed mapping of regions to network outputs, achieve comparable performance, and that the networks exploit spatial information.
arXiv Detail & Related papers (2023-03-13T14:11:34Z) - Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain [131.74762114632404]
The model is trained end-to-end and performs spatial processing implicitly.
We evaluate the proposed model on a real-world dataset and show that the model matches the performance of an oracle beamformer.
arXiv Detail & Related papers (2022-06-30T17:13:01Z) - Three-Way Deep Neural Network for Radio Frequency Map Generation and Source Localization [67.93423427193055]
Monitoring wireless spectrum over spatial, temporal, and frequency domains will become a critical feature in beyond-5G and 6G communication technologies.
In this paper, we present a Generative Adversarial Network (GAN) machine learning model to interpolate irregularly distributed measurements across the spatial domain.
arXiv Detail & Related papers (2021-11-23T22:25:10Z) - PILOT: Introducing Transformers for Probabilistic Sound Event Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms.
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
arXiv Detail & Related papers (2021-06-07T18:29:19Z) - Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain [103.3388198420822]
Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
arXiv Detail & Related papers (2021-02-23T09:59:31Z) - Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied to the generator network in response to instability issues during GAN training.
We force the generator to move closer to the departure-from-normality profile of real samples, computed in the spectral domain via the Schur decomposition (a minimal sketch of this quantity follows the list below).
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
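The last entry above hinges on a specific matrix quantity, the departure from normality read off a Schur decomposition. For reference, here is a minimal NumPy/SciPy sketch of Henrici's departure from normality; the summary does not say how the GAN paper applies it to the generator, so the function and the sanity check below are illustrative rather than the authors' implementation.

```python
import numpy as np
from scipy.linalg import schur


def departure_from_normality(a):
    """Henrici's departure from normality via the complex Schur form.

    a = Q T Q^H with T upper triangular; the diagonal of T holds the
    eigenvalues, and the Frobenius norm of the strictly upper-triangular
    remainder is zero exactly when `a` is a normal matrix.
    """
    t, _ = schur(np.asarray(a, dtype=complex), output="complex")
    return np.linalg.norm(t - np.diag(np.diag(t)), "fro")


if __name__ == "__main__":
    # A symmetric (hence normal) matrix scores zero; a non-normal one does not.
    print(departure_from_normality(np.array([[2.0, 1.0], [1.0, 2.0]])))   # ~0.0
    print(departure_from_normality(np.array([[1.0, 5.0], [0.0, 1.0]])))   # ~5.0
```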