Unsupervised Acoustic Scene Mapping Based on Acoustic Features and
Dimensionality Reduction
- URL: http://arxiv.org/abs/2301.00448v2
- Date: Tue, 12 Mar 2024 18:48:40 GMT
- Title: Unsupervised Acoustic Scene Mapping Based on Acoustic Features and
Dimensionality Reduction
- Authors: Idan Cohen, Ofir Lindenbaum and Sharon Gannot
- Abstract summary: We introduce an unsupervised data-driven approach that exploits the natural structure of the data.
Our method builds upon local conformal autoencoders (LOCA) - an offline deep learning scheme for learning standardized data coordinates from measurements.
- Score: 18.641610823584433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classical methods for acoustic scene mapping require the estimation of time
difference of arrival (TDOA) between microphones. Unfortunately, TDOA
estimation is very sensitive to reverberation and additive noise. We introduce
an unsupervised data-driven approach that exploits the natural structure of the
data. Our method builds upon local conformal autoencoders (LOCA) - an offline
deep learning scheme for learning standardized data coordinates from
measurements. Our experimental setup includes a microphone array that measures
the transmitted sound source at multiple locations across the acoustic
enclosure. We demonstrate that LOCA learns a representation that is isometric
to the spatial locations of the microphones. The performance of our method is
evaluated using a series of realistic simulations and compared with other
dimensionality-reduction schemes. We further assess the influence of
reverberation on the results of LOCA and show that it demonstrates considerable
robustness.
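The classical baseline the abstract contrasts against, TDOA estimation between microphone pairs, is typically computed with the generalized cross-correlation with phase transform (GCC-PHAT). Below is a minimal NumPy sketch of that classical estimator, not of the paper's LOCA method; the signals, sampling rate, and delay are illustrative assumptions.

```python
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the TDOA (in seconds) between signals x and y via GCC-PHAT.

    The phase transform whitens the cross-spectrum, which gives the
    estimator its (limited) robustness to reverberation.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(max_shift, int(fs * max_tau))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center lag 0
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Illustrative usage: a 5-sample delay between two noise signals at 16 kHz.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(fs)
delayed = np.roll(s, 5)
print(gcc_phat_tdoa(delayed, s, fs))  # ~ 5 / 16000 s
```

In strongly reverberant rooms the cross-correlation develops spurious peaks from reflections, which is exactly the failure mode that motivates the data-driven mapping proposed above.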
Related papers
- Maximum Likelihood Estimation of the Direction of Sound In A Reverberant Noisy Environment [0.8702432681310399]
We describe a new method for estimating the direction of sound in a reverberant environment from basic principles of sound propagation.
The method utilizes SNR-adaptive features from time-delay and energy of the directional components after acoustic wave decomposition.
arXiv Detail & Related papers (2024-06-24T19:42:22Z)
- ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling [57.1025908604556]
An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment.
We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment.
We introduce ActiveRIR, a reinforcement learning policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions.
arXiv Detail & Related papers (2024-04-24T21:30:01Z)
- Sound event localization and classification using WASN in Outdoor Environment [2.234738672139924]
Methods for sound event localization and classification typically rely on a single microphone array.
We propose a deep learning-based method that employs multiple features and attention mechanisms to estimate the location and class of the sound source.
arXiv Detail & Related papers (2024-03-29T11:44:14Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Generative adversarial networks with physical sound field priors [6.256923690998173]
This paper presents a deep learning-based approach for the spatio-temporal reconstruction of sound fields using Generative Adversarial Networks (GANs).
The proposed method uses a plane wave basis and the underlying statistical distributions of pressure in rooms to reconstruct sound fields from a limited number of measurements.
The results suggest that generative models that incorporate a physically informed prior offer a promising approach to sound field reconstruction.
arXiv Detail & Related papers (2023-08-01T10:11:23Z)
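The plane-wave basis mentioned in the entry above is commonly fit with plain regularized least squares before any GAN enters the picture. The following single-frequency sketch shows that baseline; the microphone geometry, frequency, dictionary size, and regularization weight are illustrative assumptions, not the paper's GAN method.

```python
import numpy as np

c = 343.0                       # speed of sound [m/s]
f = 500.0                       # analysis frequency [Hz]
k = 2 * np.pi * f / c           # wavenumber [rad/m]

rng = np.random.default_rng(0)
mics = rng.uniform(-0.5, 0.5, size=(16, 3))   # 16 mic positions in a 1 m cube

# Dictionary of N plane waves with random directions on the unit sphere.
N = 64
dirs = rng.standard_normal((N, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
H = np.exp(-1j * k * mics @ dirs.T)           # (mics x waves) steering matrix

# Synthetic measurement: one true incoming wave plus a little noise.
coeff_true = np.zeros(N, dtype=complex)
coeff_true[3] = 1.0
p = H @ coeff_true + 0.01 * (rng.standard_normal(16) + 1j * rng.standard_normal(16))

# Tikhonov-regularized least squares for the plane-wave coefficients.
lam = 1e-2
coeff = np.linalg.solve(H.conj().T @ H + lam * np.eye(N), H.conj().T @ p)

# Reconstruct the pressure at an unmeasured point.
x_new = np.array([[0.1, 0.0, 0.2]])
p_new = np.exp(-1j * k * x_new @ dirs.T) @ coeff
print(abs(p_new))
```

A GAN-based reconstructor replaces the fixed Tikhonov prior with a learned statistical prior over room pressure fields while keeping the same physical plane-wave parameterization.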
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
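Once an RIR is measured or inferred, rendering what a listener would hear is a single convolution of the dry source signal with the RIR. A minimal sketch with a toy synthetic RIR (the exponential-decay model here is an illustrative stand-in for a real or inferred room response):

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 16000
rng = np.random.default_rng(0)

# Toy RIR: a direct-path impulse followed by an exponentially decaying
# noise tail, a crude stand-in for a measured room response.
t = np.arange(int(0.3 * fs)) / fs            # 300 ms tail
rir = rng.standard_normal(t.size) * np.exp(-t / 0.05)
rir[0] = 1.0                                 # direct path

dry = rng.standard_normal(fs)                # 1 s of "dry" source signal
wet = fftconvolve(dry, rir)[: dry.size]      # reverberant signal at the listener
print(wet.shape)
```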
- C-SL: Contrastive Sound Localization with Inertial-Acoustic Sensors [5.101801159418222]
We introduce contrastive sound localization (C-SL) with mobile inertial-acoustic sensor arrays of arbitrary geometry.
C-SL learns mappings from acoustical measurements to an array-centered direction-of-arrival in a self-supervised manner.
We believe the relaxed calibration process offered by C-SL paves the way toward truly personalized augmented hearing applications.
arXiv Detail & Related papers (2020-06-09T06:36:44Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
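A minimal sketch of the first- and second-order alignment described in the entry above: each frequency band of the target-domain features is standardized and re-scaled to the source-domain band statistics. Array shapes and names are illustrative assumptions.

```python
import numpy as np

def match_band_statistics(target, src_mean, src_std, eps=1e-8):
    """Align per-band mean/std of target spectrograms to source statistics.

    target: (batch, frames, bands) log-mel spectrograms (illustrative layout).
    src_mean, src_std: per-band statistics precomputed on the source training set.
    """
    t_mean = target.mean(axis=(0, 1))   # first-order statistics per band
    t_std = target.std(axis=(0, 1))     # second-order statistics per band
    return (target - t_mean) / (t_std + eps) * src_std + src_mean

# Illustrative usage with random "spectrograms".
rng = np.random.default_rng(0)
source = rng.normal(loc=-3.0, scale=2.0, size=(32, 400, 64))
target = rng.normal(loc=-1.0, scale=0.5, size=(32, 400, 64))
aligned = match_band_statistics(
    target, source.mean(axis=(0, 1)), source.std(axis=(0, 1))
)
print(aligned.mean(), aligned.std())  # now close to the source statistics
```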
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
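The K-means step that the entry above replaces with a network is the standard deep-clustering recipe: cluster the per-time-frequency-bin embeddings into sources and read binary masks off the cluster labels. A minimal sketch with random embeddings (the shapes and the two-source setting are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative embedding tensor: one D-dim vector per time-frequency bin,
# as produced by a deep-clustering-style network.
T, F, D = 100, 129, 20
rng = np.random.default_rng(0)
emb = rng.standard_normal((T * F, D))

# Cluster bins into two sources (e.g., speech vs. interference) and
# turn the cluster labels into binary T-F masks.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
masks = [(labels == k).reshape(T, F) for k in range(2)]
print(masks[0].shape, masks[0].mean())
```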
This list is automatically generated from the titles and abstracts of the papers on this site.