C-SL: Contrastive Sound Localization with Inertial-Acoustic Sensors
- URL: http://arxiv.org/abs/2006.05071v1
- Date: Tue, 9 Jun 2020 06:36:44 GMT
- Title: C-SL: Contrastive Sound Localization with Inertial-Acoustic Sensors
- Authors: Majid Mirbagheri, Bardia Doosti
- Abstract summary: We introduce contrastive sound localization (C-SL) with mobile inertial-acoustic sensor arrays of arbitrary geometry.
C-SL learns mappings from acoustical measurements to an array-centered direction-of-arrival in a self-supervised manner.
We believe the relaxed calibration process offered by C-SL paves the way toward truly personalized augmented hearing applications.
- Score: 5.101801159418222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The human brain employs perceptual information about head and eye movements
to update the spatial relationship between the individual and the surrounding
environment. Based on this cognitive process known as spatial updating, we
introduce contrastive sound localization (C-SL) with mobile inertial-acoustic
sensor arrays of arbitrary geometry. C-SL uses unlabeled multi-channel audio
recordings and inertial measurement unit (IMU) readings collected during free
rotational movements of the array to learn mappings from acoustical
measurements to an array-centered direction-of-arrival (DOA) in a
self-supervised manner. Unlike conventional DOA estimation methods, which
require knowledge of either the array geometry or the source locations in the
calibration stage, C-SL is agnostic to both and can be trained on data
collected in minimally constrained settings. To achieve this capability, our
proposed method utilizes a customized contrastive loss measuring the spatial
contrast between source locations predicted for disjoint segments of the input
to jointly update estimated DOAs and the acoustic-spatial mapping in linear
time. We provide quantitative and qualitative evaluations of C-SL comparing its
performance with baseline DOA estimation methods in a wide range of conditions.
We believe the relaxed calibration process offered by C-SL paves the way toward
truly personalized augmented hearing applications.
Related papers
- A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation [19.384404014248762]
Binaural speech enhancement aims to improve the speech quality and intelligibility of noisy signals received by hearing devices.
Existing methods often suffer from a compromise between noise reduction (NR) capacity and spatial cues preservation (SCP) accuracy.
We present a learning-based lightweight complex convolutional network (LBCCN), which excels at NR by filtering the low-frequency bands and keeping the rest.
arXiv Detail & Related papers (2024-09-19T03:52:50Z)
- Dilated Convolution with Learnable Spacings [1.8130068086063336]
This thesis presents and evaluates the Dilated Convolution with Learnable Spacings (DCLS) method.
Through various supervised learning experiments in the fields of computer vision, audio, and speech processing, the DCLS method proves to outperform both standard and advanced convolution techniques.
arXiv Detail & Related papers (2024-08-10T12:12:39Z)
- Spatial-Temporal Decoupling Contrastive Learning for Skeleton-based Human Action Recognition [10.403751563214113]
STD-CL is a framework to obtain discriminative and semantically distinct representations from skeleton sequences.
STD-CL achieves solid improvements on NTU60, NTU120, and NW-UCLA benchmarks.
arXiv Detail & Related papers (2023-12-23T02:54:41Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples seen in previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- Unsupervised Acoustic Scene Mapping Based on Acoustic Features and Dimensionality Reduction [18.641610823584433]
We introduce an unsupervised data-driven approach that exploits the natural structure of the data.
Our method builds upon local conformal autoencoders (LOCA) - an offline deep learning scheme for learning standardized data coordinates from measurements.
arXiv Detail & Related papers (2023-01-01T17:46:09Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Learning Where to Learn in Cross-View Self-Supervised Learning [54.14989750044489]
Self-supervised learning (SSL) has made enormous progress and largely narrowed the gap with its supervised counterparts.
Current methods simply adopt uniform aggregation of pixels for embedding.
We present a new approach, Learning Where to Learn (LEWEL), to adaptively aggregate spatial information of features.
arXiv Detail & Related papers (2022-03-28T17:02:42Z)
- Diarisation using location tracking with agglomerative clustering [42.13772744221499]
This paper explicitly models the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framework.
Experiments show that the proposed approach is able to yield improvements on a Microsoft rich meeting transcription task.
arXiv Detail & Related papers (2021-09-22T08:54:10Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding the departure from normality, computed in this vector space, into the generator optimization formulation helps craft more comprehensive spectrograms.
We demonstrate that incorporating this metric enhances training stability, with less mode collapse than baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.