Data-Efficient Framework for Real-world Multiple Sound Source 2D
Localization
- URL: http://arxiv.org/abs/2012.05533v3
- Date: Wed, 17 Mar 2021 08:50:36 GMT
- Title: Data-Efficient Framework for Real-world Multiple Sound Source 2D
Localization
- Authors: Guillaume Le Moing, Phongtharin Vinayavekhin, Don Joven Agravante,
Tadanobu Inoue, Jayakorn Vongkulbhisal, Asim Munawar, Ryuki Tachibana
- Abstract summary: We propose a novel ensemble-discrimination method to improve the localization performance without requiring any label from the real data.
It enables the model to be trained with data from specific microphone array layouts while generalizing well to unseen layouts during inference.
- Score: 7.564344795030588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have recently led to promising results for the task of
multiple sound source localization. Yet, they require a lot of training data to
cover a variety of acoustic conditions and microphone array layouts. One can
leverage acoustic simulators to inexpensively generate labeled training data.
However, models trained on synthetic data tend to perform poorly with
real-world recordings due to the domain mismatch. Moreover, learning for
different microphone array layouts makes the task more complicated due to the
infinite number of possible layouts. We propose to use adversarial learning
methods to close the gap between synthetic and real domains. Our novel
ensemble-discrimination method significantly improves the localization
performance without requiring any label from the real data. Furthermore, we
propose a novel explicit transformation layer to be embedded in the
localization architecture. It enables the model to be trained with data from
specific microphone array layouts while generalizing well to unseen layouts
during inference.
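The adversarial domain-adaptation idea behind ensemble-discrimination can be caricatured in a few lines: several domain discriminators each try to tell synthetic features from real ones, their outputs are averaged, and the feature extractor is updated with the reversed gradient so the two domains become indistinguishable, without any task label on the real data. The sketch below is a hypothetical minimal numpy illustration with linear maps and hand-derived logistic gradients; the paper's actual networks, losses, and discrimination scheme are more involved, and every name here (`DomainAdversarialEnsemble`, the toy data) is invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DomainAdversarialEnsemble:
    """Toy sketch: linear feature extractor F plus K logistic domain
    discriminators. Each discriminator learns to separate synthetic
    (label 0) from real (label 1) features, while F receives the
    REVERSED gradient so the domains become confusable."""

    def __init__(self, in_dim, feat_dim, k=3, lr=0.05):
        self.F = rng.normal(0.0, 0.1, (in_dim, feat_dim))
        self.D = [rng.normal(0.0, 0.1, feat_dim) for _ in range(k)]
        self.lr = lr

    def features(self, x):
        return x @ self.F

    def domain_scores(self, x):
        # ensemble-discrimination: average the K discriminator outputs
        f = self.features(x)
        return np.mean([sigmoid(f @ d) for d in self.D], axis=0)

    def step(self, x, domain_labels):
        f = self.features(x)                      # (n, feat_dim)
        grad_f = np.zeros_like(f)
        for i, d in enumerate(self.D):
            err = sigmoid(f @ d) - domain_labels  # dBCE/dlogit per sample
            # discriminator descends: gets better at separating domains
            self.D[i] = d - self.lr * (f.T @ err) / len(x)
            grad_f += err[:, None] * d[None, :]
        grad_f /= len(self.D)
        # gradient reversal: the extractor ASCENDS the domain loss,
        # pulling synthetic and real feature distributions together
        self.F += self.lr * (x.T @ grad_f) / len(x)

# Toy stand-ins for the two domains: inputs with shifted means.
model = DomainAdversarialEnsemble(in_dim=4, feat_dim=8)
x_syn = rng.normal(+1.0, 1.0, (64, 4))   # "simulated" recordings
x_real = rng.normal(-1.0, 1.0, (64, 4))  # "real" recordings, no task labels
x = np.vstack([x_syn, x_real])
y = np.concatenate([np.zeros(64), np.ones(64)])  # domain labels only
for _ in range(100):
    model.step(x, y)
scores = model.domain_scores(x)  # one ensemble domain score per sample
```

In the real system the extractor is shared with the localization head, so the task loss on synthetic (labeled) data and the reversed domain loss are optimized jointly; averaging several discriminators is what makes the domain signal less noisy than a single critic.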
Related papers
- Radio Foundation Models: Pre-training Transformers for 5G-based Indoor Localization [3.2805385616712677]
We propose a self-supervised learning framework that pre-trains a general transformer (TF) neural network on 5G channel measurements without expensive equipment.
Our novel pretext task randomly masks and drops input information to learn to reconstruct it.
It implicitly learns temporal patterns and information about the propagation environment that enable FP-based localization.
arXiv Detail & Related papers (2024-10-01T12:03:32Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z) - Metric-based multimodal meta-learning for human movement identification
via footstep recognition [3.300376360949452]
We describe a novel metric-based learning approach that introduces a multimodal framework.
We learn general-purpose representations from limited multisensory data obtained from omnipresent sensing systems.
We employ a metric-based contrastive learning approach on multi-sensor data to mitigate the impact of data scarcity.
arXiv Detail & Related papers (2021-11-15T18:46:14Z) - Learning Signal-Agnostic Manifolds of Neural Fields [50.066449953522685]
We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains.
We show that by walking across the underlying manifold of GEM, we may generate new samples in our signal domains.
arXiv Detail & Related papers (2021-11-11T18:57:40Z) - PILOT: Introducing Transformers for Probabilistic Sound Event
Localization [107.78964411642401]
This paper introduces a novel transformer-based sound event localization framework, where temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms.
The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy.
arXiv Detail & Related papers (2021-06-07T18:29:19Z) - Bridging the Gap Between Clean Data Training and Real-World Inference
for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z) - Ensemble of Discriminators for Domain Adaptation in Multiple Sound
Source 2D Localization [7.564344795030588]
This paper introduces an ensemble of discriminators that improves the accuracy of a domain adaptation technique for the localization of multiple sound sources.
Recording and labeling such datasets is very costly, especially because data needs to be diverse enough to cover different acoustic conditions.
arXiv Detail & Related papers (2020-12-10T09:17:29Z) - Scene-Agnostic Multi-Microphone Speech Dereverberation [47.735158037490834]
We present an NN architecture that can cope with microphone arrays whose number and positions are unknown.
Our approach harnesses recent advances in deep learning on set-structured data to design an architecture that enhances the reverberant log-spectrum.
arXiv Detail & Related papers (2020-10-22T17:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.