RealImpact: A Dataset of Impact Sound Fields for Real Objects
- URL: http://arxiv.org/abs/2306.09944v1
- Date: Fri, 16 Jun 2023 16:25:41 GMT
- Title: RealImpact: A Dataset of Impact Sound Fields for Real Objects
- Authors: Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien
Wang, Doug L. James, Jiajun Wu
- Abstract summary: We present RealImpact, a large-scale dataset of real object impact sounds recorded under controlled conditions.
RealImpact contains 150,000 recordings of impact sounds of 50 everyday objects with detailed annotations.
We make preliminary attempts to use our dataset as a reference to current simulation methods for estimating object impact sounds.
- Score: 29.066504517249083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objects make unique sounds under different perturbations, environment
conditions, and poses relative to the listener. While prior works have modeled
impact sounds and sound propagation in simulation, we lack a standard dataset
of impact sound fields of real objects for audio-visual learning and
calibration of the sim-to-real gap. We present RealImpact, a large-scale
dataset of real object impact sounds recorded under controlled conditions.
RealImpact contains 150,000 recordings of impact sounds of 50 everyday objects
with detailed annotations, including their impact locations, microphone
locations, contact force profiles, material labels, and RGBD images. We make
preliminary attempts to use our dataset as a reference to current simulation
methods for estimating object impact sounds that match the real world.
Moreover, we demonstrate the usefulness of our dataset as a testbed for
acoustic and audio-visual learning via the evaluation of two benchmark tasks,
including listener location classification and visual acoustic matching.
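To make the annotation structure above concrete, below is a minimal sketch (in Python) of how one recording and its labels might be grouped for the listener location classification benchmark. The field names, encodings, and the azimuth-binning target are illustrative assumptions, not the dataset's actual release format or task definition.

```python
# Illustrative sketch only: hypothetical grouping of one RealImpact-style recording.
from dataclasses import dataclass
import numpy as np

@dataclass
class ImpactRecording:
    audio: np.ndarray                # mono waveform of one impact sound
    sample_rate: int
    impact_location: np.ndarray      # where the object was struck (encoding assumed)
    microphone_location: np.ndarray  # 3D listener/microphone position
    contact_force: np.ndarray        # contact force profile over time
    material: str                    # material label, e.g. "ceramic" (labels assumed)
    rgbd_image: np.ndarray           # H x W x 4 RGB-D view of the object

def listener_azimuth_class(mic_xyz: np.ndarray, num_bins: int = 8) -> int:
    """Quantize the microphone's azimuth around the object into one of
    num_bins classes -- one plausible target for listener location
    classification (the benchmark's real label definition may differ)."""
    azimuth = np.arctan2(mic_xyz[1], mic_xyz[0])  # radians in (-pi, pi]
    bin_width = 2.0 * np.pi / num_bins
    return int(((azimuth + np.pi) // bin_width) % num_bins)
```

The point of the sketch is only that each audio clip pairs with geometric, physical, and visual metadata, so a model can be trained to predict, for example, where the listener was relative to the object from the sound alone.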
Related papers
- HARP: A Large-Scale Higher-Order Ambisonic Room Impulse Response Dataset [0.6568378556428859]
This contribution introduces a dataset of 7th-order Ambisonic Room Impulse Responses (HOA-RIRs) created using the Image Source Method.
By employing higher-order Ambisonics, our dataset enables precise spatial audio reproduction.
The presented 64-microphone configuration allows us to capture RIRs directly in the Spherical Harmonics domain.
arXiv Detail & Related papers (2024-11-21T15:16:48Z)
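For context on the figures in the HARP entry above: a full-sphere Ambisonic representation of order N carries (N+1)^2 spherical-harmonic channels, so 7th order corresponds to (7+1)^2 = 64 channels, which is consistent with the 64-microphone configuration mentioned in the summary.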
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
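As a rough illustration of the conditional-GAN idea in the visual acoustic matching entry above (not the paper's actual architecture, losses, or metric), a toy generator can re-synthesize spectrogram frames conditioned on an embedding of the target scene image, while a discriminator scores (audio, image-embedding) pairs. All module shapes and names below are assumptions.

```python
# Illustrative sketch only: a toy conditional GAN for acoustic matching.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, n_freq=128, img_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq + img_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq),
        )

    def forward(self, spec_frames, img_emb):
        # spec_frames: (batch, time, n_freq); img_emb: (batch, img_dim)
        cond = img_emb.unsqueeze(1).expand(-1, spec_frames.size(1), -1)
        return self.net(torch.cat([spec_frames, cond], dim=-1))

class Discriminator(nn.Module):
    def __init__(self, n_freq=128, img_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq + img_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, spec_frames, img_emb):
        cond = img_emb.unsqueeze(1).expand(-1, spec_frames.size(1), -1)
        return self.net(torch.cat([spec_frames, cond], dim=-1)).mean(dim=1)

# Toy usage with random stand-ins for real features.
G, D = Generator(), Discriminator()
spec = torch.randn(4, 50, 128)   # 4 clips of 50 log-spectrogram frames (placeholder)
img_emb = torch.randn(4, 512)    # target-scene image embeddings, e.g. from any CNN
matched = G(spec, img_emb)       # frames re-synthesized toward the target scene
score = D(matched, img_emb)      # one realism score per example
```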
- Unsupervised Acoustic Scene Mapping Based on Acoustic Features and Dimensionality Reduction [18.641610823584433]
We introduce an unsupervised data-driven approach that exploits the natural structure of the data.
Our method builds upon local conformal autoencoders (LOCA) - an offline deep learning scheme for learning standardized data coordinates from measurements.
arXiv Detail & Related papers (2023-01-01T17:46:09Z)
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)
- Finding Fallen Objects Via Asynchronous Audio-Visual Integration [89.75296559813437]
This paper introduces a setting in which to study multi-modal object localization in 3D virtual environments.
An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped -- and where -- by combining audio and visual signals with knowledge of the underlying physics.
The dataset uses the ThreeDWorld platform which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting.
arXiv Detail & Related papers (2022-07-07T17:59:59Z)
- A Study on Robustness to Perturbations for Representations of Environmental Sound [16.361059909912758]
We evaluate two embeddings -- YAMNet and OpenL3 -- on monophonic (UrbanSound8K) and polyphonic (SONYC UST) datasets.
We imitate channel effects by injecting perturbations into the audio signal and measure the shift in the resulting embeddings with three distance measures.
arXiv Detail & Related papers (2022-03-20T01:04:38Z)
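As a minimal sketch of the perturbation protocol described in the entry above, one can mix noise into a clip at a chosen SNR and compare the clean and perturbed embeddings with a few common distances. The embedding model is left abstract here, and these particular perturbations and distances are generic examples, not necessarily the paper's three measures.

```python
# Illustrative sketch only: perturb audio and measure embedding shift.
import numpy as np

def add_noise_at_snr(audio: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Mix white noise into the signal at the requested SNR (in dB)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(audio.shape)
    signal_power = np.mean(audio ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return audio + scale * noise

def embedding_shift(clean_emb: np.ndarray, noisy_emb: np.ndarray) -> dict:
    """Distances between embeddings of the clean and the perturbed clip."""
    cosine = 1.0 - np.dot(clean_emb, noisy_emb) / (
        np.linalg.norm(clean_emb) * np.linalg.norm(noisy_emb) + 1e-12)
    return {
        "l2": float(np.linalg.norm(clean_emb - noisy_emb)),
        "l1": float(np.linalg.norm(clean_emb - noisy_emb, ord=1)),
        "cosine": float(cosine),
    }
```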
- Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
arXiv Detail & Related papers (2021-06-14T20:01:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.