EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling
- URL: http://arxiv.org/abs/2504.02402v1
- Date: Thu, 03 Apr 2025 08:51:17 GMT
- Title: EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling
- Authors: Hao Yin, Shi Guo, Xu Jia, Xudong Xu, Lu Zhang, Si Liu, Dong Wang, Huchuan Lu, Tianfan Xue
- Abstract summary: When sound waves hit an object, they induce vibrations that produce high-frequency and subtle visual changes. Recent advances in event camera hardware show good potential for its application in visual sound recovery. We propose a novel pipeline for non-contact sound recovery, fully utilizing spatial-temporal information from the event stream.
- Score: 69.96729022219117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When sound waves hit an object, they induce vibrations that produce high-frequency and subtle visual changes, which can be used for recovering the sound. Early studies often face trade-offs among sampling rate, bandwidth, field of view, and the simplicity of the optical path. Recent advances in event camera hardware show strong potential for visual sound recovery because of their superior ability to capture high-frequency signals. However, existing event-based vibration recovery methods remain sub-optimal for sound recovery. In this work, we propose a novel pipeline for non-contact sound recovery that fully utilizes spatial-temporal information from the event stream. We first generate a large training set using a novel simulation pipeline. We then design a network that leverages the sparsity of events to capture spatial information and uses Mamba to model long-term temporal information. Lastly, we train a spatial aggregation block that combines information from different locations to further improve signal quality. To capture event signals caused by sound waves, we also design an imaging system that uses a laser matrix to enhance the gradient, and we collect multiple data sequences for testing. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.
Related papers
- UltraRay: Full-Path Ray Tracing for Enhancing Realism in Ultrasound Simulation [43.433512581459176]
We propose a novel ultrasound simulation pipeline that utilizes a ray tracing algorithm to generate echo data. To replicate advanced ultrasound imaging, we introduce a ray emission scheme optimized for plane wave imaging, incorporating delay and steering capabilities. In doing so, our proposed approach, UltraRay, not only enhances the overall visual quality but also improves the realism of the simulated images.
arXiv Detail & Related papers (2025-01-10T10:07:41Z) - Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction [51.71299452862839]
We propose the first treatment of sim2real for audio-visual navigation by disentangling it into acoustic field prediction (AFP) and waypoint navigation.
We then collect real-world data to measure the spectral difference between the simulation and the real world by training AFP models that only take a specific frequency subband as input.
Lastly, we build a real robot platform and show that the transferred policy can successfully navigate to sounding objects.
arXiv Detail & Related papers (2024-05-05T06:01:31Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms [4.266697413924045]
Sound event localization and detection (SELD) is an important task in machine listening.
We present SpatialScaper, a library for SELD data simulation and augmentation.
arXiv Detail & Related papers (2024-01-19T19:01:13Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - Deep learning-based deconvolution for interferometric radio transient reconstruction [0.39259415717754914]
Radio astronomy facilities like LOFAR, MeerKAT/SKA, ASKAP/SKA, and the future SKA-LOW bring tremendous sensitivity in time and frequency.
These facilities enable advanced studies of radio transients, volatile by nature, that can be detected or missed in the data.
These transients are markers of high-energy accelerations of electrons and manifest in a wide range of temporal scales.
arXiv Detail & Related papers (2023-06-24T08:58:52Z) - Structural Vibration Signal Denoising Using Stacking Ensemble of Hybrid CNN-RNN [0.0]
In recent years, there has been a growing trend towards the use of vibration signals in the field of bioengineering.
Footstep-induced vibrations are useful for analyzing the movement of biological systems such as the human body and animals.
In this paper, we propose a novel ensemble model that leverages both the ensemble of multiple signals and of recurrent and convolutional neural network predictions.
arXiv Detail & Related papers (2023-03-11T00:49:45Z) - Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z) - SoundDet: Polyphonic Sound Event Detection and Localization from Raw Waveform [48.68714598985078]
SoundDet is an end-to-end trainable and lightweight framework for polyphonic moving sound event detection and localization.
SoundDet directly consumes the raw, multichannel waveform and treats the temporal sound event as a complete "sound-object" to be detected.
A dense sound proposal event map is then constructed to handle the challenges of predicting events with large varying temporal duration.
arXiv Detail & Related papers (2021-06-13T11:43:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.