Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis
- URL: http://arxiv.org/abs/2103.14201v1
- Date: Fri, 26 Mar 2021 01:25:58 GMT
- Title: Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis
- Authors: Nikhil Singh and Jeff Mentch and Jerry Ng and Matthew Beveridge and
Iddo Drori
- Abstract summary: We use an end-to-end neural network architecture to generate plausible audio impulse responses from single images of acoustic environments.
We demonstrate our approach by generating plausible impulse responses from diverse settings and formats.
- Score: 0.3587367153279349
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring the acoustic characteristics of a space is often done by capturing
its impulse response (IR), a representation of how a full-range stimulus sound
excites it. Recording these IRs is both time-intensive and expensive, and often
infeasible for inaccessible locations. This is the first work to generate an IR
from a single image, which we call Image2Reverb. The generated IR is then
applied to other signals using convolution, simulating the reverberant
characteristics of the space shown in the image. We use an end-to-end neural
network architecture to generate plausible audio impulse responses from single
images of acoustic environments. We evaluate our method both by comparisons to ground
truth data and by human expert evaluation. We demonstrate our approach by
generating plausible impulse responses from diverse settings and formats
including well-known places, musical halls, rooms in paintings, images from
animations and computer games, synthetic environments generated from text,
panoramic images, and video conference backgrounds.
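The abstract describes applying a generated IR to other signals via convolution. A minimal sketch of that step, using a hypothetical exponentially decaying noise burst as a stand-in IR (not the output of the Image2Reverb model):

```python
# Sketch: applying an impulse response to a dry signal via discrete
# convolution. The IR here is synthetic decaying noise, a crude reverb
# tail used only for illustration.
import math
import random

def convolve(signal, ir):
    """Discrete linear convolution: out[n] = sum_k signal[k] * ir[n - k]."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += s * h
    return out

# Toy dry signal: a single click (unit impulse).
dry = [1.0] + [0.0] * 7

# Hypothetical IR: exponentially decaying random noise.
random.seed(0)
ir = [random.uniform(-1, 1) * math.exp(-0.5 * t) for t in range(8)]

wet = convolve(dry, ir)
# Convolving a unit impulse with the IR reproduces the IR (zero-padded).
```

In practice this is done with FFT-based convolution (e.g. `scipy.signal.fftconvolve`) for efficiency, but the operation is the same.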
Related papers
- Hearing Anything Anywhere [26.415266601469767]
We introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene.
This allows us to synthesize novel auditory experiences through the space with any source audio.
We show that our model outperforms state-of-the-art baselines on rendering monaural and binaural RIRs and music at unseen locations.
arXiv Detail & Related papers (2024-06-11T17:56:14Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- AV-RIR: Audio-Visual Room Impulse Response Estimation [49.469389715876915]
Accurate estimation of Room Impulse Response (RIR) is important for speech processing and AR/VR applications.
We propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and visual cues of its corresponding environment.
arXiv Detail & Related papers (2023-11-30T22:58:30Z)
- An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples [2.2866551516539726]
We present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIRs) in the generation step.
A viable adversarial audio file is produced first, then fine-tuned with respect to perceptibility and robustness.
arXiv Detail & Related papers (2023-10-05T06:59:09Z)
- Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields [61.07542274267568]
This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene.
Driven by the unique properties of RIR, we design a temporal correlation module and multi-scale energy decay criterion.
Experimental results show that NACF outperforms existing field-based methods by a notable margin.
arXiv Detail & Related papers (2023-09-27T19:50:50Z) - Synthetic Wave-Geometric Impulse Responses for Improved Speech
Dereverberation [69.1351513309953]
We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation.
We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods.
arXiv Detail & Related papers (2022-12-10T20:15:23Z) - One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In
Any Room/ Concert Hall [3.652509571098291]
We propose a novel architecture that can transform a sound of interest into any other acoustic space of interest.
Our framework allows a neural network to adjust the gain of every point in the time-frequency representation.
arXiv Detail & Related papers (2022-10-27T19:54:05Z) - Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z) - Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.