One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In
Any Room/ Concert Hall
- URL: http://arxiv.org/abs/2210.15750v1
- Date: Thu, 27 Oct 2022 19:54:05 GMT
- Title: One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In
Any Room/ Concert Hall
- Authors: Prateek Verma, Chris Chafe, Jonathan Berger
- Abstract summary: We propose a novel architecture that can transform a sound of interest into any other acoustic space of interest.
Our framework allows a neural network to adjust the gain of every point in the time-frequency representation.
- Score: 3.652509571098291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The acoustic space in which a sound is created and heard plays an essential
role in how that sound is perceived by affording a unique sense of
*presence*. Every sound we hear results from successive convolution
operations intrinsic to the sound source and external factors such as
microphone characteristics and room impulse responses. Typically, researchers
use an excitation such as a pistol shot or balloon pop as an impulse signal
with which an auralization can be created. The room impulse response,
convolved with the signal of interest, transforms the input sound into the
sound as heard in the acoustic space of interest. Here we propose a novel
architecture that can transform a sound of interest into any other acoustic
space (room or hall) of interest by using arbitrary audio recorded as a proxy
for a balloon pop. The architecture is grounded in simple signal processing
ideas to learn residual signals from a learned acoustic signature and the
input signal. Our framework allows a neural network to adjust the gain of
every point in the time-frequency representation, yielding sound qualitative
and quantitative results.
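As a rough illustration of the two ideas above (classical auralization by convolving a dry signal with a measured room impulse response, and adjusting the gain of every point of a time-frequency representation), the following sketch uses numpy and scipy. The gain function is only a stand-in for what the trained network would predict from the input and the learned acoustic signature of the target space.

```python
import numpy as np
from scipy.signal import fftconvolve, stft, istft

def auralize(dry, rir):
    """Classical auralization: convolve a dry recording with a measured
    room impulse response (e.g. one captured from a balloon pop)."""
    wet = fftconvolve(dry, rir)
    return wet / (np.max(np.abs(wet)) + 1e-9)  # normalize to avoid clipping

def apply_tf_gains(dry, gain_fn, fs=16000, nperseg=512):
    """One-shot matching idea in miniature: scale every point of the
    time-frequency representation by a gain. `gain_fn` stands in for the
    trained network that would predict these gains from the input and the
    acoustic signature of the target space."""
    _, _, Z = stft(dry, fs=fs, nperseg=nperseg)
    gains = gain_fn(Z.shape)                    # one gain per time-frequency bin
    _, matched = istft(Z * gains, fs=fs, nperseg=nperseg)
    return matched

if __name__ == "__main__":
    fs = 16000
    dry = np.random.randn(fs)                                    # placeholder dry signal
    rir = np.exp(-np.linspace(0, 8, fs // 2)) * np.random.randn(fs // 2)
    wet = auralize(dry, rir)                                     # convolution baseline
    matched = apply_tf_gains(dry, lambda shape: np.ones(shape))  # identity mask stub
```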
Related papers
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields [61.07542274267568]
This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene.
Driven by the unique properties of the RIR, we design a temporal correlation module and a multi-scale energy decay criterion.
Experimental results show that NACF outperforms existing field-based methods by a notable margin.
arXiv Detail & Related papers (2023-09-27T19:50:50Z)
- Sound Design Strategies for Latent Audio Space Explorations Using Deep Learning Architectures [1.6114012813668934]
We explore a well-known deep learning architecture, the Variational Autoencoder (VAE).
VAEs have been used for generating latent timbre spaces or latent spaces of symbolic music excerpts.
In this work, we apply VAEs to raw audio data directly while bypassing audio feature extraction (a minimal VAE sketch follows this entry).
arXiv Detail & Related papers (2023-05-24T21:08:42Z)
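A minimal sketch of the idea in the entry above: a VAE applied directly to fixed-length raw-audio frames with no feature extraction in front of it. The frame length, layer sizes, and latent dimension are illustrative assumptions, not values from the paper; PyTorch is used for brevity.

```python
import torch
import torch.nn as nn

class RawAudioVAE(nn.Module):
    """Minimal VAE over fixed-length raw-audio frames (no hand-crafted
    feature extraction). Sizes are illustrative, not taken from the paper."""
    def __init__(self, frame_len=1024, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(frame_len, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, frame_len)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = nn.functional.mse_loss(x_hat, x, reduction="sum")     # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL regularizer
    return recon + kl

x = torch.randn(8, 1024)                 # a batch of raw-audio frames
model = RawAudioVAE()
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)
```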
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z)
- Enhancing Audio Perception of Music By AI Picked Room Acoustics [4.314956204483073]
We seek to determine the best room in which to perform a particular piece using AI.
We use room acoustics as a way to enhance the perceptual qualities of a given sound.
arXiv Detail & Related papers (2022-08-16T23:47:43Z)
- Learning Neural Acoustic Fields [110.22937202449025]
We introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene.
By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to impulse responses.
We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations (a generic implicit-field sketch follows this entry).
arXiv Detail & Related papers (2022-04-04T17:59:37Z)
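The entry above models propagation as a linear time-invariant system queried at continuous positions. Below is a generic implicit-field sketch, not the authors' exact architecture: an MLP that maps an emitter position, a listener position, and a time-frequency query to one value of an impulse-response representation.

```python
import torch
import torch.nn as nn

class ImplicitAcousticField(nn.Module):
    """Generic implicit acoustic field (a sketch, not the NAF architecture):
    an MLP mapping emitter xyz, listener xyz, and a (time, frequency) query
    to one value of an impulse-response spectrogram, so the scene is treated
    as a linear time-invariant system queried at continuous locations."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3 + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, emitter, listener, tf_query):
        return self.net(torch.cat([emitter, listener, tf_query], dim=-1))

field = ImplicitAcousticField()
# Query the field for an arbitrary emitter/listener pair and TF point.
value = field(torch.rand(1, 3), torch.rand(1, 3), torch.rand(1, 2))
```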
- Visual Acoustic Matching [92.91522122739845]
We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.
Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials.
arXiv Detail & Related papers (2022-02-14T17:05:22Z)
- Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis [0.3587367153279349]
We use an end-to-end neural network architecture to generate plausible audio impulse responses from single images of acoustic environments.
We demonstrate our approach by generating plausible impulse responses from diverse settings and formats.
arXiv Detail & Related papers (2021-03-26T01:25:58Z)
- Joint Blind Room Acoustic Characterization From Speech And Music Signals Using Convolutional Recurrent Neural Networks [13.12834490248018]
Reverberation time, clarity, and direct-to-reverberant ratio are acoustic parameters that have been defined to describe reverberant environments.
Recent work combining audio signal processing and machine learning suggests that one could estimate those parameters blindly using speech or music signals.
We propose a robust end-to-end method to achieve blind joint acoustic parameter estimation using speech and/or music signals (the reference definitions of these parameters are sketched after this entry).
arXiv Detail & Related papers (2020-10-21T17:41:21Z)
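The parameters named in the entry above are conventionally computed from a measured impulse response; the paper estimates them blindly from speech or music with a convolutional recurrent network. As a reference for what is being estimated, here is a minimal numpy sketch of the standard definitions, assuming the RIR array starts at the direct sound.

```python
import numpy as np

def schroeder_decay_db(rir):
    """Schroeder backward integration of the squared RIR, in dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10 * np.log10(energy / energy[0] + 1e-12)

def rt60(rir, fs):
    """Reverberation time: fit the -5 dB to -25 dB decay and extrapolate to 60 dB."""
    decay = schroeder_decay_db(rir)
    i5, i25 = np.argmax(decay <= -5), np.argmax(decay <= -25)
    slope = (decay[i25] - decay[i5]) / ((i25 - i5) / fs)  # dB per second
    return -60.0 / slope

def clarity_c50(rir, fs):
    """Clarity (C50): early (first 50 ms) to late energy ratio in dB."""
    k = int(0.05 * fs)
    return 10 * np.log10(np.sum(rir[:k] ** 2) / (np.sum(rir[k:] ** 2) + 1e-12))

def drr(rir, fs, direct_ms=2.5):
    """Direct-to-reverberant ratio, treating a short initial window as the direct path."""
    k = int(direct_ms / 1000 * fs)
    return 10 * np.log10(np.sum(rir[:k] ** 2) / (np.sum(rir[k:] ** 2) + 1e-12))

fs = 16000
rir = np.exp(-np.linspace(0, 10, fs)) * np.random.randn(fs)   # synthetic decaying RIR
print(rt60(rir, fs), clarity_c50(rir, fs), drr(rir, fs))
```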
- Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments (a generic vector-quantization step is sketched after this entry).
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
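The discrete latent space in the entry above is built with vector quantization. Below is the generic nearest-neighbour codebook lookup with a straight-through gradient, sketched in PyTorch; it is a standard VQ building block, not the paper's full loudness-disentangled auto-encoder, and the codebook size and dimensionality are arbitrary.

```python
import torch

def vector_quantize(z, codebook):
    """Nearest-neighbour codebook lookup: map each continuous latent vector
    to its closest discrete code, with a straight-through gradient so the
    encoder still receives useful gradients."""
    dists = torch.cdist(z, codebook)          # (batch, num_codes) pairwise L2 distances
    indices = dists.argmin(dim=-1)            # discrete code index per latent vector
    quantized = codebook[indices]             # replace each vector by its code
    quantized = z + (quantized - z).detach()  # straight-through estimator
    return quantized, indices

codebook = torch.randn(64, 16)                # 64 codes of dimension 16 (arbitrary sizes)
z = torch.randn(8, 16)                        # a batch of encoder outputs
zq, idx = vector_quantize(z, codebook)
```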
- Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space [2.8935588665357077]
This paper builds on key ideas to learn a perception of touch sounds without access to any ground-truth data.
We show how we can leverage ideas from classical signal processing to obtain large amounts of data for any sound of interest with high precision (a feature-and-projection sketch follows this entry).
arXiv Detail & Related papers (2020-02-10T20:33:25Z)
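Projecting audio clips into a T-SNE or UMAP space, as in the entry above, first requires a per-clip feature vector. The sketch below uses a simple classical-signal-processing summary (the average log-magnitude spectrum) and scikit-learn's T-SNE; the feature choice and clip lengths are placeholder assumptions, not the paper's learned embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE

def clip_features(clip, n_fft=512):
    """Average log-magnitude spectrum of one raw-audio clip: a simple
    classical-signal-processing summary used as a stand-in feature."""
    frames = np.lib.stride_tricks.sliding_window_view(clip, n_fft)[::n_fft // 2]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))
    return np.log(spec + 1e-6).mean(axis=0)

clips = [np.random.randn(16000) for _ in range(50)]     # placeholder sound clips
features = np.stack([clip_features(c) for c in clips])
projection = TSNE(n_components=2, perplexity=10).fit_transform(features)
print(projection.shape)                                  # (50, 2) points to plot
```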