Enhancing Audio Perception of Music By AI Picked Room Acoustics
- URL: http://arxiv.org/abs/2208.07994v1
- Date: Tue, 16 Aug 2022 23:47:43 GMT
- Title: Enhancing Audio Perception of Music By AI Picked Room Acoustics
- Authors: Prateek Verma and Jonathan Berger
- Abstract summary: We seek to determine the best room in which to perform a particular piece using AI.
We use room acoustics as a way to enhance the perceptual qualities of a given sound.
- Score: 4.314956204483073
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Every sound that we hear is the result of successive convolutional operations
(e.g. room acoustics, microphone characteristics, resonant properties of the
instrument itself, not to mention characteristics and limitations of the sound
reproduction system). In this work we seek to determine the best room in which
to perform a particular piece using AI. Additionally, we use room acoustics as
a way to enhance the perceptual qualities of a given sound. Historically, rooms
(particularly churches and concert halls) were designed to host and serve
specific musical functions. In some cases, the architectural acoustical
qualities enhanced the music performed there. As a first step, we try to mimic
this by identifying room impulse responses that correlate with enhanced sound
quality for particular music. A convolutional architecture is first trained to
take in an audio sample and mimic expert ratings of perceptual qualities,
achieving about 78% accuracy across various instrument families and notes. This
gives us a scoring function that can automatically rate the perceptual
pleasantness of any audio sample. Then, using a library of about 60,000
synthetic impulse responses mimicking all kinds of rooms, materials, etc., we
apply a simple convolution operation to transform a sound as if it were played
in a particular room. The perceptual evaluator ranks the transformed sounds and
yields the "best" room or concert hall in which to play a sound. As a
byproduct, it can also use room acoustics to turn a poor-quality sound into a
"good" one.
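The pipeline described above reduces to two components: a convolution of the dry audio with each candidate impulse response, and a perceptual score of the result. A minimal sketch in Python, where `score_fn` is a hypothetical stand-in for the paper's trained rating network and `rir_library` for its set of synthetic impulse responses:

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_room(audio: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Simulate playing `audio` in a room by convolving it with that room's impulse response."""
    wet = fftconvolve(audio, rir, mode="full")[: len(audio)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet  # normalize to avoid clipping

def best_room(audio: np.ndarray, rir_library: list, score_fn) -> int:
    """Return the index of the impulse response whose simulated room maximizes the perceptual score."""
    scores = [score_fn(apply_room(audio, rir)) for rir in rir_library]
    return int(np.argmax(scores))
```

The same exhaustive search explains the byproduct noted in the abstract: a poor-quality recording can be improved simply by picking the impulse response that maximizes its score.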
Related papers
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- SoundCam: A Dataset for Finding Humans Using Room Acoustics [22.279282163908462]
We present SoundCam, the largest dataset of unique RIRs from in-the-wild rooms publicly released to date.
It includes 5,000 10-channel real-world measurements of room impulse responses and 2,000 10-channel recordings of music in three different rooms.
We show that these measurements can be used for interesting tasks, such as detecting and identifying humans, and tracking their positions.
arXiv Detail & Related papers (2023-11-06T20:51:16Z)
- Exploiting Time-Frequency Conformers for Music Audio Enhancement [21.243039524049614]
We propose a music enhancement system based on the Conformer architecture.
Our approach explores the attention mechanisms of the Conformer and examines their performance to discover the best approach for the music enhancement task.
arXiv Detail & Related papers (2023-08-24T06:56:54Z)
- AdVerb: Visually Guided Audio Dereverberation [49.958724234969445]
We present AdVerb, a novel audio-visual dereverberation framework.
It uses visual cues in addition to the reverberant sound to estimate clean audio.
arXiv Detail & Related papers (2023-08-23T18:20:59Z)
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z)
- One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In Any Room/Concert Hall [3.652509571098291]
We propose a novel architecture that can transform a sound of interest into any other acoustic space of interest.
Our framework allows a neural network to adjust the gain of every point in the time-frequency representation (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-10-27T19:54:05Z)
- Visual Acoustic Matching [92.91522122739845]
We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.
Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials.
arXiv Detail & Related papers (2022-02-14T17:05:22Z)
- Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
arXiv Detail & Related papers (2021-06-14T20:01:24Z)
- Joint Blind Room Acoustic Characterization From Speech And Music Signals Using Convolutional Recurrent Neural Networks [13.12834490248018]
Reverberation time, clarity, and direct-to-reverberant ratio are acoustic parameters that have been defined to describe reverberant environments (a sketch of the classical reverberation-time estimate also appears after this list).
Recent work combining audio processing with machine learning suggests that these parameters can be estimated blindly from speech or music signals.
We propose a robust end-to-end method to achieve blind joint acoustic parameter estimation using speech and/or music signals.
arXiv Detail & Related papers (2020-10-21T17:41:21Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
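For the one-shot acoustic matching entry above, a minimal sketch of adjusting the gain of every time-frequency point; here `gains` stands in for a network's predicted gain map, and the STFT settings are arbitrary assumptions rather than that paper's configuration:

```python
import numpy as np
from scipy.signal import stft, istft

def apply_tf_gains(audio: np.ndarray, gains: np.ndarray,
                   fs: int = 16000, nperseg: int = 512) -> np.ndarray:
    """Scale each STFT bin by a gain, then resynthesize the waveform."""
    _, _, Z = stft(audio, fs=fs, nperseg=nperseg)    # Z: (freq bins, frames)
    _, y = istft(Z * gains, fs=fs, nperseg=nperseg)  # gains must broadcast to Z.shape
    return y

# Identity gains as a placeholder for a learned gain map:
audio = np.random.randn(16000)
_, _, Z = stft(audio, fs=16000, nperseg=512)
out = apply_tf_gains(audio, np.ones_like(np.abs(Z)))
```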
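For the acoustic parameters defined in the blind-characterization entry, a sketch of the classical (non-blind) Schroeder backward-integration estimate of reverberation time from a measured impulse response; the T20 fitting range (-5 dB to -25 dB, extrapolated to -60 dB) is a conventional choice, not taken from that paper:

```python
import numpy as np

def rt60_schroeder(rir: np.ndarray, fs: int) -> float:
    """Estimate T60 from a measured room impulse response."""
    edc = np.cumsum((rir.astype(float) ** 2)[::-1])[::-1]  # energy decay curve
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)         # normalized, in dB
    t = np.arange(len(rir)) / fs
    fit = (edc_db <= -5.0) & (edc_db >= -25.0)             # T20 fitting region
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)          # decay rate, dB per second
    return -60.0 / slope                                   # time to decay by 60 dB
```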
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.