Sound Design Strategies for Latent Audio Space Explorations Using Deep
Learning Architectures
- URL: http://arxiv.org/abs/2305.15571v2
- Date: Mon, 19 Jun 2023 09:59:35 GMT
- Title: Sound Design Strategies for Latent Audio Space Explorations Using Deep
Learning Architectures
- Authors: Kıvanç Tatar, Kelsey Cotton, Daniel Bisig
- Abstract summary: We explore a well-known Deep Learning architecture called Variational Autoencoders (VAEs).
VAEs have been used for generating latent timbre spaces or latent spaces of symbolic music excerpts.
In this work, we apply VAEs to raw audio data directly while bypassing audio feature extraction.
- Score: 1.6114012813668934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research on Deep Learning applications in sound and music computing
has gathered interest in recent years; however, there is still a missing link
between these new technologies and how they can be incorporated into
real-world artistic practices. In this work, we explore a well-known Deep
Learning architecture called Variational Autoencoders (VAEs). These
architectures have been used in many areas to generate latent spaces in which
data points are organized so that similar data points lie close to each
other. Previously, VAEs have been used to generate latent timbre spaces or
latent spaces of symbolic music excerpts. Applying VAEs to audio features of
timbre requires a vocoder to transform the timbre generated by the network
into an audio signal, which is computationally expensive. In this work, we apply
VAEs to raw audio data directly, bypassing audio feature extraction. This
approach allows practitioners to use any audio recording while giving them
flexibility and control over the aesthetics through dataset curation. The lower
computation time in audio signal generation also allows the raw audio approach
to be incorporated into real-time applications. In this work, we propose three
strategies for exploring latent spaces of audio and timbre for sound design
applications. In doing so, our aim is to initiate a conversation on artistic
approaches and strategies for utilizing latent audio spaces in sound and music
practices.
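
Below is a minimal, illustrative sketch of the raw-audio VAE idea described in the abstract: a small convolutional VAE that encodes fixed-length waveform windows into a latent vector and decodes them back to audio without any spectral features or vocoder stage, followed by a simple latent interpolation between two sounds as one plausible exploration strategy. The window length, layer sizes, loss weighting, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical sketch: a small convolutional VAE over raw audio windows.
# Layer sizes, window length, and the interpolation strategy are assumptions
# for illustration; this is not the architecture described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

WINDOW = 16384   # ~0.37 s at 44.1 kHz (assumed window length)
LATENT = 64      # assumed latent dimensionality

class RawAudioVAE(nn.Module):
    def __init__(self, latent_dim=LATENT):
        super().__init__()
        # Strided 1-D convolutions encode the waveform directly,
        # bypassing any spectral feature extraction or vocoder.
        self.enc = nn.Sequential(
            nn.Conv1d(1, 32, 9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, 9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(64, 128, 9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(128, 128, 9, stride=4, padding=4), nn.ReLU(),
        )
        enc_out = 128 * (WINDOW // 256)
        self.to_mu = nn.Linear(enc_out, latent_dim)
        self.to_logvar = nn.Linear(enc_out, latent_dim)
        self.from_z = nn.Linear(latent_dim, enc_out)
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(128, 128, 8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(128, 64, 8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(64, 32, 8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(32, 1, 8, stride=4, padding=2), nn.Tanh(),
        )

    def encode(self, x):                       # x: (batch, 1, WINDOW)
        h = self.enc(x).flatten(1)
        return self.to_mu(h), self.to_logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def decode(self, z):
        h = self.from_z(z).view(-1, 128, WINDOW // 256)
        return self.dec(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def vae_loss(recon, x, mu, logvar, beta=1.0):
    # Waveform reconstruction term plus the usual KL regularizer.
    rec = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld

# One possible exploration strategy: interpolate between the latent codes
# of two recorded sounds and decode the intermediate points.
@torch.no_grad()
def interpolate(model, wav_a, wav_b, steps=8):
    mu_a, _ = model.encode(wav_a)
    mu_b, _ = model.encode(wav_b)
    path = [model.decode((1 - t) * mu_a + t * mu_b)
            for t in torch.linspace(0.0, 1.0, steps)]
    return torch.cat(path, dim=0)              # (steps, 1, WINDOW)
```

Because generation is a single decoder forward pass over the waveform, with no separate vocoder stage, a model of roughly this size could plausibly run fast enough for the real-time use mentioned in the abstract.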
Related papers
- SoundSignature: What Type of Music Do You Like? [0.0]
SoundSignature is a music application that integrates a custom OpenAI Assistant to analyze users' favorite songs.
The system incorporates state-of-the-art Music Information Retrieval (MIR) Python packages to combine extracted acoustic/musical features with the assistant's extensive knowledge of the artists and bands.
arXiv Detail & Related papers (2024-10-04T12:40:45Z)
- SOAF: Scene Occlusion-aware Neural Acoustic Field [9.651041527067907]
We propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation.
Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling.
We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate audio for novel views.
arXiv Detail & Related papers (2024-07-02T13:40:56Z)
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
- LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders [53.30016986953206]
We propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture.
We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model's ability to adapt to different levels of background noise and speech interference.
arXiv Detail & Related papers (2022-11-20T15:27:55Z)
- AudioLM: a Language Modeling Approach to Audio Generation [59.19364975706805]
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure.
We demonstrate how our approach extends beyond speech by generating coherent piano music continuations.
arXiv Detail & Related papers (2022-09-07T13:40:08Z)
- Visual Acoustic Matching [92.91522122739845]
We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment.
Given an image of the target environment and a waveform for the source audio, the goal is to re-synthesize the audio to match the target room acoustics as suggested by its visible geometry and materials.
arXiv Detail & Related papers (2022-02-14T17:05:22Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast [0.0]
We present a novel procedure that artificially synthesises data that resembles radio signals.
We trained a Convolutional Recurrent Neural Network (CRNN) on this synthesised data and outperformed state-of-the-art algorithms for music-speech detection.
arXiv Detail & Related papers (2021-02-19T14:47:05Z)
- Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space [2.8935588665357077]
This paper builds upon key ideas to build perception of touch sounds without access to any ground-truth data.
We show how we can leverage ideas from classical signal processing to obtain large amounts of data for any sound of interest with high precision.
arXiv Detail & Related papers (2020-02-10T20:33:25Z)