Deep generative models for musical audio synthesis
- URL: http://arxiv.org/abs/2006.06426v2
- Date: Wed, 25 Nov 2020 09:01:31 GMT
- Title: Deep generative models for musical audio synthesis
- Authors: M. Huzaifah and L. Wyse
- Abstract summary: Sound modelling is the process of developing algorithms that generate sound under parametric control.
Recent generative deep learning systems for audio synthesis are able to learn models that can traverse arbitrary spaces of sound.
This paper is a review of developments in deep learning that are changing the practice of sound modelling.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sound modelling is the process of developing algorithms that generate sound
under parametric control. There are a few distinct approaches that have been
developed historically including modelling the physics of sound production and
propagation, assembling signal generating and processing elements to capture
acoustic features, and manipulating collections of recorded audio samples.
While each of these approaches has been able to achieve high-quality synthesis
and interaction for specific applications, they are all labour-intensive and
each comes with its own challenges for designing arbitrary control strategies.
Recent generative deep learning systems for audio synthesis are able to learn
models that can traverse arbitrary spaces of sound defined by the data they
train on. Furthermore, machine learning systems are providing new techniques
for designing control and navigation strategies for these models. This paper is
a review of developments in deep learning that are changing the practice of
sound modelling.
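As a point of reference for the classical approaches named in the abstract, below is a minimal sketch (not from the paper) of the signal-processing approach to sound modelling: a sine oscillator whose pitch and decay time are exposed as control parameters. Deep generative models aim to learn comparable parametric control directly from data.

```python
import numpy as np

def synth_tone(freq_hz: float, decay_s: float, dur_s: float = 1.0,
               sr: int = 16000) -> np.ndarray:
    """A two-parameter sound model: a sine oscillator shaped by an
    exponential decay envelope. freq_hz and decay_s are the parametric
    controls exposed to the user."""
    t = np.arange(int(dur_s * sr)) / sr
    return np.exp(-t / decay_s) * np.sin(2 * np.pi * freq_hz * t)

# Traversing this model's "space of sound" means sweeping its parameters.
tones = [synth_tone(f, decay_s=0.3) for f in (220.0, 440.0, 880.0)]
```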
Related papers
- LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search [0.5624791703748108]
We propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic sounds.
The proposed algorithm can be a creative tool for sound artists and musicians (a generic novelty-search sketch follows this entry).
arXiv Detail & Related papers (2024-04-22T10:20:41Z)
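The summary above does not spell out the search procedure, so the following is a hedged sketch of generic latent-vector novelty search; the novelty measure, mutation scale, and function names are assumptions, and the returned latents stand in for vectors that LVNS-RAVE would render with a pretrained RAVE decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def novelty(z, archive, k=5):
    """Novelty of z = mean distance to its k nearest neighbours in the archive."""
    if not archive:
        return float("inf")
    dists = sorted(float(np.linalg.norm(z - a)) for a in archive)
    return float(np.mean(dists[:k]))

def latent_novelty_search(dim=128, pop=32, gens=50, sigma=0.1):
    """Evolve a population of latent vectors, always keeping the most novel."""
    population = rng.normal(size=(pop, dim))
    archive = []
    for _ in range(gens):
        scores = np.array([novelty(z, archive) for z in population])
        parents = population[np.argsort(scores)[-pop // 2:]]  # most novel half
        archive.extend(list(parents))                         # remember them
        children = parents + sigma * rng.normal(size=parents.shape)
        population = np.concatenate([parents, children])
    return archive  # each z would be decoded to audio by a pretrained RAVE model

archive = latent_novelty_search(gens=10)
```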
- Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experimental results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis (a flow matching sketch follows this entry).
arXiv Detail & Related papers (2023-10-25T03:40:50Z)
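Flow matching itself is a published, well-defined objective; the sketch below shows the linear-path variant on toy feature frames. VelocityNet is a stand-in, not SpeechFlow's architecture, and the masked-condition mechanism is omitted.

```python
import torch
from torch import nn

class VelocityNet(nn.Module):
    """Toy stand-in network: maps (x_t, t) to a predicted velocity."""
    def __init__(self, dim: int = 80):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, xt, t):
        return self.net(torch.cat([xt, t], dim=-1))

def flow_matching_loss(model, x1):
    """Linear-path flow matching: regress the predicted velocity field
    onto the straight-line velocity between noise x0 and data x1."""
    x0 = torch.randn_like(x1)            # noise endpoint
    t = torch.rand(x1.shape[0], 1)       # per-example time in [0, 1]
    xt = (1 - t) * x0 + t * x1           # point on the interpolation path
    return ((model(xt, t) - (x1 - x0)) ** 2).mean()

model = VelocityNet()
loss = flow_matching_loss(model, torch.randn(16, 80))  # e.g. mel frames
loss.backward()
```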
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores (a masked pre-training sketch follows this entry).
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
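As a rough illustration of MLM-style acoustic pre-training, the sketch below masks random frames and trains a toy student to predict teacher pseudo-labels at the masked positions; the feature dimensions, codebook size, and zero-masking strategy are assumptions, not MERT's actual configuration.

```python
import torch
from torch import nn

def masked_pretrain_loss(student, feats, pseudo_labels, mask_prob=0.3):
    """Hide random frames; ask the student to classify the teacher's
    pseudo-label (a discrete code per frame) at the hidden positions."""
    B, T, D = feats.shape
    mask = torch.rand(B, T) < mask_prob      # which frames to hide
    corrupted = feats.clone()
    corrupted[mask] = 0.0                    # zero out masked frames (assumed)
    logits = student(corrupted)              # (B, T, num_codes)
    return nn.functional.cross_entropy(logits[mask], pseudo_labels[mask])

student = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 512))
feats = torch.randn(4, 100, 80)              # acoustic feature frames
codes = torch.randint(0, 512, (4, 100))      # teacher-provided pseudo-labels
loss = masked_pretrain_loss(student, feats, codes)
loss.backward()
```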
- A General Framework for Learning Procedural Audio Models of Environmental Sounds [7.478290484139404]
This paper introduces the Procedural (audio) Variational autoEncoder (ProVE) framework as a general approach to learning Procedural Audio (PA) models.
We show that ProVE models outperform both classical PA models and an adversarial-based approach in terms of sound fidelity.
arXiv Detail & Related papers (2023-03-04T12:12:26Z)
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields (a coordinate-transform sketch follows this entry).
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
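The summary leaves the exact transformation unspecified, so the following is only a plausible sketch of a source-centric coordinate transform: it re-expresses the listener's view direction as a distance and an angle relative to the sound source. The function and argument names are hypothetical.

```python
import numpy as np

def source_relative_direction(listener_pos, view_dir, source_pos):
    """Return (distance to source, cosine of the angle between the view
    direction and the listener-to-source direction), i.e. a view direction
    expressed in a sound-source-centric frame."""
    to_source = np.asarray(source_pos, float) - np.asarray(listener_pos, float)
    dist = np.linalg.norm(to_source)
    view_unit = np.asarray(view_dir, float) / np.linalg.norm(view_dir)
    return float(dist), float(np.dot(view_unit, to_source / dist))

dist, cos_angle = source_relative_direction([0, 0, 0], [1, 0, 0], [2, 1, 0])
```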
- Rigid-Body Sound Synthesis with Differentiable Modal Resonators [6.680437329908454]
We present a novel end-to-end framework for training a deep neural network to generate modal resonators for a given 2D shape and material.
We demonstrate our method on a dataset of synthetic objects, but train our model using an audio-domain objective (a modal synthesis sketch follows this entry).
arXiv Detail & Related papers (2022-10-27T10:34:38Z)
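Modal synthesis itself is classical and well documented: a struck rigid body rings as a sum of exponentially decaying sinusoids. The sketch below renders such an impulse response; in the paper's setting a network would predict the per-mode parameters, and the mode values shown here are made up.

```python
import numpy as np

def modal_impulse_response(freqs_hz, decays_s, amps, dur_s=1.0, sr=44100):
    """Render a modal resonator: a sum of damped sinusoids, one per mode.
    The rendering is differentiable in (freqs_hz, decays_s, amps), which is
    what lets a neural network be trained through it with an audio loss."""
    t = np.arange(int(dur_s * sr)) / sr
    modes = [a * np.exp(-t / d) * np.sin(2 * np.pi * f * t)
             for f, d, a in zip(freqs_hz, decays_s, amps)]
    return np.sum(modes, axis=0)

# A toy struck object: a few inharmonic modes with differing decay times.
y = modal_impulse_response([523.0, 1207.0, 2105.0],
                           [0.8, 0.5, 0.3],
                           [1.0, 0.6, 0.3])
```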
- Ultrasound Signal Processing: From Models to Deep Learning [64.56774869055826]
Medical ultrasound imaging relies heavily on high-quality signal processing to provide reliable and interpretable image reconstructions.
Deep learning based methods, which are optimized in a data-driven fashion, have gained popularity.
A relatively new paradigm combines the power of the two: leveraging data-driven deep learning, as well as exploiting domain knowledge.
arXiv Detail & Related papers (2022-04-09T13:04:36Z)
- Audio representations for deep learning in sound synthesis: A review [0.0]
This paper provides an overview of audio representations applied to sound synthesis using deep learning.
It also presents the most significant methods for developing and evaluating a sound synthesis architecture using deep learning models.
arXiv Detail & Related papers (2022-01-07T15:08:47Z)
- MTCRNN: A multi-scale RNN for directed audio texture synthesis [0.0]
We introduce a novel modelling approach for textures, combining recurrent neural networks trained at different levels of abstraction with a conditioning strategy that allows for user-directed synthesis.
We demonstrate the model's performance on a variety of datasets, examine its performance on various metrics, and discuss some potential applications.
arXiv Detail & Related papers (2020-11-25T09:13:53Z)
- Automated and Formal Synthesis of Neural Barrier Certificates for Dynamical Models [70.70479436076238]
We introduce an automated, formal, counterexample-based approach to synthesise Barrier Certificates (BC).
The approach is underpinned by an inductive framework, which manipulates a candidate BC structured as a neural network, and a sound verifier, which either certifies the candidate's validity or generates counter-examples.
The outcomes show that we can synthesise sound BCs up to two orders of magnitude faster, with a particularly stark speedup on the verification engine.
arXiv Detail & Related papers (2020-07-07T07:39:42Z)
- VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch (a CVAE sketch follows this entry).
arXiv Detail & Related papers (2020-03-30T16:05:47Z)
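To illustrate the CVAE idea behind VaPar Synth, here is a minimal sketch of a pitch-conditioned variational autoencoder over generic parametric frames; the layer sizes, the 1-D pitch encoding, and the Gaussian reconstruction loss are assumptions rather than the paper's exact design.

```python
import torch
from torch import nn

class PitchCVAE(nn.Module):
    """Conditional VAE sketch: encode a parametric frame given its pitch,
    decode from (z, pitch). Swapping the pitch at decode time gives the
    flexible pitch control described in the abstract."""
    def __init__(self, dim: int = 64, latent: int = 8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * latent))
        self.dec = nn.Sequential(nn.Linear(latent + 1, 128), nn.ReLU(),
                                 nn.Linear(128, dim))

    def forward(self, x, pitch):
        mu, logvar = self.enc(torch.cat([x, pitch], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        x_hat = self.dec(torch.cat([z, pitch], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return ((x_hat - x) ** 2).mean() + kl  # reconstruction + KL terms

model = PitchCVAE()
x = torch.randn(16, 64)      # parametric frames (assumed representation)
pitch = torch.rand(16, 1)    # normalised pitch condition (assumed encoding)
loss = model(x, pitch)
loss.backward()
```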
This list is automatically generated from the titles and abstracts of the papers on this site.