LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search
- URL: http://arxiv.org/abs/2404.14063v1
- Date: Mon, 22 Apr 2024 10:20:41 GMT
- Title: LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search
- Authors: Jinyue Guo, Anna-Maria Christodoulou, Balint Laczko, Kyrre Glette,
- Abstract summary: We propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic sounds.
The proposed algorithm can be a creative tool for sound artists and musicians.
- Score: 0.5624791703748108
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
Related papers
- One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns [33.293193191683145]
We present a single generative model which can learn to generate multiple types of noise as well as blend between them.
We also present an application of our model to improving inverse procedural material design.
arXiv Detail & Related papers (2024-04-25T02:23:11Z) - Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z) - I2D2: Inductive Knowledge Distillation with NeuroLogic and
Self-Imitation [89.38161262164586]
We study generative models of commonsense knowledge, focusing on the task of generating generics.
We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al.
Our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest quality available to date.
arXiv Detail & Related papers (2022-12-19T04:47:49Z) - Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong
Learning in Task-Oriented Dialogue [80.05509768165135]
generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples.
Most existing generative replay methods use only a single task-specific token to control their models.
We propose a novel method, prompt conditioned VAE for lifelong learning, to enhance generative replay by incorporating tasks' statistics.
arXiv Detail & Related papers (2022-10-14T13:12:14Z) - Decision Forest Based EMG Signal Classification with Low Volume Dataset
Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z) - BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping [19.071463356974387]
This work extends existing methods based on self-supervised learning by bootstrapping, proposes various encoder architectures, and explores the effects of using different pre-training datasets.
We present a novel training framework to come up with a hybrid audio representation, which combines handcrafted and data-driven learned audio features.
All the proposed representations were evaluated within the HEAR NeurIPS 2021 challenge for auditory scene classification and timestamp detection tasks.
arXiv Detail & Related papers (2022-06-24T02:26:40Z) - Learning to Generate Levels by Imitating Evolution [7.110423254122942]
We introduce a new type of iterative level generator using machine learning.
We train a model to imitate the evolutionary process and use the model to generate levels.
This trained model is able to modify noisy levels sequentially to create better levels without the need for a fitness function.
arXiv Detail & Related papers (2022-06-11T10:44:57Z) - Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z) - Generative Deep Learning Techniques for Password Generation [0.5249805590164902]
We study a broad collection of deep learning and probabilistic based models in the light of password guessing.
We provide novel generative deep-learning models in terms of variational autoencoders exhibiting state-of-art sampling performance.
We perform a thorough empirical analysis in a unified controlled framework over well-known datasets.
arXiv Detail & Related papers (2020-12-10T14:11:45Z) - Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z) - Deep generative models for musical audio synthesis [0.0]
Sound modelling is the process of developing algorithms that generate sound under parametric control.
Recent generative deep learning systems for audio synthesis are able to learn models that can traverse arbitrary spaces of sound.
This paper is a review of developments in deep learning that are changing the practice of sound modelling.
arXiv Detail & Related papers (2020-06-10T04:02:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.