Evaluating Deep Music Generation Methods Using Data Augmentation
- URL: http://arxiv.org/abs/2201.00052v1
- Date: Fri, 31 Dec 2021 20:35:46 GMT
- Title: Evaluating Deep Music Generation Methods Using Data Augmentation
- Authors: Toby Godwin and Georgios Rizos and Alice Baird and Najla D. Al Futaisi
and Vincent Brisse and Bjoern W. Schuller
- Abstract summary: We focus on a homogeneous, objective framework for evaluating samples of algorithmically generated music.
We do not seek to assess the musical merit of generated music, but instead explore whether generated samples contain meaningful information pertaining to emotion or mood/theme.
- Score: 13.72212417973239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite advances in deep algorithmic music generation, evaluation of
generated samples often relies on human evaluation, which is subjective and
costly. We focus on designing a homogeneous, objective framework for evaluating
samples of algorithmically generated music. Any engineered measures to evaluate
generated music typically attempt to define the samples' musicality, but do not
capture qualities of music such as theme or mood. We do not seek to assess the
musical merit of generated music, but instead explore whether generated samples
contain meaningful information pertaining to emotion or mood/theme. We achieve
this by measuring the change in predictive performance of a music mood/theme
classifier after augmenting its training data with generated samples. We
analyse music samples generated by three models -- SampleRNN, Jukebox, and DDSP
-- and employ a homogeneous framework across all methods to allow for objective
comparison. This is the first attempt at augmenting a music genre
classification dataset with conditionally generated music. We investigate the
classification performance improvement using deep music generation and the
ability of the generators to make emotional music by using an additional,
emotion annotation of the dataset. Finally, we use a classifier trained on real
data to evaluate the label validity of class-conditionally generated samples.
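The evaluation loop described in the abstract can be summarised in a short sketch: train a mood/theme classifier on real data alone, retrain it after adding generated samples, compare test performance, and separately check whether a real-data classifier recovers the labels used to condition generation. The code below is a minimal illustration of that protocol, not the authors' implementation; the feature files, classifier choice (logistic regression over precomputed audio embeddings), and macro-F1 metric are assumptions.

```python
# Minimal sketch of the augmentation-based evaluation protocol.
# File names, features, classifier, and metric are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def train_and_score(X_train, y_train, X_test, y_test):
    """Fit a simple mood/theme classifier and report macro F1 on a real-data test set."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test), average="macro"), clf

# Placeholder feature matrices (e.g., precomputed audio embeddings) and labels;
# y_gen holds the labels used to condition the generator.
X_real, y_real = np.load("real_feats.npy"), np.load("real_labels.npy")
X_gen, y_gen = np.load("gen_feats.npy"), np.load("gen_labels.npy")
X_test, y_test = np.load("test_feats.npy"), np.load("test_labels.npy")

# 1) Baseline: classifier trained on real data only.
f1_base, clf_real = train_and_score(X_real, y_real, X_test, y_test)

# 2) Augmented: add the generated samples to the training set and retrain.
X_aug = np.concatenate([X_real, X_gen])
y_aug = np.concatenate([y_real, y_gen])
f1_aug, _ = train_and_score(X_aug, y_aug, X_test, y_test)
print(f"macro F1, real only: {f1_base:.3f}  real + generated: {f1_aug:.3f}")

# 3) Label validity: does a classifier trained on real data recover the
# labels that were used to condition generation?
validity = float((clf_real.predict(X_gen) == y_gen).mean())
print(f"label validity of generated samples: {validity:.3f}")
```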
Related papers
- Generating High-quality Symbolic Music Using Fine-grained Discriminators [42.200747558496055]
We propose to decouple the melody and rhythm from music, and design corresponding fine-grained discriminators to tackle the issues.
Specifically, equipped with a pitch augmentation strategy, the melody discriminator discerns the melody variations presented by the generated samples.
The rhythm discriminator, enhanced with bar-level relative positional encoding, focuses on the velocity of generated notes.
arXiv Detail & Related papers (2024-08-03T07:32:21Z)
- Can MusicGen Create Training Data for MIR Tasks? [3.8980564330208662]
We are investigating the broader concept of using AI-based generative music systems to generate training data for Music Information Retrieval tasks.
We constructed over 50,000 genre-conditioned textual descriptions and generated a collection of music excerpts covering five musical genres.
Preliminary results show that the proposed model can learn genre-specific characteristics from artificial music tracks that generalise well to real-world music recordings (a minimal generation sketch along these lines follows the related-papers list).
arXiv Detail & Related papers (2023-11-15T16:41:56Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data.
The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions [10.179835761549471]
This paper attempts to provide an overview of various composition tasks under different music generation levels using deep learning.
In addition, we summarize datasets suitable for diverse tasks, discuss the music representations, the evaluation methods as well as the challenges under different levels, and finally point out several future directions.
arXiv Detail & Related papers (2020-11-13T08:01:20Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method combines an acoustic model, trained for the task of automatic speech recognition, with melody-extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
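As referenced in the "Can MusicGen Create Training Data for MIR Tasks?" entry above, the following is a minimal sketch of how genre-conditioned training excerpts might be generated with the open-source audiocraft MusicGen API; the model checkpoint, genre list, prompt template, and output naming are illustrative assumptions rather than that paper's exact setup.

```python
# Sketch of genre-conditioned excerpt generation with audiocraft's MusicGen.
# Genres, prompt wording, duration, and file names are hypothetical choices.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=10)  # seconds per generated excerpt

genres = ["rock", "jazz", "classical", "electronic", "hip hop"]
# Genre-conditioned textual descriptions, analogous in spirit to constructing
# prompts for each target class.
descriptions = [f"a {g} track with clear stylistic traits" for g in genres]

wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]
for genre, one_wav in zip(genres, wav):
    # Write one excerpt per genre; such clips could then serve as additional
    # training data for a genre classifier.
    audio_write(f"generated_{genre.replace(' ', '_')}", one_wav.cpu(),
                model.sample_rate, strategy="loudness")
```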