The Chamber Ensemble Generator: Limitless High-Quality MIR Data via
Generative Modeling
- URL: http://arxiv.org/abs/2209.14458v1
- Date: Wed, 28 Sep 2022 22:55:15 GMT
- Authors: Yusong Wu, Josh Gardner, Ethan Manilow, Ian Simon, Curtis Hawthorne,
Jesse Engel
- Abstract summary: We show a system capable of producing unlimited amounts of realistic chorale music with rich annotations.
We generate a large dataset of chorales from four different chamber ensembles.
We release both the system and the dataset as an open-source foundation for future work in the MIR community.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data is the lifeblood of modern machine learning systems, including for those
in Music Information Retrieval (MIR). However, MIR has long been mired by small
datasets and unreliable labels. In this work, we propose to break this
bottleneck using generative modeling. By pipelining a generative model of notes
(Coconet trained on Bach Chorales) with a structured synthesis model of chamber
ensembles (MIDI-DDSP trained on URMP), we demonstrate a system capable of
producing unlimited amounts of realistic chorale music with rich annotations
including mixes, stems, MIDI, note-level performance attributes (staccato,
vibrato, etc.), and even fine-grained synthesis parameters (pitch, amplitude,
etc.). We call this system the Chamber Ensemble Generator (CEG), and use it to
generate a large dataset of chorales from four different chamber ensembles
(CocoChorales). We demonstrate that data generated using our approach improves
state-of-the-art models for music transcription and source separation, and we
release both the system and the dataset as an open-source foundation for future
work in the MIR community.
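The two-stage pipeline described above (a note-level generative model feeding a structured synthesizer, with annotations retained at every stage) can be sketched as follows. This is a minimal stand-in, not the authors' implementation: neither Coconet nor MIDI-DDSP is actually called, and all function names and data shapes here are hypothetical placeholders that only mirror the data flow.

```python
import random

def generate_chorale_midi(num_voices=4, num_notes=8, seed=0):
    """Stage 1 (stand-in for Coconet): sample one MIDI pitch list per voice."""
    rng = random.Random(seed)
    return [[rng.randint(48, 84) for _ in range(num_notes)]
            for _ in range(num_voices)]

def synthesize_stem(notes):
    """Stage 2 (stand-in for MIDI-DDSP): render one voice to placeholder
    'audio' plus note-level performance attributes."""
    audio = [float(p) for p in notes]  # placeholder samples, one per note
    attributes = [{"pitch": p, "vibrato": 0.1, "staccato": False}
                  for p in notes]
    return audio, attributes

def chamber_ensemble_generator(seed=0):
    """Pipeline: notes -> per-voice stems -> mix, keeping all annotations."""
    midi = generate_chorale_midi(seed=seed)
    stems, annotations = [], []
    for voice in midi:
        audio, attrs = synthesize_stem(voice)
        stems.append(audio)
        annotations.append(attrs)
    mix = [sum(samples) for samples in zip(*stems)]  # naive sum of stems
    return {"midi": midi, "stems": stems, "mix": mix,
            "annotations": annotations}

example = chamber_ensemble_generator(seed=42)
print(len(example["stems"]), len(example["mix"]))  # 4 stems, 8 mix frames
```

Because every stage is generative, the ground-truth MIDI, stems, and per-note attributes come for free as a byproduct of synthesis, which is what makes the resulting dataset usable for transcription and source-separation training.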
Related papers
- SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation [7.428668206443388]
We introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic training set.
We demonstrate the performance of a widely used baseline music separation model trained on our synthesized dataset and evaluated against the well-known EnsembleSet.
arXiv Detail & Related papers (2024-09-17T08:58:33Z)
- Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation [3.8570045844185237]
We present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset.
Our model comprises two networks: an encoder and a predictor, which are jointly trained to predict the embeddings of compatible stems.
We evaluate our model's performance on a retrieval task on the MUSDB18 dataset, testing its ability to find the missing stem from a mix.
arXiv Detail & Related papers (2024-08-05T14:34:40Z)
- Naturalistic Music Decoding from EEG Data via Latent Diffusion Models [14.882764251306094]
This study represents an initial foray into high-quality, general music reconstruction using non-invasive EEG data.
We train our models on the public NMED-T dataset and perform a quantitative evaluation, proposing neural embedding-based metrics.
arXiv Detail & Related papers (2024-05-15T03:26:01Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenge of misaligned measures across different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) [57.30228361181045]
This survey connects key advancements in recommender systems that use Generative Models (Gen-RecSys).
It covers: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS.
Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges.
arXiv Detail & Related papers (2024-03-31T06:57:57Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling [4.193940401637568]
We introduce a new system for data-driven audio sound model design built around two different neural network architectures.
The objective of the system is to generate interactively controllable sound models given (a) a range of sounds the model should be able to synthesize, and (b) a specification of the parametric controls for navigating that space of sounds.
arXiv Detail & Related papers (2022-06-27T07:10:22Z)
- Conditional Drums Generation using Compound Word Representations [4.435094091999926]
We tackle the task of conditional drums generation using a novel data encoding scheme inspired by Compound Word representation.
We present a sequence-to-sequence architecture where a Bidirectional Long short-term memory (BiLSTM) receives information about the conditioning parameters.
A Transformer-based Decoder with relative global attention produces the generated drum sequences.
arXiv Detail & Related papers (2022-02-09T13:49:27Z)
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the mixture of Mixing Secrets and MedleyDB dataset.
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.