ComMU: Dataset for Combinatorial Music Generation
- URL: http://arxiv.org/abs/2211.09385v1
- Date: Thu, 17 Nov 2022 07:25:09 GMT
- Title: ComMU: Dataset for Combinatorial Music Generation
- Authors: Lee Hyun, Taehyun Kim, Hyolim Kang, Minjoo Ki, Hyeonchan Hwang, Kwanho Park, Sharang Han, Seon Joo Kim
- Abstract summary: Combinatorial music generation creates short samples of music with rich musical metadata and combines them to produce a complete piece of music.
ComMU is the first symbolic music dataset consisting of short music samples and their corresponding 12 kinds of musical metadata.
Our results show that we can generate diverse, high-quality music from metadata alone, and that our unique metadata, such as track-role and extended chord quality, improve the capacity of automatic composition.
- Score: 20.762884001498627
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Commercial adoption of automatic music composition requires the capability of
generating diverse and high-quality music suitable for the desired context
(e.g., music for romantic movies, action games, restaurants, etc.). In this
paper, we introduce combinatorial music generation, a new task to create
varying background music based on given conditions. Combinatorial music
generation creates short samples of music with rich musical metadata and
combines them to produce a complete piece of music. In addition, we introduce
ComMU, the first symbolic music dataset consisting of short music samples and
their corresponding 12 kinds of musical metadata for combinatorial music
generation. Notable properties of ComMU are that (1) the dataset is manually
constructed by professional composers under an objective guideline that induces
regularity, and (2) it has 12 kinds of musical metadata that embrace composers'
intentions. Our results show that we can generate diverse, high-quality music
from metadata alone, and that our unique metadata, such as track-role and
extended chord quality, improve the capacity of automatic composition. We
highly recommend watching our
video before reading the paper (https://pozalabs.github.io/ComMU).
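As a rough illustration of what metadata-conditioned generation involves, the sketch below flattens one sample's metadata into control tokens that could prefix a note-token sequence for an autoregressive model. The twelve field names are assumptions loosely inferred from the abstract (only track-role and extended chord quality are named there), not the dataset's exact schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class SampleMetadata:
    # Illustrative fields only; consult the ComMU dataset for the real schema.
    bpm: int
    genre: str
    key: str
    instrument: str
    track_role: str         # e.g. "main_melody", "accompaniment", "bass"
    time_signature: str
    pitch_range: str
    num_measures: int
    chord_progression: str  # extended qualities, e.g. "Cmaj7-Am7-Dm7-G7"
    min_velocity: int
    max_velocity: int
    rhythm: str

def metadata_to_prompt(meta: SampleMetadata) -> list[str]:
    """Flatten metadata into control tokens prefixed to the note sequence."""
    return [f"<{key}:{value}>" for key, value in asdict(meta).items()]

meta = SampleMetadata(
    bpm=120, genre="cinematic", key="Cmaj", instrument="string_ensemble",
    track_role="main_melody", time_signature="4/4", pitch_range="mid_high",
    num_measures=8, chord_progression="Cmaj7-Am7-Dm7-G7",
    min_velocity=60, max_velocity=100, rhythm="standard",
)
print(metadata_to_prompt(meta)[:3])  # ['<bpm:120>', '<genre:cinematic>', '<key:Cmaj>']
```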
Related papers
- Melody Is All You Need For Music Generation [10.366088659024685]
We present the Melody Guided Music Generation (MMGen) model, a novel approach that uses melody to guide music generation.
Specifically, we first align the melody with audio waveforms and their associated descriptions using a multimodal alignment module.
This allows MMGen to generate music that matches the style of the provided audio while reflecting the content of the given text description.
arXiv Detail & Related papers (2024-09-30T11:13:35Z)
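For intuition about the alignment step MMGen describes, here is a minimal sketch of a contrastive (InfoNCE-style) objective over L2-normalised melody and audio/text embeddings; the summary does not specify MMGen's actual module or loss, so treat this as generic.

```python
import numpy as np

def info_nce(melody: np.ndarray, other: np.ndarray, temp: float = 0.07) -> float:
    """melody, other: (batch, dim) L2-normalised embeddings; matched pairs
    share a row index. Returns the mean cost of picking the matching row."""
    logits = melody @ other.T / temp               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(probs)).mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 32))
z /= np.linalg.norm(z, axis=1, keepdims=True)
print(info_nce(z, z))  # ~0: identical embeddings are perfectly aligned
```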
- ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires the ability to understand and generate information with long-range dependencies and harmony constraints.
Current LLMs easily fail at this task, generating ill-formed music even when equipped with modern techniques like In-Context Learning and Chain-of-Thought prompting.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z)
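The ComposerX summary names an agent-based framework but no concrete protocol, so the following is only a toy draft-and-critique loop over ABC notation with hypothetical roles; `llm` stands in for any chat-completion function of type str -> str.

```python
from typing import Callable

def compose(llm: Callable[[str], str], brief: str, rounds: int = 2) -> str:
    """Hypothetical multi-agent loop: a melody writer drafts, a harmony
    reviewer critiques, and the draft is revised for a few rounds."""
    draft = llm(f"As a melody writer, draft ABC notation for: {brief}")
    for _ in range(rounds):
        critique = llm(f"As a harmony reviewer, list problems in:\n{draft}")
        draft = llm(f"Revise the ABC notation to fix:\n{critique}\n\n{draft}")
    return draft

# Stub LLM so the sketch runs without an API key.
print(compose(lambda prompt: f"[response to: {prompt[:40]}...]", "a calm waltz in G"))
```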
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
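To make the measure-misalignment problem concrete, this sketch interleaves the bars of several ABC-style voices so that corresponding measures sit next to each other, which is the basic idea behind a synchronized multi-track notation; the real SMT-ABC token layout differs.

```python
def synchronize(tracks: dict[str, str]) -> str:
    """Emit bar 1 of every voice, then bar 2, and so on, so that measures
    that sound together stay adjacent in the flattened sequence."""
    bars = {name: voice.strip("|").split("|") for name, voice in tracks.items()}
    length = max(len(b) for b in bars.values())
    out = []
    for i in range(length):
        for name, bs in bars.items():
            if i < len(bs):
                out.append(f"[V:{name}] {bs[i].strip()} |")
    return "\n".join(out)

print(synchronize({
    "melody": "G2 A2 B2 c2 | d4 B4 | g8 |",
    "bass":   "G,4 D,4 | G,8 | G,8 |",
}))
```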
- Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model [32.801213106782335]
We develop a generative music AI framework, Video2Music, that can generate music to match a provided video.
In a thorough experiment, we show that our proposed framework can generate music that matches the video content in terms of emotion.
arXiv Detail & Related papers (2023-11-02T03:33:00Z)
- MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies [32.482588500419006]
We build a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain.
We propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup.
In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music.
arXiv Detail & Related papers (2023-08-03T05:35:37Z)
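A minimal sketch of beat-synchronous latent mixup: crudely resample one clip's latent frames onto the other's beat grid, then interpolate with a Beta-distributed coefficient. The frame alignment and the Beta prior are assumptions; MusicLDM's exact pipeline differs in detail.

```python
import numpy as np

def beat_sync_mixup(a: np.ndarray, b: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """a, b: (time, dim) latents whose time axes are beat-aligned frames."""
    t = min(len(a), len(b))
    idx = np.linspace(0, len(b) - 1, t).round().astype(int)  # naive resample
    lam = rng.beta(5.0, 5.0)                                 # mixing coefficient
    return lam * a[:t] + (1.0 - lam) * b[idx]

rng = np.random.default_rng(0)
mixed = beat_sync_mixup(rng.normal(size=(64, 16)), rng.normal(size=(80, 16)), rng)
print(mixed.shape)  # (64, 16)
```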
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks across 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
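As one way to picture such a unified protocol, the sketch below registers tasks with their hierarchy level and dataset and scores any frozen representation the same way; the task and dataset names are illustrative placeholders, not MARBLE's actual benchmark list.

```python
# Hypothetical task registry in the spirit of a four-level taxonomy
# (acoustic / performance / score / high-level description).
TASKS = [
    {"name": "key_detection",   "level": "score",       "dataset": "GiantSteps"},
    {"name": "genre_tagging",   "level": "high-level",  "dataset": "GTZAN"},
    {"name": "vocal_technique", "level": "performance", "dataset": "VocalSet"},
]

def evaluate(embed, probe) -> dict[str, float]:
    """embed(dataset) -> (features, labels); probe(features, labels) -> score.
    Every model is assessed with the same probe, making results comparable."""
    return {task["name"]: probe(*embed(task["dataset"])) for task in TASKS}

# Stub functions so the sketch runs; real code would load data and fit a probe.
print(evaluate(embed=lambda d: (None, None), probe=lambda X, y: 0.0))
```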
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representations, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
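One of MusicGen's interleaving patterns is a "delay" pattern: codebook stream k is shifted right by k steps so that a single-stage LM can predict all K streams almost in parallel. The sketch below shows only the shifting; padding and special-token handling are simplified.

```python
PAD = -1  # placeholder for positions with no token yet

def delay_interleave(codes: list[list[int]]) -> list[list[int]]:
    """codes: K equal-length codebook streams -> (K, T + K - 1) grid where
    stream k is delayed by k steps."""
    K, T = len(codes), len(codes[0])
    grid = [[PAD] * (T + K - 1) for _ in range(K)]
    for k, stream in enumerate(codes):
        for t, token in enumerate(stream):
            grid[k][t + k] = token
    return grid

for row in delay_interleave([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
    print(row)
# [1, 2, 3, -1, -1]
# [-1, 4, 5, 6, -1]
# [-1, -1, 7, 8, 9]
```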
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
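A rough sketch of a GETScore-like layout and of the source/target split used for non-autoregressive generation: one row per track, one column per time step, with target tracks fully masked for the model to fill in. Token ids are placeholders here, whereas the real GETScore packs pitch and duration tokens per track.

```python
import numpy as np

MASK, EMPTY = 0, 1  # placeholder special-token ids

def make_grid(tracks: dict[str, list[int]], targets: set[str]) -> np.ndarray:
    """Stack tracks vertically, time horizontally; mask target tracks."""
    names = list(tracks)
    steps = max(len(v) for v in tracks.values())
    grid = np.full((len(names), steps), EMPTY, dtype=int)
    for row, name in enumerate(names):
        tokens = tracks[name]
        grid[row, :len(tokens)] = tokens
        if name in targets:          # target tracks are fully masked; the
            grid[row, :] = MASK      # non-autoregressive model fills them in
    return grid

grid = make_grid({"melody": [60, 62, 64, 65], "bass": [48, 43, 48, 43]},
                 targets={"bass"})
print(grid)  # row 0 keeps melody tokens; row 1 is all MASK
```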
- Video Background Music Generation: Dataset, Method and Evaluation [31.15901120245794]
We introduce a complete recipe, including a dataset, a benchmark model, and an evaluation metric, for video background music generation.
We present SymMV, a video and symbolic music dataset with various musical annotations.
We also propose a benchmark video background music generation framework named V-MusProd.
arXiv Detail & Related papers (2022-11-21T08:39:48Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to understanding music from symbolic data such as MIDI.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
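As a generic illustration of this kind of pre-training objective, the sketch below masks a fraction of symbolic tokens and records the targets to recover; MusicBERT itself uses an OctupleMIDI encoding with bar-level masking, which this simplification omits.

```python
import random

MASK = "<mask>"

def mask_tokens(tokens: list[str], ratio: float = 0.15, seed: int = 0):
    """Replace ~ratio of tokens with MASK; return the corrupted sequence and
    a position -> original-token map used as the prediction target."""
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, token in enumerate(tokens):
        if rng.random() < ratio:
            masked.append(MASK)
            labels[i] = token
        else:
            masked.append(token)
    return masked, labels

tokens = ["bar", "pos_0", "pitch_60", "dur_4", "pos_4", "pitch_64", "dur_4"]
print(mask_tokens(tokens))
```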