SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
- URL: http://arxiv.org/abs/2506.14293v3
- Date: Wed, 25 Jun 2025 08:18:37 GMT
- Title: SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
- Authors: Tawsif Ahmed, Andrej Radonjic, Gollam Rabby
- Abstract summary: To the best of our knowledge, there are no open-source, high-quality datasets representing popular and well-known songs for generative music modeling tasks. Our dataset changes this narrative by providing a corpus constructed from actual popular music by world-renowned artists.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Sleeping-DISCO 9M, a large-scale pre-training dataset for music and song. To the best of our knowledge, there is no open-source, high-quality dataset representing popular and well-known songs for generative music modeling tasks such as text-to-music generation, music captioning, singing-voice synthesis, melody reconstruction, and cross-modal retrieval. Past contributions focused on isolated and constrained factors, chiefly the creation of synthetic or re-recorded music corpora (e.g. GTSinger, M4Singer); arbitrarily large-scale audio datasets (e.g. DISCO-10M and LAION-DISCO-12M) have been another focus for the community. Unfortunately, adoption of these datasets in the generative music community has been limited, as they fail to reflect real-world music and its flavour. Our dataset changes this narrative by providing a corpus constructed from actual popular music by world-renowned artists.
Related papers
- JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata [6.230204066837519]
JamendoMaxCaps is a large-scale music-caption dataset featuring over 362,000 freely licensed instrumental tracks from the Jamendo platform. The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata.
arXiv Detail & Related papers (2025-02-11T11:12:19Z) - Sanidha: A Studio Quality Multi-Modal Dataset for Carnatic Music [0.8437187555622164]
Music source separation demixes a piece of music into its individual sound sources. Most commonly available datasets are made from commercial Western music. 'Sanidha' is a novel open-source dataset for Carnatic music, the first of its kind.
arXiv Detail & Related papers (2025-01-12T22:39:58Z) - MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response [42.73982391253872]
MusiLingo is a novel system for music caption generation and music-related query responses.
We train it on an extensive music caption dataset and fine-tune it with instructional data.
Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs.
arXiv Detail & Related papers (2023-09-15T19:31:40Z) - DISCO-10M: A Large-Scale Music Dataset [20.706469085872516]
We present DISCO-10M, a novel and extensive music dataset.
It surpasses the largest previously available music dataset by an order of magnitude.
We aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.
arXiv Detail & Related papers (2023-06-23T14:27:14Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks across 8 publicly available datasets, providing a fair, standardized assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totaling around 80 hours of data.
The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata on instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z) - Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z) - dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.