Moisesdb: A dataset for source separation beyond 4-stems
- URL: http://arxiv.org/abs/2307.15913v1
- Date: Sat, 29 Jul 2023 06:59:37 GMT
- Title: Moisesdb: A dataset for source separation beyond 4-stems
- Authors: Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl
- Abstract summary: This paper introduces the MoisesDB dataset for musical source separation.
It consists of 240 tracks from 45 artists, covering twelve musical genres.
For each song, we provide its individual audio sources, organized in a two-level hierarchical taxonomy of stems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the MoisesDB dataset for musical source
separation. It consists of 240 tracks from 45 artists, covering twelve musical
genres. For each song, we provide its individual audio sources, organized in a
two-level hierarchical taxonomy of stems. This will facilitate building and
evaluating fine-grained source separation systems that go beyond the limitation
of using four stems (drums, bass, other, and vocals) due to lack of data. To
facilitate the adoption of this dataset, we publish an easy-to-use Python
library to download, process, and use MoisesDB. Alongside thorough
documentation and analysis of the dataset contents, this work provides baseline
results for open-source separation models at varying separation granularities
(four, five, and six stems) and discusses their results.
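The two-level hierarchy described above can be sketched as a simple mapping from coarse stems to fine-grained sources. The stem names below are illustrative assumptions, not MoisesDB's actual taxonomy (which is documented with the dataset and its Python library); the sketch only shows how fine-grained sources could collapse back to the classic four-stem setup for evaluation.

```python
# Hypothetical two-level stem taxonomy: coarse stem -> fine-grained sources.
# The concrete names here are illustrative; consult the MoisesDB
# documentation for the dataset's actual taxonomy.
TAXONOMY = {
    "vocals": ["lead_vocals", "backing_vocals"],
    "bass": ["electric_bass", "synth_bass"],
    "drums": ["kick", "snare", "cymbals"],
    "other": ["guitar", "piano", "strings"],
}


def to_coarse_stem(fine_source: str) -> str:
    """Map a fine-grained source name to its coarse (4-stem) parent."""
    for coarse, fine_sources in TAXONOMY.items():
        if fine_source in fine_sources:
            return coarse
    raise KeyError(f"unknown source: {fine_source}")
```

With such a mapping, fine-grained stems can be mixed down to four stems, so models trained at different granularities remain comparable under the usual drums/bass/other/vocals evaluation.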
Related papers
- JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata [6.230204066837519]
We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 200,000 freely licensed instrumental tracks from the renowned Jamendo platform.
The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata.
arXiv Detail & Related papers (2025-02-11T11:12:19Z)
- Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries [53.30852012059025]
Music source separation is an audio-to-audio retrieval task.
Recent work in music source separation has begun to challenge the fixed-stem paradigm.
We propose the use of hyperellipsoidal regions as queries to allow for an intuitive yet easily parametrizable approach to specifying both the target (location) and its spread.
arXiv Detail & Related papers (2025-01-27T16:13:50Z)
- A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems [53.30852012059025]
Banquet is a system that allows source separation of multiple stems using just one decoder.
A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model.
arXiv Detail & Related papers (2024-06-26T20:25:53Z)
- Separate Anything You Describe [53.30484933564858]
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA).
AudioSep is a foundation model for open-domain audio source separation with natural language queries.
arXiv Detail & Related papers (2023-08-09T16:09:44Z)
- MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation [10.456845656569444]
Separating a mixture of multiple singing voices into individual voices is rarely studied in music source separation research.
We introduce MedleyVox, an evaluation dataset for multiple singing voices separation.
We present a strategy for construction of multiple singing mixtures using various single-singing datasets.
arXiv Detail & Related papers (2022-11-14T12:27:35Z)
- Visual Scene Graphs for Audio Source Separation [65.47212419514761]
State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments.
We propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs.
Our pipeline is trained end-to-end via a self-supervised task consisting of separating audio sources using the visual graph from artificially mixed sounds.
arXiv Detail & Related papers (2021-09-24T13:40:51Z)
- MusPy: A Toolkit for Symbolic Music Generation [32.01713268702699]
MusPy is an open source Python library for symbolic music generation.
In this paper, we present statistical analysis of the eleven datasets currently supported by MusPy.
arXiv Detail & Related papers (2020-08-05T06:16:13Z)
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the combined Mixing Secrets and MedleyDB dataset.
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.