Moisesdb: A dataset for source separation beyond 4-stems
- URL: http://arxiv.org/abs/2307.15913v1
- Date: Sat, 29 Jul 2023 06:59:37 GMT
- Title: Moisesdb: A dataset for source separation beyond 4-stems
- Authors: Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl
- Abstract summary: This paper introduces the MoisesDB dataset for musical source separation.
It consists of 240 tracks from 45 artists, covering twelve musical genres.
For each song, we provide its individual audio sources, organized in a two-level hierarchical taxonomy of stems.
- Score: 0.9176056742068811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the MoisesDB dataset for musical source
separation. It consists of 240 tracks from 45 artists, covering twelve musical
genres. For each song, we provide its individual audio sources, organized in a
two-level hierarchical taxonomy of stems. This will facilitate building and
evaluating fine-grained source separation systems that go beyond the
four-stem limitation (drums, bass, other, and vocals) imposed by the lack of
data. To
facilitate the adoption of this dataset, we publish an easy-to-use Python
library to download, process, and use MoisesDB. Alongside thorough
documentation and an analysis of the dataset contents, this work provides
baseline results for open-source separation models at varying separation
granularities (four, five, and six stems) and discusses these results.
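As a quick illustration, below is a minimal usage sketch of the published Python library. The class and attribute names (MoisesDB, data_path, track.audio, track.stems) are assumptions based on the library's documentation and may differ across released versions:

```python
# Minimal usage sketch of the moisesdb helper library. Class and
# attribute names (MoisesDB, data_path, track.audio, track.stems)
# are assumptions based on its documentation and may differ by version.
from moisesdb.dataset import MoisesDB

db = MoisesDB(data_path="./moisesdb", sample_rate=44100)
print(len(db), "tracks")          # 240 tracks in the paper

track = db[0]
mixture = track.audio             # full mix, shape (channels, samples)

# Stems follow the paper's two-level taxonomy: a top-level stem
# (e.g. "guitar") aggregates its individual sources (e.g. acoustic
# and electric guitar takes).
for stem_name, waveform in track.stems.items():
    print(stem_name, waveform.shape)
```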
Related papers
- A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems [53.30852012059025]
Banquet is a system that allows source separation of multiple stems using just one decoder.
A bandsplit source separation model is extended to a query-based setup, operating in tandem with a PaSST music instrument recognition model.
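A rough sketch of the query-conditioning pattern this describes: a single separator steered by an instrument-query embedding, here via FiLM-style modulation. This is a generic illustration, not Banquet's actual architecture; all module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class QueryConditionedSeparator(nn.Module):
    """Generic sketch of query-based separation: one decoder steered by
    an instrument-query embedding via FiLM-style modulation. Not the
    Banquet architecture; all names and sizes are hypothetical."""

    def __init__(self, n_features=512, query_dim=768):
        super().__init__()
        self.encoder = nn.Conv1d(2, n_features, kernel_size=7, padding=3)
        # FiLM: the query embedding yields a per-feature scale and shift.
        self.film = nn.Linear(query_dim, 2 * n_features)
        self.decoder = nn.Conv1d(n_features, 2, kernel_size=7, padding=3)

    def forward(self, mixture, query_embedding):
        h = self.encoder(mixture)                        # (B, F, T)
        scale, shift = self.film(query_embedding).chunk(2, dim=-1)
        h = h * scale.unsqueeze(-1) + shift.unsqueeze(-1)
        return self.decoder(h)                           # queried stem estimate

model = QueryConditionedSeparator()
mix = torch.randn(1, 2, 44100)        # stereo mixture
query = torch.randn(1, 768)           # stand-in for a PaSST-style embedding
stem_estimate = model(mix, query)     # same network, any queried stem
```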
arXiv Detail & Related papers (2024-06-26T20:25:53Z)
- Separate Anything You Describe [55.0784713558149]
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA).
AudioSep is a foundation model for open-domain audio source separation with natural language queries.
arXiv Detail & Related papers (2023-08-09T16:09:44Z)
- Benchmarks and leaderboards for sound demixing tasks [44.99833362998488]
We introduce two new benchmarks for the sound source separation tasks.
We compare popular models for sound demixing, as well as their ensembles, on these benchmarks.
We also develop a novel approach to audio separation based on ensembling different models, each best suited to a particular stem.
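In essence, per-stem ensembling reduces to a weighted combination of each model's estimate for a given stem, with weights chosen per stem (e.g. from validation scores). A minimal sketch with placeholder estimates and hypothetical weights:

```python
import numpy as np

def ensemble_stem(estimates, weights):
    """Weighted average of several models' estimates for one stem.
    estimates: list of arrays, each shaped (channels, samples).
    weights:   per-model weights chosen for this particular stem."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    return sum(w * est for w, est in zip(weights, estimates))

# Placeholder inputs: three models' vocal estimates, weighted by
# (hypothetical) per-stem validation scores.
vocals = ensemble_stem(
    estimates=[np.random.randn(2, 44100) for _ in range(3)],
    weights=[0.5, 0.3, 0.2],
)
```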
arXiv Detail & Related papers (2023-05-12T14:00:26Z)
- MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation [10.456845656569444]
Separating a mixture of multiple singing voices into individual voices is rarely studied in music source separation research.
We introduce MedleyVox, an evaluation dataset for multiple singing voices separation.
We present a strategy for constructing multi-singer mixtures from various single-singing datasets.
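At its core, such a construction sums gain-adjusted single-singer recordings into one mixture. The sketch below is a simplified stand-in for the actual MedleyVox pipeline, which involves more careful selection and loudness handling:

```python
import numpy as np

def mix_singers(voices, gain_spread_db=3.0, rng=None):
    """Sum independently recorded single-singer takes into one mixture.
    voices: list of mono arrays with equal length and sample rate.
    Each take gets a random gain within +/- gain_spread_db, a crude
    stand-in for proper loudness matching."""
    rng = rng or np.random.default_rng()
    mixture = np.zeros_like(voices[0], dtype=np.float64)
    for v in voices:
        gain = 10.0 ** (rng.uniform(-gain_spread_db, gain_spread_db) / 20.0)
        mixture += gain * v
    peak = np.abs(mixture).max()
    return mixture / peak if peak > 1.0 else mixture  # avoid clipping
```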
arXiv Detail & Related papers (2022-11-14T12:27:35Z)
- Separate What You Describe: Language-Queried Audio Source Separation [53.65665794338574]
We introduce the task of language-queried audio source separation (LASS).
LASS aims to separate a target source from an audio mixture based on a natural language query of the target source.
We propose LASS-Net, an end-to-end neural network trained to jointly process acoustic and linguistic information.
arXiv Detail & Related papers (2022-03-28T23:47:57Z)
- Visual Scene Graphs for Audio Source Separation [65.47212419514761]
State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments.
We propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs.
Our pipeline is trained end-to-end via a self-supervised task consisting of separating audio sources using the visual graph from artificially mixed sounds.
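This follows the familiar "mix-and-separate" recipe: artificially sum the audio of two videos, then train the network to recover each source given its visual (sub)graph, so supervision comes for free. A schematic of one training step; the model and graph inputs are placeholders, not AVSGS internals:

```python
import torch
import torch.nn.functional as F

def mix_and_separate_step(model, audio_a, audio_b, graph_a, graph_b, optimizer):
    """One self-supervised step: the targets are known because we
    created the mixture ourselves. `model` and the graph inputs are
    placeholders for the visually-conditioned separator."""
    mixture = audio_a + audio_b
    est_a = model(mixture, graph_a)   # recover source A from A's subgraph
    est_b = model(mixture, graph_b)
    loss = F.l1_loss(est_a, audio_a) + F.l1_loss(est_b, audio_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```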
arXiv Detail & Related papers (2021-09-24T13:40:51Z)
- Content based singing voice source separation via strong conditioning using aligned phonemes [7.599399338954308]
In this paper, we present a multimodal multitrack dataset with lyrics aligned in time at the word level with phonetic information.
We show that phoneme conditioning can be successfully applied to improve singing voice source separation.
arXiv Detail & Related papers (2020-08-05T12:25:24Z)
- MusPy: A Toolkit for Symbolic Music Generation [32.01713268702699]
MusPy is an open source Python library for symbolic music generation.
In this paper, we present a statistical analysis of the eleven datasets currently supported by MusPy.
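A brief, hedged usage sketch of MusPy; the names follow its documentation (muspy.read_midi, Music.tracks) but should be checked against the installed version:

```python
# Hypothetical MusPy usage sketch; verify names against your version.
import muspy

music = muspy.read_midi("song.mid")    # returns a muspy.Music object
print(len(music.tracks), "tracks")
for track in music.tracks:
    print(track.name, len(track.notes), "notes")
```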
arXiv Detail & Related papers (2020-08-05T06:16:13Z)
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure to investigate the use of instrument activation information for improving source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on a dataset combining Mixing Secrets and MedleyDB.
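Conceptually, the multitask objective adds an instrument-activation (presence-over-time) term to the separation loss on a shared backbone. A schematic of the combined loss; the weighting alpha and the L1 separation term are illustrative choices, not the paper's exact formulation:

```python
import torch.nn.functional as F

def multitask_loss(est_stems, true_stems, act_logits, act_targets, alpha=0.1):
    """Joint objective: reconstruct each stem and predict when each
    instrument is active. Activation targets can be derived from stem
    energy, so no extra annotation is strictly needed."""
    separation = F.l1_loss(est_stems, true_stems)
    activation = F.binary_cross_entropy_with_logits(act_logits, act_targets)
    return separation + alpha * activation
```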
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)