Moisesdb: A dataset for source separation beyond 4-stems
- URL: http://arxiv.org/abs/2307.15913v1
- Date: Sat, 29 Jul 2023 06:59:37 GMT
- Title: Moisesdb: A dataset for source separation beyond 4-stems
- Authors: Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl
- Abstract summary: This paper introduces the MoisesDB dataset for musical source separation.
It consists of 240 tracks from 45 artists, covering twelve musical genres.
For each song, we provide its individual audio sources, organized in a two-level hierarchical taxonomy of stems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the MoisesDB dataset for musical source
separation. It consists of 240 tracks from 45 artists, covering twelve musical
genres. For each song, we provide its individual audio sources, organized in a
two-level hierarchical taxonomy of stems. This will facilitate building and
evaluating fine-grained source separation systems that go beyond the limitation
of using four stems (drums, bass, other, and vocals) due to lack of data. To
facilitate the adoption of this dataset, we publish an easy-to-use Python
library to download, process, and use MoisesDB. Alongside thorough
documentation and analysis of the dataset contents, this work provides baseline
results for open-source separation models at varying separation granularities
(four, five, and six stems) and discusses their results.
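The two-level hierarchy described above can be sketched as a simple mapping from coarse stems to fine-grained sources. The stem names below are illustrative assumptions, not MoisesDB's actual taxonomy (which is documented with the dataset and its Python library); the sketch only shows how fine-grained sources could collapse back to the classic four-stem setup for evaluation.

```python
# Hypothetical two-level stem taxonomy: coarse stem -> fine-grained sources.
# The concrete names here are illustrative; consult the MoisesDB
# documentation for the dataset's actual taxonomy.
TAXONOMY = {
    "vocals": ["lead_vocals", "backing_vocals"],
    "bass": ["electric_bass", "synth_bass"],
    "drums": ["kick", "snare", "cymbals"],
    "other": ["guitar", "piano", "strings"],
}


def to_coarse_stem(fine_source: str) -> str:
    """Map a fine-grained source name to its coarse (4-stem) parent."""
    for coarse, fine_sources in TAXONOMY.items():
        if fine_source in fine_sources:
            return coarse
    raise KeyError(f"unknown source: {fine_source}")
```

With such a mapping, fine-grained stems can be mixed down to four stems, so models trained at different granularities remain comparable under the usual drums/bass/other/vocals evaluation.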
Related papers
- JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata [6.230204066837519]
We introduce JamendoMaxCaps, a large-scale music-caption dataset featuring over 200,000 freely licensed instrumental tracks from the renowned Jamendo platform.
The dataset includes captions generated by a state-of-the-art captioning model, enhanced with imputed metadata.
arXiv Detail & Related papers (2025-02-11T11:12:19Z)
- Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries [53.30852012059025]
Music source separation is an audio-to-audio retrieval task.
Recent work in music source separation has begun to challenge the fixed-stem paradigm.
We propose the use of hyperellipsoidal regions as queries to allow for an intuitive yet easily parametrizable approach to specifying both the target (location) and its spread.
arXiv Detail & Related papers (2025-01-27T16:13:50Z)
- A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems [53.30852012059025]
Banquet is a system that allows source separation of multiple stems using just one decoder.
A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model.
arXiv Detail & Related papers (2024-06-26T20:25:53Z)
- Separate Anything You Describe [53.30484933564858]
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA).
AudioSep is a foundation model for open-domain audio source separation with natural language queries.
arXiv Detail & Related papers (2023-08-09T16:09:44Z)
- MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation [10.456845656569444]
Separating a mixture of multiple singing voices into individual voices is rarely studied in music source separation research.
We introduce MedleyVox, an evaluation dataset for multiple singing voices separation.
We present a strategy for construction of multiple singing mixtures using various single-singing datasets.
arXiv Detail & Related papers (2022-11-14T12:27:35Z)
- Visual Scene Graphs for Audio Source Separation [65.47212419514761]
State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments.
We propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs.
Our pipeline is trained end-to-end via a self-supervised task consisting of separating audio sources using the visual graph from artificially mixed sounds.
arXiv Detail & Related papers (2021-09-24T13:40:51Z)
- MusPy: A Toolkit for Symbolic Music Generation [32.01713268702699]
MusPy is an open source Python library for symbolic music generation.
In this paper, we present statistical analysis of the eleven datasets currently supported by MusPy.
arXiv Detail & Related papers (2020-08-05T06:16:13Z)
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the combined Mixing Secrets and MedleyDB dataset.
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.