SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation
- URL: http://arxiv.org/abs/2409.10995v1
- Date: Tue, 17 Sep 2024 08:58:33 GMT
- Title: SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation
- Authors: Jaime Garcia-Martinez, David Diaz-Guerra, Archontis Politis, Tuomas Virtanen, Julio J. Carabias-Orti, Pedro Vera-Candeas
- Abstract summary: We introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic training set.
We demonstrate the application of a widely used baseline music separation model, trained on our synthesized dataset, to the well-known EnsembleSet.
- Score: 7.428668206443388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music source separation has advanced significantly in recent years, particularly in isolating vocals, drums, and bass elements from mixed tracks. These developments owe much to the creation and use of large-scale, multitrack datasets dedicated to these specific components. However, the challenge of extracting similar-sounding sources from orchestra recordings has not been extensively explored, largely due to a scarcity of comprehensive and clean (i.e., bleed-free) multitrack datasets. In this paper, we introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic (i.e., rendered with high-quality soundfonts), musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions. Moreover, we demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset to the well-known EnsembleSet, and evaluate its performance under both synthetic and real-world conditions.
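The abstract does not describe the rendering pipeline itself, but the core idea of producing bleed-free multitrack stems by rendering each instrument's part separately with a soundfont can be sketched as follows. This is a minimal illustration under stated assumptions, not the SynthSOD toolchain: it assumes FluidSynth is installed, and the soundfont path and per-instrument MIDI files are hypothetical.

```python
import subprocess
from pathlib import Path

import numpy as np
import soundfile as sf

SOUNDFONT = "orchestra.sf2"  # hypothetical high-quality orchestral soundfont
SAMPLE_RATE = 44100

def render_stem(midi_path: Path, wav_path: Path) -> None:
    """Render one instrument part to an isolated (bleed-free) audio stem."""
    subprocess.run(
        ["fluidsynth", "-ni", SOUNDFONT, str(midi_path),
         "-F", str(wav_path), "-r", str(SAMPLE_RATE)],
        check=True,
    )

def mix_stems(stem_paths: list[Path]) -> np.ndarray:
    """Sum the rendered stems into the mixture used as separation input."""
    stems = [sf.read(p)[0] for p in stem_paths]
    n = max(len(s) for s in stems)
    mix = np.zeros((n,) + stems[0].shape[1:])
    for s in stems:
        mix[: len(s)] += s
    return mix

# Hypothetical per-instrument parts extracted from one orchestral score.
parts = [Path("violin.mid"), Path("oboe.mid"), Path("horn.mid")]
stem_files = []
for midi in parts:
    wav = midi.with_suffix(".wav")
    render_stem(midi, wav)
    stem_files.append(wav)
sf.write("mixture.wav", mix_stems(stem_files), SAMPLE_RATE)
```

Because each stem is rendered in isolation, the resulting multitrack data is bleed-free by construction, which is precisely the property that real orchestra recordings lack.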
Related papers
- Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription [0.6554326244334866]
A popular method for increasing the amount of data is to generate it synthetically from music scores rendered with virtual instruments.
This method can produce a virtually infinite quantity of tracks, but empirical evidence shows that models trained on previously created synthetic datasets do not transfer well to real tracks.
In this work, besides increasing the amount of data, we identify and evaluate three more strategies that practitioners can use to improve the realism of the generated data (a generic augmentation sketch follows this entry).
arXiv Detail & Related papers (2024-07-29T09:17:16Z)
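The summary does not name the three strategies, so the following is only a generic sketch of augmentations commonly used to narrow the synthetic-to-real gap (randomized gain, synthetic reverberation, and recording-chain noise); it should not be read as the paper's actual method.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

def augment_for_realism(x: np.ndarray, sr: int = 44100) -> np.ndarray:
    """Apply simple randomized effects to a synthetically rendered track."""
    # 1) Random gain to mimic varying recording levels and dynamics.
    x = x * rng.uniform(0.5, 1.0)
    # 2) Convolve with an exponentially decaying noise burst as a crude
    #    stand-in for a measured room impulse response.
    t = np.arange(int(0.3 * sr)) / sr
    ir = rng.standard_normal(len(t)) * np.exp(-t / 0.05)
    ir /= np.abs(ir).sum()
    x = fftconvolve(x, ir, mode="full")[: len(x)]
    # 3) Low-level broadband noise to emulate the recording chain (-60 dBFS).
    return x + rng.standard_normal(len(x)) * 10 ** (-60 / 20)

# Usage on a dummy rendered track.
track = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
realistic = augment_for_realism(track)
```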
- Naturalistic Music Decoding from EEG Data via Latent Diffusion Models [14.882764251306094]
This study represents an initial foray into achieving high-quality general music reconstruction using non-invasive EEG data.
We train our models on the public NMED-T dataset and propose neural embedding-based metrics for quantitative evaluation (a minimal embedding-similarity sketch follows this entry).
arXiv Detail & Related papers (2024-05-15T03:26:01Z)
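The metrics themselves are not specified in the summary; below is a minimal sketch of the general idea of an embedding-based metric, comparing reference and reconstruction in the feature space of a frozen encoder. The random-projection encoder here is a placeholder, not the paper's model.

```python
import numpy as np

def embed(audio: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Placeholder for a frozen pretrained audio encoder: mean-pooled random
    projection of 1024-sample frames."""
    frames = audio[: len(audio) // 1024 * 1024].reshape(-1, 1024)
    return (frames @ proj).mean(axis=0)

def embedding_similarity(ref: np.ndarray, rec: np.ndarray, dim: int = 128) -> float:
    """Cosine similarity between embeddings of reference and reconstruction."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((1024, dim)) / np.sqrt(1024)
    a, b = embed(ref, proj), embed(rec, proj)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

reference = np.random.default_rng(1).standard_normal(44100)
reconstruction = reference + 0.1 * np.random.default_rng(2).standard_normal(44100)
print(embedding_similarity(reference, reconstruction))
```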
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony, as well as global and local information about the musical composition, by employing a hybrid variational encoding-decoding strategy (an illustrative two-view fusion sketch follows this entry).
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
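The exact architecture is not described in the summary; the following illustrative PyTorch module fuses a track-view and a bar-view encoding into one shared latent, purely to make the two-view fusion idea concrete. The GRU encoders and all layer sizes are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class TwoViewVAE(nn.Module):
    """Toy VAE that encodes the same piece from two views and fuses them."""
    def __init__(self, track_dim=256, bar_dim=256, latent_dim=64):
        super().__init__()
        self.track_enc = nn.GRU(track_dim, 128, batch_first=True)
        self.bar_enc = nn.GRU(bar_dim, 128, batch_first=True)
        self.to_stats = nn.Linear(256, 2 * latent_dim)
        self.dec = nn.Linear(latent_dim, track_dim)  # degenerate decoder for brevity

    def forward(self, track_view, bar_view):
        _, h_track = self.track_enc(track_view)      # final hidden: (1, B, 128)
        _, h_bar = self.bar_enc(bar_view)
        h = torch.cat([h_track[-1], h_bar[-1]], dim=-1)
        mu, logvar = self.to_stats(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

model = TwoViewVAE()
track_view = torch.randn(2, 16, 256)  # (batch, tracks-as-steps, features)
bar_view = torch.randn(2, 32, 256)    # (batch, bars-as-steps, features)
recon, mu, logvar = model(track_view, bar_view)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```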
- Toward Deep Drum Source Separation [52.01259769265708]
We introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems.
Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date.
We leverage StemGMD to develop LarsNet, a novel deep drum source separation model.
arXiv Detail & Related papers (2023-12-15T10:23:07Z)
- JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation [20.733264277770154]
JEN-1 Composer is a unified framework to efficiently model marginal, conditional, and joint distributions over multi-track music.
We introduce a curriculum training strategy that incrementally guides the model from single-track generation to the flexible generation of multi-track combinations (a toy schedule is sketched after this entry).
We demonstrate state-of-the-art performance in controllable and high-fidelity multi-track music synthesis.
arXiv Detail & Related papers (2023-10-29T22:51:49Z)
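The summary gives the curriculum idea but not its schedule; one plausible toy schedule, in which the number of jointly generated tracks grows linearly over training, might look like this. The track names and the linear ramp are assumptions for illustration.

```python
import random

def max_tracks_at(step: int, total_steps: int, n_tracks: int = 4) -> int:
    """Cap on how many tracks may be modeled jointly, growing from 1 to n_tracks."""
    frac = min(step / max(total_steps, 1), 1.0)
    return 1 + int(frac * (n_tracks - 1))

def sample_track_subset(step, total_steps, tracks=("bass", "drums", "instrument", "melody")):
    """Draw a training example's track combination under the current cap."""
    k = random.randint(1, max_tracks_at(step, total_steps, len(tracks)))
    return random.sample(tracks, k)

for step in (0, 2500, 5000, 10000):
    print(step, sample_track_subset(step, 10000))
```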
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations from all open-sourced pre-trained models developed on music recordings as baselines (a linear-probe evaluation loop is sketched after this entry).
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
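The summary does not pin down the evaluation procedure; a common realization of such a protocol is to probe frozen embeddings with a lightweight classifier per task. The sketch below uses random features and hypothetical task names purely so the loop runs end to end.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def extract_embeddings(model_name: str, n: int, dim: int = 512) -> np.ndarray:
    """Placeholder for running a frozen pretrained model over a dataset."""
    return rng.standard_normal((n, dim))

tasks = {"genre": 10, "key": 24, "emotion": 4}  # hypothetical task -> classes
models = ["model_a", "model_b"]                 # hypothetical baselines

for model_name in models:
    for task, n_classes in tasks.items():
        X = extract_embeddings(model_name, n=400)
        y = rng.integers(0, n_classes, size=400)
        X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print(model_name, task, round(accuracy_score(y_te, probe.predict(X_te)), 3))
```

Keeping the upstream model frozen and training only the probe is what makes such comparisons a test of the representations rather than of fine-tuning budgets.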
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input and learns to generate music samples that plausibly accompany the corresponding input (a generic conditional-GAN training step is sketched after this entry).
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
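D2M-GAN's quantized architecture is not described in the summary; the sketch below shows only a generic conditional adversarial training step on dummy motion features, to illustrate how the dance conditioning enters both generator and discriminator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

motion_dim, noise_dim, audio_dim = 64, 32, 128  # assumed feature sizes
G = nn.Sequential(nn.Linear(motion_dim + noise_dim, 256), nn.ReLU(),
                  nn.Linear(256, audio_dim))
D = nn.Sequential(nn.Linear(motion_dim + audio_dim, 256), nn.ReLU(),
                  nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

motion = torch.randn(8, motion_dim)      # pooled dance-motion features (dummy)
real_audio = torch.randn(8, audio_dim)   # paired real music features (dummy)

# Discriminator step: real (motion, audio) pairs vs. generated pairs.
fake = G(torch.cat([motion, torch.randn(8, noise_dim)], -1)).detach()
d_loss = (F.binary_cross_entropy_with_logits(
              D(torch.cat([motion, real_audio], -1)), torch.ones(8, 1))
          + F.binary_cross_entropy_with_logits(
              D(torch.cat([motion, fake], -1)), torch.zeros(8, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: produce samples the discriminator accepts for this motion.
fake = G(torch.cat([motion, torch.randn(8, noise_dim)], -1))
g_loss = F.binary_cross_entropy_with_logits(
    D(torch.cat([motion, fake], -1)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```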
- Deep Performer: Score-to-Audio Music Performance Synthesis [30.95307878579825]
Deep Performer is a novel system for score-to-audio music performance synthesis.
Unlike speech, music often contains polyphony and long notes.
We show that our proposed model can synthesize music with clear polyphony and harmonic structures.
arXiv Detail & Related papers (2022-02-12T10:36:52Z)
- Conditional Drums Generation using Compound Word Representations [4.435094091999926]
We tackle the task of conditional drums generation using a novel data encoding scheme inspired by Compound Word representation.
We present a sequence-to-sequence architecture in which a Bidirectional Long Short-Term Memory (BiLSTM) network encodes the conditioning parameters.
A Transformer-based decoder with relative global attention produces the generated drum sequences (a simplified sketch follows this entry).
arXiv Detail & Related papers (2022-02-09T13:49:27Z)
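A simplified PyTorch rendition of the described encoder-decoder pairing is shown below. Stock multi-head attention stands in for the paper's relative global attention, and the vocabulary and model sizes are assumptions.

```python
import torch
import torch.nn as nn

class CondDrumSeq2Seq(nn.Module):
    """BiLSTM encoder over conditioning tokens feeding a Transformer decoder
    that generates drum-event tokens (simplified)."""
    def __init__(self, cond_vocab=512, drum_vocab=512, d_model=256):
        super().__init__()
        self.cond_emb = nn.Embedding(cond_vocab, d_model)
        self.encoder = nn.LSTM(d_model, d_model // 2, batch_first=True,
                               bidirectional=True)
        self.drum_emb = nn.Embedding(drum_vocab, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, drum_vocab)

    def forward(self, cond_tokens, drum_tokens):
        memory, _ = self.encoder(self.cond_emb(cond_tokens))
        tgt = self.drum_emb(drum_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        return self.out(self.decoder(tgt, memory, tgt_mask=mask))

model = CondDrumSeq2Seq()
cond = torch.randint(0, 512, (2, 16))   # conditioning-parameter tokens (dummy)
drums = torch.randint(0, 512, (2, 64))  # compound-word drum tokens (dummy)
logits = model(cond, drums)             # (2, 64, 512), next-token logits
```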
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure that investigates the use of instrument activation information to improve source separation performance (a multitask loss sketch follows this entry).
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the combined Mixing Secrets and MedleyDB dataset.
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
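The summary suggests a joint objective over separation and instrument activation. A minimal sketch of such a multitask head and loss, assuming a mask-based spectrogram separator rather than the paper's exact Open-Unmix extension, could look like this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskSeparator(nn.Module):
    """One branch predicts per-source spectrogram masks, the other predicts
    frame-level instrument activations (illustrative layout)."""
    def __init__(self, n_bins=513, n_sources=6, hidden=256):
        super().__init__()
        self.body = nn.GRU(n_bins, hidden, batch_first=True)
        self.mask_head = nn.Linear(hidden, n_bins * n_sources)
        self.act_head = nn.Linear(hidden, n_sources)

    def forward(self, mix_mag):  # (batch, frames, n_bins)
        h, _ = self.body(mix_mag)
        B, T, _ = h.shape
        masks = torch.sigmoid(self.mask_head(h)).view(B, T, -1, mix_mag.size(-1))
        return masks, self.act_head(h)  # masks and activation logits

model = MultitaskSeparator()
mix = torch.rand(2, 100, 513)                 # dummy mixture magnitudes
target = torch.rand(2, 100, 6, 513)           # dummy per-source magnitudes
act_labels = (target.sum(-1) > 1e-3).float()  # frame-level activity targets
masks, acts = model(mix)
sep_loss = F.l1_loss(masks * mix.unsqueeze(2), target)
act_loss = F.binary_cross_entropy_with_logits(acts, act_labels)
loss = sep_loss + 0.1 * act_loss              # weighted multitask objective
```

The activation branch acts as an auxiliary supervision signal: knowing when an instrument is silent lets the model suppress that source's output instead of leaking energy into it.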
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.