Toward Deep Drum Source Separation
- URL: http://arxiv.org/abs/2312.09663v3
- Date: Mon, 20 May 2024 07:30:25 GMT
- Title: Toward Deep Drum Source Separation
- Authors: Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti
- Abstract summary: We introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems.
Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date.
We leverage StemGMD to develop LarsNet, a novel deep drum source separation model.
- Score: 52.01259769265708
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drum performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.
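The bank-of-dedicated-separators design described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the idea, not the LarsNet code: `toy_mask` is a hypothetical stand-in for a trained per-stem U-Net, and the five stem names simply mirror the five-stem setup mentioned above.

```python
import numpy as np

# One dedicated mask estimator per drum stem, each applied to the same mixture
# magnitude spectrogram. A real system would replace `toy_mask` with a trained
# U-Net per stem; here a fixed band-wise soft mask stands in for illustration.

STEMS = ["kick", "snare", "toms", "hihat", "cymbals"]

def toy_mask(mag, stem_idx, n_stems):
    # Hypothetical stand-in for a U-Net: assigns each stem a high weight in
    # its own frequency band and a small weight elsewhere.
    f = mag.shape[0]
    w = np.full(f, 0.1)
    w[stem_idx * f // n_stems:(stem_idx + 1) * f // n_stems] = 1.0
    return w[:, None] * np.ones_like(mag)

def separate(mix_mag):
    # Run every per-stem estimator on the same mixture, then normalize the
    # masks so the estimated stems sum back to the mixture magnitude.
    raw = np.stack([toy_mask(mix_mag, i, len(STEMS)) for i in range(len(STEMS))])
    masks = raw / raw.sum(axis=0, keepdims=True)
    return masks * mix_mag

mix = np.abs(np.random.default_rng(1).standard_normal((40, 16)))
stems = separate(mix)  # shape: (5, 40, 16), one magnitude estimate per stem
```

In an actual separator the masks would be predicted from learned features and applied to the complex STFT before inversion; the sketch only shows how a bank of independent per-stem models shares one mixture input.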
Related papers
- SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation [7.428668206443388]
We introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic training set.
We demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset and evaluate it with respect to the well-known EnsembleSet.
arXiv Detail & Related papers (2024-09-17T08:58:33Z)
- Subtractive Training for Music Stem Insertion using Latent Diffusion Models [35.91945598575059]
We present Subtractive Training, a method for synthesizing individual musical instrument stems given other instruments as context.
Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks.
We extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.
arXiv Detail & Related papers (2024-06-27T16:59:14Z)
- A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems [53.30852012059025]
Banquet is a system that allows source separation of multiple stems using just one decoder.
A bandsplit source separation model is extended to work in a query-based setup in tandem with a PaSST music instrument recognition model.
arXiv Detail & Related papers (2024-06-26T20:25:53Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE-based methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio [11.941510958668557]
We call this task Musical Instrument Retrieval.
We propose a method for retrieving desired musical instruments using reference music mixture as a query.
The proposed model consists of the Single-Instrument Encoder and the Multi-Instrument Encoder, both based on convolutional neural networks.
arXiv Detail & Related papers (2022-11-15T07:32:39Z)
- DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks [0.0]
We present DrumGAN VST, a plugin for synthesizing drum sounds using a Generative Adversarial Network.
DrumGAN VST operates on 44.1 kHz sample-rate audio, offers independent and continuous instrument class controls, and features an encoding neural network that maps sounds into the GAN's latent space.
arXiv Detail & Related papers (2022-06-29T15:44:19Z)
- Source Separation-based Data Augmentation for Improved Joint Beat and Downbeat Tracking [33.05612957858605]
We propose to use a blind drum separation model to segregate the drum and non-drum sounds from each training audio signal.
We report experiments on four completely unseen test sets, validating the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-06-16T11:09:05Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy for estimating the separation performance of state-of-the-art deep learning approaches.
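The oracle principle with an ideal ratio mask can be made concrete with a short sketch. This is an illustrative NumPy implementation under the common magnitude-additivity assumption, not the authors' code; the function name `ideal_ratio_masks` is chosen here for clarity.

```python
import numpy as np

def ideal_ratio_masks(source_mags, eps=1e-8):
    """Oracle ideal ratio masks from ground-truth source magnitude spectrograms.

    source_mags: non-negative array of shape (n_sources, freq, time).
    Returns masks of the same shape that sum to ~1 at every TF bin.
    """
    total = source_mags.sum(axis=0, keepdims=True) + eps  # eps avoids 0/0
    return source_mags / total

# Toy example: two sources, assuming mixture magnitude = sum of source magnitudes.
rng = np.random.default_rng(0)
s = rng.random((2, 4, 5))           # ground-truth source magnitudes
masks = ideal_ratio_masks(s)
mix = s.sum(axis=0)                 # mixture magnitude (additivity assumption)
estimates = masks * mix             # oracle estimates: mask applied to mixture
```

Because the masks are built from the ground-truth sources, the resulting estimates upper-bound what a mask-based separator could achieve on that data, which is why they serve as a training-free proxy for attainable separation performance.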
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Multitask learning for instrument activation aware music source separation [83.30944624666839]
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the mixture of Mixing Secrets and MedleyDB dataset.
arXiv Detail & Related papers (2020-08-03T02:35:00Z)
- Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation [96.18178553315472]
We propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio.
We integrate both stereo generation and source separation into a unified framework, Sep-Stereo.
arXiv Detail & Related papers (2020-07-20T06:20:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.