The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis
- URL: http://arxiv.org/abs/2505.03337v1
- Date: Tue, 06 May 2025 09:08:50 GMT
- Title: The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis
- Authors: Bernardo Torres, Geoffroy Peeters, Gaël Richard
- Abstract summary: Inverse Drum Machine (IDM) is a novel approach to drum source separation that combines analysis-by-synthesis with deep learning. IDM reconstructs individual drum stems and trains a neural network to match the original mixture. Evaluations on the StemGMD dataset show IDM achieves separation performance on par with state-of-the-art supervised methods.
- Score: 4.0595858175849076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the Inverse Drum Machine (IDM), a novel approach to drum source separation that combines analysis-by-synthesis with deep learning. Unlike recent supervised methods that rely on isolated stems, IDM requires only transcription annotations. It jointly optimizes automatic drum transcription and one-shot drum sample synthesis in an end-to-end framework. By convolving synthesized one-shot samples with estimated onsets, mimicking a drum machine, IDM reconstructs individual drum stems and trains a neural network to match the original mixture. Evaluations on the StemGMD dataset show that IDM achieves separation performance on par with state-of-the-art supervised methods, while substantially outperforming matrix decomposition baselines.
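As a rough illustration of the reconstruction step described in the abstract, here is a minimal NumPy sketch, not the authors' implementation: in IDM the one-shot samples are synthesized by a network and the onset activations come from the transcription network, whereas here both are hand-made toy signals. Each instrument's onset activation is convolved with its one-shot sample and the resulting stems are summed into a mixture estimate.

```python
import numpy as np

def reconstruct_mixture(onset_activations, one_shots):
    """Drum-machine-style reconstruction: convolve each instrument's
    onset activation with its one-shot sample, then sum the stems."""
    stems = []
    for activation, one_shot in zip(onset_activations, one_shots):
        # Each convolution places a copy of the one-shot at every onset,
        # scaled by the activation value (a velocity-like gain).
        stems.append(np.convolve(activation, one_shot))
    length = max(len(s) for s in stems)                           # pad to a common length
    stems = np.stack([np.pad(s, (0, length - len(s))) for s in stems])
    return stems, stems.sum(axis=0)

# Toy example: two instruments, one second of audio at 16 kHz.
sr = 16000
kick_act = np.zeros(sr);  kick_act[[0, 8000]] = 1.0              # kick hits at 0.00 s and 0.50 s
snare_act = np.zeros(sr); snare_act[[4000, 12000]] = 0.8         # snare hits at 0.25 s and 0.75 s
t_kick = np.arange(2000) / sr
kick_shot = np.hanning(2000) * np.sin(2 * np.pi * 60 * t_kick)   # decaying 60 Hz "kick"
snare_shot = np.hanning(1500) * np.random.randn(1500) * 0.3      # short noise burst "snare"
stems, mixture = reconstruct_mixture([kick_act, snare_act], [kick_shot, snare_shot])
```

Training would then minimize a distance between the summed reconstruction and the observed mixture, which is the analysis-by-synthesis objective the abstract refers to.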
Related papers
- Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers [49.1574468325115]
We introduce a unified convergence analysis framework for deterministic samplers.
Our framework achieves iteration complexity of $\tilde{O}(d^2/\epsilon)$.
We also provide a detailed analysis of Denoising Diffusion Implicit Models (DDIM)-type samplers.
arXiv Detail & Related papers (2024-10-18T07:37:36Z) - Toward Deep Drum Source Separation [52.01259769265708]
We introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems.
Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date.
We leverage StemGMD to develop LarsNet, a novel deep drum source separation model.
arXiv Detail & Related papers (2023-12-15T10:23:07Z) - PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive
Summarization [139.242907155883]
This work proposes PROM, a new PhRase-level cOpying Mechanism that enhances attention on n-grams.
PROM adds an indicator layer to explicitly pick up tokens in n-grams that can be copied from the source, and calculates an auxiliary loss for the copying prediction.
In the zero-shot setting, PROM is utilized in self-supervised pre-training on raw corpora and provides new general baselines on a wide range of summarization datasets.
arXiv Detail & Related papers (2023-05-11T08:29:05Z) - Neural Machine Translation with Contrastive Translation Memories [71.86990102704311]
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z) - Conditional Drums Generation using Compound Word Representations [4.435094091999926]
We tackle the task of conditional drums generation using a novel data encoding scheme inspired by Compound Word representation.
We present a sequence-to-sequence architecture where a Bidirectional Long Short-Term Memory (BiLSTM) receives information about the conditioning parameters.
A Transformer-based Decoder with relative global attention produces the generated drum sequences.
arXiv Detail & Related papers (2022-02-09T13:49:27Z) - Differentiable Digital Signal Processing Mixture Model for Synthesis
Parameter Extraction from Mixture of Harmonic Sounds [29.012177604120048]
A differentiable digital signal processing (DDSP) autoencoder is a musical sound synthesizer that combines a deep neural network (DNN) and spectral modeling synthesis.
It allows us to flexibly edit sounds by changing the fundamental frequency, timbre feature, and loudness (synthesis parameters) extracted from an input sound.
It is designed for a monophonic harmonic sound and cannot handle mixtures of harmonic sounds.
arXiv Detail & Related papers (2022-02-01T03:38:49Z) - Reference-based Magnetic Resonance Image Reconstruction Using Texture
Transformer [86.6394254676369]
We propose a novel Texture Transformer Module (TTM) for accelerated MRI reconstruction.
We formulate the under-sampled data and reference data as queries and keys in a transformer.
The proposed TTM can be stacked on prior MRI reconstruction approaches to further improve their performance.
arXiv Detail & Related papers (2021-11-18T03:06:25Z) - Global Structure-Aware Drum Transcription Based on Self-Attention
Mechanisms [18.5148472561169]
This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal.
To capture the global repetitive structure of drum scores, we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder.
Experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure.
arXiv Detail & Related papers (2021-05-12T17:04:16Z) - Multitask learning for instrument activation aware music source
separation [83.30944624666839]
We propose a novel multitask structure to investigate using instrument activation information to improve source separation performance.
We investigate our system on six independent instruments, a more realistic scenario than the three instruments included in the widely-used MUSDB dataset.
The results show that our proposed multitask model outperforms the baseline Open-Unmix model on the combined Mixing Secrets and MedleyDB dataset.
arXiv Detail & Related papers (2020-08-03T02:35:00Z) - wav2shape: Hearing the Shape of a Drum Machine [4.283530753133897]
Disentangling and recovering physical attributes from a waveform is a challenging inverse problem in audio signal processing.
We propose to address this problem via a combination of time-frequency analysis and supervised machine learning.
arXiv Detail & Related papers (2020-07-20T17:35:24Z)
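To make the time-frequency-analysis-plus-supervised-learning recipe in the wav2shape summary above concrete, here is a hedged sketch in which a toy decaying sinusoid stands in for a physically parameterized drum hit, time-averaged log-mel features stand in for the time-frequency analysis, and a random forest stands in for the learned regressor; all of these are illustrative assumptions rather than the paper's actual pipeline.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestRegressor

SR = 16000

def drum_hit(pitch_hz, decay_s, dur_s=0.5):
    """Toy stand-in for a physically parameterized drum hit:
    an exponentially decaying sinusoid with a given pitch and decay time."""
    t = np.arange(int(SR * dur_s)) / SR
    return np.exp(-t / decay_s) * np.sin(2 * np.pi * pitch_hz * t)

def tf_features(x):
    # Crude time-frequency descriptor: time-averaged log-mel spectrogram.
    mel = librosa.feature.melspectrogram(y=x, sr=SR, n_mels=64)
    return np.log1p(mel).mean(axis=1)

rng = np.random.default_rng(0)
params = np.column_stack([rng.uniform(50, 200, 500),     # pitch in Hz
                          rng.uniform(0.05, 0.4, 500)])  # decay time in seconds
X = np.stack([tf_features(drum_hit(p, d)) for p, d in params])
# Supervised regression from waveform features back to the physical attributes.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, params)
```

A held-out evaluation would then measure how accurately the regressor recovers the pitch and decay parameters of unseen hits.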