Automatic Music Sample Identification with Multi-Track Contrastive Learning
- URL: http://arxiv.org/abs/2510.11507v2
- Date: Mon, 27 Oct 2025 10:57:33 GMT
- Title: Automatic Music Sample Identification with Multi-Track Contrastive Learning
- Authors: Alain Riou, Joan Serrà, Yuki Mitsufuji
- Abstract summary: We tackle the challenging task of automatic sample identification. We adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes. We show that this method significantly outperforms previous state-of-the-art baselines.
- Score: 36.60619556916679
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sampling, the technique of reusing pieces of existing audio tracks to create new music content, is a very common practice in modern music production. In this paper, we tackle the challenging task of automatic sample identification, that is, detecting such sampled content and retrieving the material from which it originates. To do so, we adopt a self-supervised learning approach that leverages a multi-track dataset to create positive pairs of artificial mixes, and design a novel contrastive learning objective. We show that this method significantly outperforms previous state-of-the-art baselines, is robust across various genres, and scales well as the number of noise songs in the reference database increases. In addition, we extensively analyze the contribution of the different components of our training pipeline and highlight, in particular, the need for high-quality separated stems for this task.
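The abstract describes the training setup only at a high level: positive pairs are artificial mixes built from stems of the same song, trained with a contrastive objective. The paper's actual loss is not reproduced here; the following is a minimal NumPy sketch of a standard InfoNCE-style contrastive loss over such mix pairs, in which the embedding dimensions, batch size, and stem-averaging step are all purely illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: each anchor's positive is the
    matching row of `positives`; all other rows act as in-batch negatives."""
    # L2-normalize so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # matching pairs lie on the diagonal of the similarity matrix
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
# hypothetical stem embeddings: 8 songs x 4 stems x 128 dimensions
stems = rng.standard_normal((8, 4, 128))
# two "artificial mixes" per song from disjoint stem subsets form a positive pair
mix_a = stems[:, :2].mean(axis=1)
mix_b = stems[:, 2:].mean(axis=1)
loss = info_nce_loss(mix_a, mix_b)
```

In a real pipeline the mix embeddings would come from a trained audio encoder rather than random vectors; the sketch only illustrates how disjoint stem subsets of one song yield a positive pair while other songs in the batch supply negatives.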
Related papers
- A Study on the Data Distribution Gap in Music Emotion Recognition [7.281487567929003]
Music Emotion Recognition (MER) is a task deeply connected to human perception. Prior studies tend to focus on specific musical styles rather than incorporating a diverse range of genres. We address the task of recognizing emotion from audio content by investigating five datasets with dimensional emotion annotations.
arXiv Detail & Related papers (2025-10-06T10:57:05Z) - Unleashing the Power of Natural Audio Featuring Multiple Sound Sources [54.38251699625379]
Universal sound separation aims to extract clean audio tracks corresponding to distinct events from mixed audio. We propose ClearSep, a framework that employs a data engine to decompose complex naturally mixed audio into multiple independent tracks. In experiments, ClearSep achieves state-of-the-art performance across multiple sound separation tasks.
arXiv Detail & Related papers (2025-04-24T17:58:21Z) - Automatic Identification of Samples in Hip-Hop Music via Multi-Loss Training and an Artificial Dataset [0.29998889086656577]
We show that a convolutional neural network trained on an artificial dataset can identify real-world samples in commercial hip-hop music. We optimize the model using a joint classification and metric learning loss and show that it achieves 13% greater precision on real-world instances of sampling.
arXiv Detail & Related papers (2025-02-10T11:30:35Z) - Self-Supervised Contrastive Learning for Robust Audio-Sheet Music
Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z) - Investigating Personalization Methods in Text to Music Generation [21.71190700761388]
Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods.
For evaluation, we construct a novel dataset with prompts and music clips.
Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody.
arXiv Detail & Related papers (2023-09-20T08:36:34Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music
Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Automatic music mixing with deep learning and out-of-domain data [10.670987762781834]
Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge.
We propose a novel data preprocessing method that allows the models to perform automatic music mixing.
We also redesigned a listening test method for evaluating music mixing systems.
arXiv Detail & Related papers (2022-08-24T10:50:22Z) - Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z) - SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance [88.0355290619761]
This work focuses on the separation of unknown musical instruments.
We propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories.
Our framework exhibits strong adaptation ability on the novel musical categories and outperforms the baseline methods by a significant margin.
arXiv Detail & Related papers (2022-03-25T09:42:11Z)