Self-Supervised Beat Tracking in Musical Signals with Polyphonic
Contrastive Learning
- URL: http://arxiv.org/abs/2201.01771v2
- Date: Sun, 16 Jul 2023 01:12:36 GMT
- Authors: Dorian Desblancs
- Abstract summary: We present a new self-supervised learning pretext task for beat tracking and downbeat estimation.
It makes use of Spleeter, an audio source separation model, to separate a song's drums from the rest of its signal.
It is notably one of the first works to use audio source separation as a fundamental component of self-supervision.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Annotating musical beats is a very long and tedious process. In order to
combat this problem, we present a new self-supervised learning pretext task for
beat tracking and downbeat estimation. This task makes use of Spleeter, an
audio source separation model, to separate a song's drums from the rest of its
signal. The separated drum signals are used as positives, and by extension
negatives, for contrastive pre-training. The drum-less signals, on the
other hand, are used as anchors. When pre-training a fully-convolutional and
recurrent model using this pretext task, an onset function is learned. In some
cases, this function is found to align with periodic elements in a song. We
find that pre-trained models outperform randomly initialized models when a beat
tracking training set is extremely small (less than 10 examples). When this is
not the case, pre-training leads to a learning speed-up that causes the model
to overfit to the training set. More generally, this work defines new
perspectives in the realm of musical self-supervised learning. It is notably
one of the first works to use audio source separation as a fundamental
component of self-supervision.
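The contrastive setup described above can be illustrated with a minimal numpy sketch of an InfoNCE-style loss, assuming embeddings have already been computed by some encoder from drum-less anchor excerpts and their matching drum-stem positives (the paper uses Spleeter for the separation; the function name and batch construction here are hypothetical, not the authors' implementation).

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss.

    anchors:   (B, D) embeddings of drum-less excerpts.
    positives: (B, D) embeddings of the matching drum stems; row i of
               `positives` is the positive for row i of `anchors`, and
               every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries correspond to the true anchor-positive pairs.
    return -np.mean(np.diag(log_probs))
```

When anchor and positive embeddings of the same song agree more than those of different songs, this loss drops toward zero, which is the signal that drives the pre-training.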
Related papers
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging [6.363158395541767]
Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data.
In this study, we investigate and compare the performance of new self-supervised methods for music tagging.
arXiv Detail & Related papers (2024-04-14T07:56:08Z)
- Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z)
- Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data.
The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z)
- Spectrograms Are Sequences of Patches [5.253100011321437]
We design a self-supervised model that captures a spectrogram of music as a series of patches: Patchifier.
We do not use labeled data for the pre-training process, only a subset of the MTAT dataset containing 16k music clips.
Our model achieves competitive results compared to other audio representation models.
arXiv Detail & Related papers (2022-10-28T08:39:36Z)
- Large-Scale Pre-training for Person Re-identification with Noisy Labels [125.49696935852634]
We develop a large-scale Pre-training framework utilizing Noisy Labels (PNL).
In principle, joint learning of these three modules not only clusters similar examples to one prototype, but also rectifies noisy labels based on the prototype assignment.
This simple pre-training task provides a scalable way to learn SOTA Re-ID representations from scratch on "LUPerson-NL" without bells and whistles.
arXiv Detail & Related papers (2022-03-30T17:59:58Z)
- Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation [15.309573393914462]
Neural networks tend to forget the previously learned knowledge when learning multiple tasks sequentially from dynamic data distributions.
This problem is called catastrophic forgetting, which is a fundamental challenge in the continual learning of neural networks.
We propose Complementary Online Knowledge Distillation (COKD), which uses dynamically updated teacher models trained on specific data orders to iteratively provide complementary knowledge to the student model.
arXiv Detail & Related papers (2022-03-08T08:08:45Z)
- Catch-A-Waveform: Learning to Generate Audio from a Single Short Example [33.96833901121411]
We present a GAN-based generative model that can be trained on one short audio signal from any domain.
We show that in all cases, no more than 20 seconds of training audio commonly suffice for our model to achieve state-of-the-art results.
arXiv Detail & Related papers (2021-06-11T14:35:11Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning [55.854205371307884]
We formalize the music-conditioned dance generation as a sequence-to-sequence learning problem.
We propose a novel curriculum learning strategy to alleviate error accumulation of autoregressive models in long motion sequence generation.
Our approach significantly outperforms existing state-of-the-art methods on automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-06-11T00:08:25Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)