Beat this! Accurate beat tracking without DBN postprocessing
- URL: http://arxiv.org/abs/2407.21658v1
- Date: Wed, 31 Jul 2024 14:59:17 GMT
- Title: Beat this! Accurate beat tracking without DBN postprocessing
- Authors: Francesco Foscarin, Jan Schlüter, Gerhard Widmer
- Abstract summary: We propose a system for tracking beats and downbeats with two objectives: generality across a diverse range of music, and high accuracy.
We achieve generality by training on multiple datasets, including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations.
For high accuracy, we develop a loss function tolerant to small time shifts in the annotations, and an architecture that alternates convolutions with transformers over either frequency or time.
- Score: 4.440100868992127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a system for tracking beats and downbeats with two objectives: generality across a diverse range of music, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which imposes constraints on meter and tempo. For high accuracy, among other improvements, we develop a loss function tolerant to small time shifts in the annotations, and an architecture that alternates convolutions with transformers over either frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially on difficult and underrepresented genres, and performs worse on continuity-based metrics, so we publish our model, code, and preprocessed datasets, and invite others to beat this.
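To make the shift-tolerant loss concrete, here is a minimal PyTorch sketch of the general idea, not the authors' published implementation: the function name, the pooling-based formulation, and the default tolerance are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def shift_tolerant_bce(logits: torch.Tensor, targets: torch.Tensor,
                       tolerance: int = 3) -> torch.Tensor:
    """Frame-wise binary cross-entropy that forgives small time shifts.

    logits:    (batch, frames) raw beat-activation outputs
    targets:   (batch, frames) binary beat annotations
    tolerance: half-width in frames of the window within which a
               predicted peak may drift from the annotated position
               (hypothetical default)
    """
    targets = targets.float()
    # Max-pool the logits so every frame sees the strongest activation
    # within +/- `tolerance` frames; a peak slightly off an annotated
    # beat then scores as if it were exactly on it.
    pooled = F.max_pool1d(logits.unsqueeze(1),
                          kernel_size=2 * tolerance + 1,
                          stride=1, padding=tolerance).squeeze(1)
    # Annotated frames are scored against the pooled logits; all other
    # frames against the raw logits, so spurious peaks far from any
    # beat are still penalized. (A fuller version might also exclude
    # frames adjacent to beats from the negative term.)
    loss_pos = F.binary_cross_entropy_with_logits(
        pooled, targets, reduction="none") * targets
    loss_neg = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none") * (1.0 - targets)
    return (loss_pos + loss_neg).mean()
```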
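The alternation of convolutions with transformers over frequency or time can be sketched as below; all layer sizes, the block composition, and the class name are invented for illustration, and the published architecture differs in its details.

```python
import torch
import torch.nn as nn

class ConvTransformerBlock(nn.Module):
    """Illustrative block: a convolution stage followed by attention
    over a single axis (frequency or time) of a spectrogram tensor."""

    def __init__(self, channels: int, axis: str):
        super().__init__()
        assert axis in ("freq", "time")
        self.axis = axis
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.GELU(),
        )
        # channels must be divisible by nhead (4 here, chosen arbitrarily).
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time)
        x = self.conv(x)
        b, c, f, t = x.shape
        if self.axis == "freq":
            # Attend across frequency bins, independently per time frame.
            seq = x.permute(0, 3, 2, 1).reshape(b * t, f, c)
            x = self.attn(seq).reshape(b, t, f, c).permute(0, 3, 2, 1)
        else:
            # Attend across time frames, independently per frequency bin.
            seq = x.permute(0, 2, 3, 1).reshape(b * f, t, c)
            x = self.attn(seq).reshape(b, f, t, c).permute(0, 3, 1, 2)
        return x
```

Stacking such blocks while alternating axis between "freq" and "time" yields the conv/transformer alternation the abstract describes.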
Related papers
- Detecting Music Performance Errors with Transformers [3.6837762419929168]
Existing tools for music error detection rely on automatic alignment.
There is a lack of sufficient data to train music error detection models.
We present a novel data generation technique capable of creating large-scale synthetic music error datasets.
arXiv Detail & Related papers (2025-01-03T07:04:20Z)
- Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems [3.5570874721859016]
Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music.
We identify two primary sources of distribution shift: the music, and the sound.
We evaluate the performance of several state-of-the-art (SOTA) AMT systems on two new experimental test sets.
arXiv Detail & Related papers (2024-08-08T19:40:28Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE-based methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token in a given bar attends directly to all tokens of the bars most relevant to musical structure.
With the coarse-grained attention, a token attends only to a summary of the other bars rather than to each of their tokens, reducing the computational cost.
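As a rough illustration of such a fine/coarse attention pattern, the toy sketch below builds a boolean attention mask from hypothetical bar assignments, per-bar summary-token positions, and a hand-chosen set of "structurally relevant" bars; it is not Museformer's actual implementation.

```python
import torch

def fine_coarse_mask(bar_ids: torch.Tensor, summary_pos: torch.Tensor,
                     relevant: dict[int, set[int]]) -> torch.Tensor:
    """Build a boolean attention mask (True = may attend).

    bar_ids:     (seq,) bar index of every token
    summary_pos: (num_bars,) position of each bar's summary token
    relevant:    for each bar, the set of bars it may attend to at
                 token level, e.g. {4: {0, 2}} lets tokens in bar 4
                 look at every token of bars 0 and 2

    Toy construction: fine-grained attention to all tokens of the
    relevant bars, coarse-grained attention to the summary tokens of
    all remaining bars.
    """
    seq = bar_ids.shape[0]
    mask = torch.zeros(seq, seq, dtype=torch.bool)
    for i in range(seq):
        bar = int(bar_ids[i])
        # Fine-grained: every token of the structurally relevant bars.
        for other in relevant.get(bar, set()):
            mask[i] |= bar_ids == other
        # Tokens within the same bar attend to each other.
        mask[i] |= bar_ids == bar
        # Coarse-grained: only the summary token of the other bars.
        mask[i, summary_pos] = True
    return mask
```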
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
- SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias [15.7153621508319]
SongDriver is a real-time music accompaniment generation system without logical latency or exposure bias.
We train SongDriver on several open-source datasets and on an original aiSong dataset built from Chinese-style modern pop music scores.
The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics.
arXiv Detail & Related papers (2022-09-13T15:05:27Z)
- Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
arXiv Detail & Related papers (2022-07-14T15:06:37Z)
- Unaligned Supervision For Automatic Music Transcription in The Wild [1.2183405753834562]
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report SOTA note-level accuracy on the MAPS dataset, and large favorable margins in cross-dataset evaluations.
arXiv Detail & Related papers (2022-04-28T17:31:43Z)
- A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics [70.45937234489044]
We reorganize two widely used TSGV datasets (Charades-STA and ActivityNet Captions) so that the test split differs from the training split.
We introduce a new evaluation metric, dR@n,IoU@m, to calibrate the basic IoU scores.
All the results demonstrate that the re-organized datasets and new metric can better monitor the progress in TSGV.
arXiv Detail & Related papers (2021-01-22T09:59:30Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
- Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) achieve great success on many tasks with the help of large-scale, well-annotated datasets.
However, labeling large-scale data is costly and error-prone, making it difficult to guarantee annotation quality.
We propose Temporal Calibrated Regularization (TCR), which uses the original labels together with the model's predictions from the previous epoch.
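The abstract suggests combining the given labels with the previous epoch's predictions; a minimal sketch of that idea (assuming a simple convex combination as the soft target, which may differ from the published TCR formulation) could look like this.

```python
import torch
import torch.nn.functional as F

def tcr_loss(logits: torch.Tensor, labels: torch.Tensor,
             prev_probs: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Sketch of a temporal-calibration-style loss for noisy labels.

    logits:     (batch, classes) current-epoch model outputs
    labels:     (batch,) possibly noisy integer class labels
    prev_probs: (batch, classes) softmax predictions saved from the
                previous epoch for the same samples
    alpha:      weight on the original labels (hypothetical parameter)

    The soft target interpolates between the one-hot label and the
    previous epoch's prediction, so a label the model consistently
    contradicts gradually loses influence.
    """
    one_hot = F.one_hot(labels, num_classes=logits.shape[1]).float()
    target = alpha * one_hot + (1.0 - alpha) * prev_probs
    log_probs = F.log_softmax(logits, dim=1)
    # Cross-entropy against the blended soft target.
    return -(target * log_probs).sum(dim=1).mean()
```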
arXiv Detail & Related papers (2020-07-01T04:48:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.