Uncovering audio patterns in music with Nonnegative Tucker Decomposition
for structural segmentation
- URL: http://arxiv.org/abs/2104.08580v1
- Date: Sat, 17 Apr 2021 15:48:24 GMT
- Title: Uncovering audio patterns in music with Nonnegative Tucker Decomposition
for structural segmentation
- Authors: Axel Marmoret (1), Jérémy E. Cohen (1), Nancy Bertin (1), Frédéric Bimbot (1) ((1) Univ Rennes, Inria, CNRS, IRISA, France.)
- Abstract summary: The present work investigates the ability of Nonnegative Tucker Decomposition (NTD) to uncover musical patterns and structure in pop songs in their audio form.
Exploiting the fact that NTD tends to express the content of bars as linear combinations of a few patterns, we illustrate the ability of the decomposition to capture and single out repeated motifs in the corresponding compressed space.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has proposed the use of tensor decomposition to model repetitions and to separate tracks in loop-based electronic music. The present work further investigates the ability of Nonnegative Tucker Decomposition (NTD) to uncover musical patterns and structure in pop songs in their audio form. Exploiting the fact that NTD tends to express the content of bars as linear combinations of a few patterns, we illustrate the ability of the decomposition to capture and single out repeated motifs in the corresponding compressed space, which can be interpreted from a musical viewpoint. The resulting features also turn out to be efficient for structural segmentation, leading to experimental results on the RWC Pop data set that can challenge state-of-the-art approaches relying on extensive example-based learning schemes.
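As a concrete illustration of the barwise model described in the abstract, the sketch below runs a Nonnegative Tucker Decomposition on a synthetic (frequency x time-in-bar x bar) tensor with the tensorly library. The tensor layout follows the abstract's formulation, but the dimensions, ranks, and random data are illustrative assumptions rather than the paper's experimental setup.

```python
# Minimal NTD sketch, assuming a song's spectrogram has been cut into bars
# and stacked into a third-order tensor (frequency x time-in-bar x bar).
# Dimensions and ranks are illustrative, not the paper's settings.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_tucker

rng = np.random.default_rng(0)
X = tl.tensor(rng.random((80, 96, 64)))  # 80 freq bins, 96 frames per bar, 64 bars

# Ranks: number of spectral, rhythmic, and bar-level patterns to extract.
core, (W, H, Q) = non_negative_tucker(X, rank=(12, 12, 8), n_iter_max=200, tol=1e-6)

# W (80x12): spectral templates; H (96x12): within-bar temporal profiles;
# Q (64x8): pattern activations per bar. Each bar is approximated by a
# nonnegative linear combination of a few patterns, so repeated sections
# show up as similar rows of Q.
print(W.shape, H.shape, Q.shape, core.shape)
```

One plausible way to exploit these features for segmentation is to compare the rows of Q across bars, e.g. through a bar-level self-similarity matrix; the paper's exact segmentation pipeline may differ.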
Related papers
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance [88.0355290619761]
This work focuses on the separation of unknown musical instruments.
We propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories.
Our framework exhibits strong adaptation ability on the novel musical categories and outperforms the baseline methods by a significant margin.
arXiv Detail & Related papers (2022-03-25T09:42:11Z)
- Barwise Compression Schemes for Audio-Based Music Structure Analysis [4.39160562548524]
Music Structure Analysis (MSA) consists of segmenting a music piece into several distinct sections.
We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song.
In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods.
arXiv Detail & Related papers (2022-02-10T12:23:57Z)
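To make the compression-then-segmentation idea of the entry above concrete, the following sketch detects section boundaries from any barwise feature matrix via a self-similarity matrix and a Foote-style checkerboard novelty curve. This is a standard MSA technique with assumed parameters (kernel size, threshold), not necessarily the exact scheme of that paper.

```python
# Generic novelty-based boundary detection over barwise features
# (Foote-style checkerboard kernel); parameters are illustrative.
import numpy as np

def novelty_boundaries(features, kernel_size=8, threshold=0.2):
    """Return candidate boundary indices for a (n_bars, dim) feature matrix."""
    # Cosine self-similarity matrix between bars.
    F = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    ssm = F @ F.T

    # Checkerboard kernel: +1 within the quadrants before/after a boundary,
    # -1 across them, so homogeneous-then-changing regions score high.
    half = kernel_size // 2
    q = np.ones((half, half))
    kernel = np.block([[q, -q], [-q, q]])

    # Slide the kernel along the diagonal to build a novelty curve.
    n = ssm.shape[0]
    novelty = np.zeros(n)
    for t in range(half, n - half):
        novelty[t] = np.sum(ssm[t - half:t + half, t - half:t + half] * kernel)
    novelty = np.maximum(novelty, 0.0) / (novelty.max() + 1e-12)

    # Simple peak picking: local maxima above the threshold.
    return [t for t in range(1, n - 1)
            if novelty[t] > threshold
            and novelty[t] >= novelty[t - 1] and novelty[t] >= novelty[t + 1]]
```

Fed with the rows of the Q factor from the NTD sketch above, this returns bar indices where the musical content changes.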
- Exploring single-song autoencoding schemes for audio-based music structure analysis [6.037383467521294]
This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent/compressed representation specific to a given song.
We report that the proposed unsupervised autoencoding scheme achieves the performance level of supervised state-of-the-art methods at a 3-second tolerance.
arXiv Detail & Related papers (2021-10-27T13:48:25Z)
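As a sketch of the piece-specific idea in the entry above, the snippet below overfits one small autoencoder to the bars of a single song and reads bar-level codes from the bottleneck. The layer sizes, latent dimension, and training loop are illustrative assumptions, not the architecture reported in the paper.

```python
# Illustrative "piece-specific" autoencoder: one tiny model trained on the
# bars of a single song, so the bottleneck becomes a song-specific
# compressed representation. All sizes are assumptions.
import torch
import torch.nn as nn

bars = torch.rand(64, 80 * 96)            # 64 bars, flattened (freq x frames)
model = nn.Sequential(
    nn.Linear(80 * 96, 256), nn.ReLU(),
    nn.Linear(256, 16),                   # low-dimensional latent per bar
    nn.Linear(16, 256), nn.ReLU(),
    nn.Linear(256, 80 * 96),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                      # overfitting is the point: one song
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(bars), bars)
    loss.backward()
    opt.step()

# Bar-level latent codes: forward through the encoder half only.
with torch.no_grad():
    latents = model[:3](bars)             # (64, 16)
print(latents.shape)
```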
- ModeRNN: Harnessing Spatiotemporal Mode Collapse in Unsupervised Predictive Learning [75.2748374360642]
We propose ModeRNN, a novel method for learning hidden structured representations between recurrent states.
Across the entire dataset, different modes result in different responses on the mixtures of slots, which enhances the ability of ModeRNN to build structured representations.
arXiv Detail & Related papers (2021-10-08T03:47:54Z)
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
arXiv Detail & Related papers (2021-01-31T05:14:58Z)
- Lets Play Music: Audio-driven Performance Video Generation [58.77609661515749]
We propose a new task named Audio-driven Performance Video Generation (APVG).
APVG aims to synthesize the video of a person playing a certain instrument guided by a given music audio clip.
arXiv Detail & Related papers (2020-11-05T03:13:46Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods that incorporate score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- Music Generation with Temporal Structure Augmentation [0.0]
The proposed method augments a connectionist generation model with a count-down to the song's conclusion and meter markers as extra input features.
An RNN architecture with LSTM cells is trained on the Nottingham folk music dataset in a supervised sequence learning setup.
Experiments show an improved prediction performance for both types of annotation.
arXiv Detail & Related papers (2020-04-21T19:19:58Z)
- Modeling Musical Structure with Artificial Neural Networks [0.0]
I explore the application of artificial neural networks to different aspects of musical structure modeling.
I show how a connectionist model, the Gated Autoencoder (GAE), can be employed to learn transformations between musical fragments.
I propose a special predictive training of the GAE, which yields a representation of polyphonic music as a sequence of intervals.
arXiv Detail & Related papers (2020-01-06T18:35:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.