SSM-Net: feature learning for Music Structure Analysis using a
Self-Similarity-Matrix based loss
- URL: http://arxiv.org/abs/2211.08141v1
- Date: Tue, 15 Nov 2022 13:48:11 GMT
- Title: SSM-Net: feature learning for Music Structure Analysis using a
Self-Similarity-Matrix based loss
- Authors: Geoffroy Peeters and Florian Angulo
- Abstract summary: We train a deep encoder to learn features such that the Self-Similarity-Matrix (SSM) computed from these features approximates a ground-truth SSM.
We successfully demonstrate the use of this training paradigm using the Area Under the ROC Curve (AUC) on the RWC-Pop dataset.
- Score: 7.599399338954308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a new paradigm to learn audio features for Music
Structure Analysis (MSA). We train a deep encoder to learn features such that
the Self-Similarity-Matrix (SSM) resulting from those approximates a
ground-truth SSM. This is done by minimizing a loss between both SSMs. Since
this loss is differentiable w.r.t. its input features, we can train the encoder
in a straightforward way. We successfully demonstrate the use of this training
paradigm using the Area Under the ROC Curve (AUC) on the RWC-Pop dataset.
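The abstract describes the core mechanism: an encoder maps audio frames to embeddings, an SSM is computed from those embeddings, and a differentiable loss compares it to the ground-truth SSM. The following is a minimal sketch of that pipeline in PyTorch; the cosine-similarity SSM, the binary-cross-entropy comparison, and all names (ssm_from_embeddings, the toy encoder, etc.) are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of an SSM-based training loss (PyTorch). The cosine-
# similarity SSM and the binary-cross-entropy comparison are assumed
# choices; the paper may use different ones.
import torch
import torch.nn as nn
import torch.nn.functional as F


def ssm_from_embeddings(z: torch.Tensor) -> torch.Tensor:
    """z: (batch, time, dim) -> SSM of shape (batch, time, time) in [0, 1]."""
    z = F.normalize(z, dim=-1)                 # unit-norm embeddings
    sim = torch.matmul(z, z.transpose(1, 2))   # cosine similarity in [-1, 1]
    return 0.5 * (sim + 1.0)                   # rescale to [0, 1]


def ssm_loss(z: torch.Tensor, ssm_target: torch.Tensor) -> torch.Tensor:
    """Differentiable loss between the predicted SSM and a ground-truth SSM."""
    ssm_pred = ssm_from_embeddings(z).clamp(1e-4, 1 - 1e-4)  # avoid log(0)
    return F.binary_cross_entropy(ssm_pred, ssm_target)


# Toy training step; the small encoder stands in for the paper's deep encoder.
encoder = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 64))
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

audio_feats = torch.randn(4, 200, 80)          # e.g. log-mel frames
ssm_target = torch.rand(4, 200, 200).round()   # toy binary ground-truth SSM

loss = ssm_loss(encoder(audio_feats), ssm_target)
loss.backward()
optimizer.step()
```

Because every step (similarity computation, rescaling, loss) is differentiable with respect to the embeddings, gradients flow back into the encoder, which is exactly the property the abstract relies on.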
Related papers
- Self-Similarity-Based and Novelty-based loss for music structure
analysis [5.3900692419866285]
We propose a supervised approach for the task of music boundary detection.
In our approach, we simultaneously learn features and convolution kernels (a related novelty-curve sketch follows this entry).
We demonstrate that relative feature learning, through self-attention, is beneficial for the task of MSA.
arXiv Detail & Related papers (2023-09-05T13:49:29Z)
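The entry above pairs an SSM-based loss with a novelty-based one for boundary detection. For context, a classical way to derive boundary candidates from an SSM is to slide a checkerboard kernel along its main diagonal and read peaks of the resulting novelty curve. The sketch below shows only that fixed-kernel baseline; the paper above learns its kernels, so treat the code as an illustrative assumption rather than its method.

```python
# Minimal sketch: novelty curve from an SSM via a fixed checkerboard kernel.
import numpy as np


def checkerboard_kernel(half: int) -> np.ndarray:
    """(2*half, 2*half) kernel: +1 on the diagonal blocks, -1 off-diagonal."""
    sign = np.ones((2 * half, 2 * half))
    sign[:half, half:] = -1.0
    sign[half:, :half] = -1.0
    return sign / sign.size


def novelty_curve(ssm: np.ndarray, half: int = 8) -> np.ndarray:
    """Slide the kernel along the SSM diagonal; peaks suggest boundaries."""
    n = ssm.shape[0]
    padded = np.pad(ssm, half, mode="constant")
    kernel = checkerboard_kernel(half)
    return np.array([
        np.sum(kernel * padded[i:i + 2 * half, i:i + 2 * half])
        for i in range(n)
    ])


# Toy usage: a block-diagonal SSM with two segments yields a novelty
# peak near the segment boundary (index 50).
ssm = np.zeros((100, 100))
ssm[:50, :50] = 1.0
ssm[50:, 50:] = 1.0
print(int(np.argmax(novelty_curve(ssm))))  # expected near 50
```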
- Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer [158.2634766682187]
Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes.
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the change in loss when a perturbation is added to the weights.
In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-06-30T09:33:41Z)
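The SSAM entry above describes sparsifying SAM's weight perturbation with a binary mask. Below is a hedged PyTorch sketch of that idea: a plain SAM step in which the perturbation is zeroed outside a top-k gradient-magnitude mask. The mask criterion, the rho value, and the function name are assumptions for illustration, not the paper's exact algorithm.

```python
# Minimal sketch of a sparse SAM step (PyTorch): perturb only the weights
# selected by a binary mask before taking the second gradient. The top-k
# |grad| mask is an assumed criterion, not necessarily SSAM's.
import torch


def sparse_sam_step(model, loss_fn, x, y, optimizer, rho=0.05, sparsity=0.5):
    # 1) first pass: gradient at the current weights
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    eps, params = [], [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    with torch.no_grad():
        for p in params:
            # binary mask: keep only the largest-|grad| fraction of entries
            k = max(1, int(p.numel() * (1 - sparsity)))
            thresh = p.grad.abs().flatten().topk(k).values[-1]
            mask = (p.grad.abs() >= thresh).float()
            e = rho * p.grad / (grad_norm + 1e-12) * mask  # sparse perturbation
            p.add_(e)
            eps.append(e)

    # 2) second pass: gradient at the sparsely perturbed weights
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # 3) undo the perturbation and update with the sharpness-aware gradient
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```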
- Noise-Robust Loss Functions: Enhancing Bounded Losses for Large-Scale Noisy Data Learning [0.0]
Large annotated datasets inevitably contain noisy labels, which pose a major challenge for training deep neural networks, as they easily memorize the labels.
Noise-robust loss functions have emerged as a notable strategy to counteract this issue, but it remains challenging to create a robust loss function which is not susceptible to underfitting.
We propose a novel method denoted as logit bias, which adds a real number $\epsilon$ to the logit at the position of the correct class (see the sketch after this entry).
arXiv Detail & Related papers (2023-06-08T18:38:55Z)
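The logit-bias entry above is concrete enough for a one-line illustration: add a constant $\epsilon$ to the logit of the labeled class before computing the loss. The sketch below uses cross-entropy and an arbitrary $\epsilon$ purely for illustration; the paper applies the bias to bounded noise-robust losses and derives $\epsilon$ from its own analysis.

```python
# Minimal sketch of the logit-bias idea: add a constant epsilon to the
# logit of the labeled class before computing the loss. Cross-entropy and
# epsilon = 1.0 are placeholders, not the paper's exact recipe.
import torch
import torch.nn.functional as F


def logit_bias_loss(logits: torch.Tensor, labels: torch.Tensor,
                    epsilon: float = 1.0) -> torch.Tensor:
    """logits: (batch, n_classes); labels: (batch,) integer class indices."""
    bias = F.one_hot(labels, num_classes=logits.size(1)).to(logits.dtype) * epsilon
    return F.cross_entropy(logits + bias, labels)


# Toy usage
logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
loss = logit_bias_loss(logits, labels)
loss.backward()
```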
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models [10.10825306582544]
We propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning the target task via auxiliary learning.
For evaluation, we apply our framework to various video foundation models (UniVL, Violet and All-in-one) and show significant performance gain on all four downstream tasks.
arXiv Detail & Related papers (2023-03-23T03:06:44Z)
- Extended Unconstrained Features Model for Exploring Deep Neural Collapse [59.59039125375527]
Recently, a phenomenon termed "neural collapse" (NC) has been empirically observed in deep neural networks.
Recent papers have shown that minimizers with this structure emerge when optimizing a simplified "unconstrained features model" (UFM).
In this paper, we study the UFM for the regularized MSE loss, and show that the minimizers' features can be more structured than in the cross-entropy case.
arXiv Detail & Related papers (2022-02-16T14:17:37Z)
- Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization [21.216783537997426]
We propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning Task.
We also introduce a novel self-supervised objective, Reconstruction Latent Space Similarity Regularization (RLSSR).
arXiv Detail & Related papers (2021-08-10T13:49:41Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- On Parameter Tuning in Meta-learning for Computer Vision [2.3513645401551333]
In this paper, we investigate image recognition for unseen categories of a given dataset with limited training information.
We deploy a zero-shot learning (ZSL) algorithm to achieve this goal.
We also explore the effect of parameter tuning on the performance of the semantic auto-encoder (SAE).
arXiv Detail & Related papers (2020-02-11T15:07:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.