Self-supervised Auxiliary Loss for Metric Learning in Music
Similarity-based Retrieval and Auto-tagging
- URL: http://arxiv.org/abs/2304.07449v1
- Date: Sat, 15 Apr 2023 02:00:28 GMT
- Title: Self-supervised Auxiliary Loss for Metric Learning in Music
Similarity-based Retrieval and Auto-tagging
- Authors: Taketo Akama, Hiroaki Kitano, Katsuhiro Takematsu, Yasushi Miyajima,
and Natalia Polouliakh
- Abstract summary: We propose a model that builds on the self-supervised learning approach to address the similarity-based retrieval challenge.
We also found that refraining from employing augmentation during the fine-tuning phase yields better results.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the realm of music information retrieval, similarity-based retrieval and
auto-tagging serve as essential components. Given the limitations and
non-scalability of human supervision signals, it becomes crucial for models to
learn from alternative sources to enhance their performance. Self-supervised
learning, which exclusively relies on learning signals derived from music audio
data, has demonstrated its efficacy in the context of auto-tagging. In this
study, we propose a model that builds on the self-supervised learning approach
to address the similarity-based retrieval challenge by introducing our method
of metric learning with a self-supervised auxiliary loss. Furthermore,
diverging from conventional self-supervised learning methodologies, we
discovered the advantages of concurrently training the model with both
self-supervision and supervision signals, without freezing pre-trained models.
We also found that refraining from employing augmentation during the
fine-tuning phase yields better results. Our experimental results confirm that
the proposed methodology enhances retrieval and tagging performance metrics in
two distinct scenarios: one where human-annotated tags are consistently
available for all music tracks, and another where such tags are accessible only
for a subset of tracks.
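To make the idea concrete, the sketch below shows one way to combine a supervised metric-learning objective with a self-supervised auxiliary loss, training the encoder jointly rather than freezing a pre-trained backbone. This is a minimal illustration, not the authors' implementation: the encoder architecture, the use of a triplet loss with an NT-Xent contrastive term, the batch layout, and the 0.5 auxiliary weight are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): joint training with a supervised
# triplet loss plus a self-supervised contrastive auxiliary loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleEncoder(nn.Module):
    """Toy embedding network standing in for a real audio backbone."""

    def __init__(self, in_dim=128, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings


def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent contrastive loss between two views of the same batch."""
    z = torch.cat([z1, z2], dim=0)                       # (2N, D)
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)            # remove self-similarity
    sim = sim.masked_fill(mask, float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)


encoder = SimpleEncoder()
triplet = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

# Dummy batch: anchor/positive/negative triplets for the supervised metric loss,
# plus two "views" of the same clips for the self-supervised auxiliary loss.
anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))
view1, view2 = torch.randn(16, 128), torch.randn(16, 128)

sup_loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
ssl_loss = nt_xent_loss(encoder(view1), encoder(view2))
loss = sup_loss + 0.5 * ssl_loss         # auxiliary weight is an arbitrary choice

optimizer.zero_grad()
loss.backward()   # gradients flow into the whole encoder; nothing is frozen
optimizer.step()
```

Because both terms backpropagate into the same encoder, the self-supervised signal can keep shaping the embedding space even when human-annotated tags are available for only a subset of tracks.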
Related papers
- Equivariance-based self-supervised learning for audio signal recovery from clipped measurements [13.829249782527363]
We study self-supervised learning for the non-linear inverse problem of recovering audio signals from clipped measurements.
We show that the performance of the proposed equivariance-based self-supervised declipping strategy compares favorably to fully supervised learning.
arXiv Detail & Related papers (2024-09-03T06:12:01Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
- Self-Supervised Learning for Audio-Based Emotion Recognition [1.7598252755538808]
Self-supervised learning is a family of methods which can learn despite a scarcity of supervised labels.
We have applied self-supervised learning pre-training to the classification of emotions from CMU-MOSEI's acoustic modality.
We find that self-supervised learning consistently improves the performance of the model across all metrics.
arXiv Detail & Related papers (2023-07-23T14:40:50Z)
- Unsupervised 3D registration through optimization-guided cyclical self-training [71.75057371518093]
State-of-the-art deep learning-based registration methods employ three different learning strategies.
We propose a novel self-supervised learning paradigm for unsupervised registration, relying on self-training.
We evaluate the method for abdomen and lung registration, consistently surpassing metric-based supervision and outperforming diverse state-of-the-art competitors.
arXiv Detail & Related papers (2023-06-29T14:54:10Z)
- Music Instrument Classification Reprogrammed [79.68916470119743]
"Reprogramming" is a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model.
We demonstrate that reprogramming can effectively leverage the representation learned for a different task, and that the resulting reprogrammed system can perform on par with or even outperform state-of-the-art systems with a fraction of the training parameters.
arXiv Detail & Related papers (2022-11-15T18:26:01Z)
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
- Embodied Self-supervised Learning by Coordinated Sampling and Training [14.107020105091662]
We propose a novel self-supervised approach to solve inverse problems by employing the corresponding physical forward process.
The proposed approach works in an analysis-by-synthesis manner to learn an inference network by iteratively sampling and training.
We prove the feasibility of the proposed method by tackling the acoustic-to-articulatory inversion problem to infer articulatory information from speech.
arXiv Detail & Related papers (2020-06-20T14:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.