Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation
- URL: http://arxiv.org/abs/2506.23371v1
- Date: Sun, 29 Jun 2025 19:10:51 GMT
- Title: Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation
- Authors: Frank Cwitkowitz, Zhiyao Duan
- Abstract summary: Multi-Pitch Estimation (MPE) continues to be a sought-after capability of Music Information Retrieval (MIR) systems. We extend the classic supervised MPE paradigm by incorporating several self-supervised objectives. We uncover a phenomenon whereby our model simultaneously overfits to the supervised data while degenerating on data used for self-supervision only.
- Score: 17.776058135307363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-Pitch Estimation (MPE) continues to be a sought-after capability of Music Information Retrieval (MIR) systems, and is critical for many applications and downstream tasks involving pitch, including music transcription. However, existing methods are largely based on supervised learning, and there are significant challenges in collecting annotated data for the task. Recently, self-supervised techniques exploiting intrinsic properties of pitch and harmonic signals have shown promise for both monophonic and polyphonic pitch estimation, but these still remain inferior to supervised methods. In this work, we extend the classic supervised MPE paradigm by incorporating several self-supervised objectives based on pitch-invariant and pitch-equivariant properties. This joint training results in a substantial improvement under closed training conditions, which naturally suggests that applying the same objectives to a broader collection of data will yield further improvements. However, in doing so we uncover a phenomenon whereby our model simultaneously overfits to the supervised data while degenerating on data used for self-supervision only. We demonstrate and investigate this phenomenon and offer our insights on the underlying problem.
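To make the pitch-invariant and pitch-equivariant objectives more concrete, the sketch below (plain PyTorch, not the authors' implementation) shows one common way such self-supervised consistency losses can be formed for a salience-based MPE model. The function names, model interface, perturbation choices, and loss weighting are illustrative assumptions only.

```python
# Minimal sketch (assumed, not the paper's code) of self-supervised consistency
# losses based on pitch-equivariant and pitch-invariant properties.
# Assumes `model` maps a constant-Q spectrogram of shape (B, F_bins, T) to a
# multi-pitch salience map of shape (B, P_bins, T), with input and output
# sharing the same bins-per-semitone resolution.
import torch
import torch.nn.functional as F


def pitch_equivariance_loss(model, cqt, max_shift=12):
    """Transposing the input along the frequency axis should shift the
    predicted salience map by the same number of bins."""
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    pred = model(cqt)
    pred_from_shifted = model(torch.roll(cqt, shifts=shift, dims=1))
    # Shift the original prediction so both outputs are aligned; in practice
    # the bins that wrap around would be masked out rather than compared.
    target = torch.roll(pred, shifts=shift, dims=1).detach()
    return F.mse_loss(pred_from_shifted, target)


def pitch_invariance_loss(model, cqt, noise_std=0.01):
    """A pitch-preserving perturbation (here, mild additive noise as a stand-in
    for a timbral transformation) should leave the prediction unchanged."""
    pred = model(cqt).detach()
    pred_perturbed = model(cqt + noise_std * torch.randn_like(cqt))
    return F.mse_loss(pred_perturbed, pred)


# Joint training would then combine the supervised loss on annotated data with
# these self-supervised terms, e.g.
#   loss = supervised_loss + lambda_eq * eq_loss + lambda_inv * inv_loss
```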
Related papers
- Stochastic Encodings for Active Feature Acquisition [100.47043816019888]
Active Feature Acquisition is an instance-wise, sequential decision making problem. The aim is to dynamically select which feature to measure based on current observations, independently for each test instance. Common approaches either use Reinforcement Learning, which experiences training difficulties, or greedily maximize the conditional mutual information of the label and unobserved features, which makes them myopic. We introduce a latent variable model, trained in a supervised manner. Acquisitions are made by reasoning about the features across many possible unobserved realizations in a latent space.
arXiv Detail & Related papers (2025-08-03T23:48:46Z) - Self-Controlled Dynamic Expansion Model for Continual Learning [10.447232167638816]
This paper introduces an innovative Self-Controlled Dynamic Expansion Model (SCDEM). SCDEM orchestrates multiple trainable pre-trained ViT backbones to furnish diverse and semantically enriched representations. An extensive series of experiments has been conducted to evaluate the proposed methodology's efficacy.
arXiv Detail & Related papers (2025-04-14T15:22:51Z) - Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models [10.449015816015566]
Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We provide a mathematical formulation for self-improvement, which is largely governed by a quantity which we formalize as the generation-verification gap. We also examine when self-improvement is possible, an iterative self-improvement procedure, and ways to improve its performance.
arXiv Detail & Related papers (2024-12-03T18:47:26Z) - ReAugment: Model Zoo-Guided RL for Few-Shot Time Series Augmentation and Forecasting [74.00765474305288]
We present a pilot study on using reinforcement learning (RL) for time series data augmentation. Our method, ReAugment, tackles three critical questions: which parts of the training set should be augmented, how the augmentation should be performed, and what advantages RL brings to the process.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [104.60508550106618]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs). We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features. We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z) - Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data [10.006890915441987]
The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training.
Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models.
We aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data.
arXiv Detail & Related papers (2022-06-06T04:59:44Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Robust Disentanglement of a Few Factors at a Time [5.156484100374058]
We introduce population-based training (PBT) for improving consistency in training variational autoencoders (VAEs).
We then use Unsupervised Disentanglement Ranking (UDR) as an unsupervised heuristic to score models in our PBT-VAE training and show how models trained this way tend to consistently disentangle only a subset of the generative factors.
We show striking improvement in state-of-the-art unsupervised disentanglement performance and robustness across multiple datasets and metrics.
arXiv Detail & Related papers (2020-10-26T12:34:23Z) - Active Feature Acquisition with Generative Surrogate Models [11.655069211977464]
In this work, we consider models that perform active feature acquisition (AFA) and query the environment for unobserved features.
Our work reformulates the Markov decision process (MDP) that underlies the AFA problem as a generative modeling task.
We propose learning a generative surrogate model (GSM) that captures the dependencies among input features to assess potential information gain from acquisitions.
arXiv Detail & Related papers (2020-10-06T02:10:06Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.