A Mamba-based Network for Semi-supervised Singing Melody Extraction Using Confidence Binary Regularization
- URL: http://arxiv.org/abs/2505.08681v1
- Date: Tue, 13 May 2025 15:43:35 GMT
- Title: A Mamba-based Network for Semi-supervised Singing Melody Extraction Using Confidence Binary Regularization
- Authors: Xiaoliang He, Kangjie Dong, Jingkai Cao, Shuai Yu, Wei Li, Yi Yu
- Abstract summary: Singing melody extraction is a key task in the field of music information retrieval. Existing methods face several limitations. We propose a Mamba-based network, called SpectMamba, for semi-supervised singing melody extraction.
- Score: 14.501400507234356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Singing melody extraction (SME) is a key task in the field of music information retrieval. However, existing methods face several limitations: first, prior models use transformers to capture contextual dependencies, which requires quadratic computation and results in low efficiency at the inference stage. Second, prior works typically rely on frequency-supervised methods to estimate the fundamental frequency (f0), ignoring that musical performance is actually based on notes. Third, transformers typically require large amounts of labeled data to achieve optimal performance, but the SME task lacks sufficient annotated data. To address these issues, in this paper we propose a Mamba-based network, called SpectMamba, for semi-supervised singing melody extraction using confidence binary regularization. In particular, we begin by introducing Vision Mamba to achieve linear computational complexity. Then, we propose a novel note-f0 decoder that allows the model to better mimic the musical performance. Further, to alleviate the scarcity of labeled data, we introduce a confidence binary regularization (CBR) module that leverages unlabeled data by maximizing the probability of the correct classes. The proposed method is evaluated on several public datasets, and the experiments demonstrate its effectiveness.
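The abstract describes the CBR objective only at a high level (using unlabeled data by maximizing the probability of the correct classes). Below is a minimal PyTorch sketch of that general pattern, a confidence-gated loss on unlabeled frames; it is not the authors' implementation, and the function name, tensor shapes, and threshold value are illustrative assumptions.

```python
# Minimal sketch (assumption, not the paper's code): confidence-gated loss on
# unlabeled frames, in the spirit of "maximizing the probability of the
# correct classes" for semi-supervised training.
import torch
import torch.nn.functional as F

def confidence_regularization(unlabeled_logits: torch.Tensor,
                              threshold: float = 0.95) -> torch.Tensor:
    """Cross-entropy against the model's own high-confidence frame predictions.

    unlabeled_logits: (batch, frames, num_classes) frame-wise note/f0 logits.
    """
    probs = unlabeled_logits.softmax(dim=-1)
    confidence, pseudo_labels = probs.max(dim=-1)        # (batch, frames)
    mask = (confidence >= threshold).float()             # keep confident frames only

    loss = F.cross_entropy(
        unlabeled_logits.flatten(0, 1),                  # (batch*frames, num_classes)
        pseudo_labels.flatten(0, 1),
        reduction="none",
    )
    # Average only over the confident frames; guard against an empty mask.
    return (loss * mask.flatten(0, 1)).sum() / mask.sum().clamp(min=1.0)

# Usage (hypothetical): total = supervised_loss + lambda_cbr * confidence_regularization(logits_u)
```

In a semi-supervised setup, a term like this would typically be added to the supervised loss on labeled frames with a weighting coefficient.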
Related papers
- M2Rec: Multi-scale Mamba for Efficient Sequential Recommendation [35.508076394809784]
M2Rec is a novel sequential recommendation framework that integrates multi-scale Mamba with Fourier analysis, Large Language Models, and adaptive gating. Experiments demonstrate that the model achieves state-of-the-art performance, improving Hit Rate@10 by 3.2% over existing Mamba-based models.
arXiv Detail & Related papers (2025-05-07T14:14:29Z) - DiffImpute: Tabular Data Imputation With Denoising Diffusion Probabilistic Model [9.908561639396273]
We propose DiffImpute, a novel Denoising Diffusion Probabilistic Model (DDPM)
It produces credible imputations for missing entries without undermining the authenticity of the existing data.
It can be applied to various settings of Missing Completely At Random (MCAR) and Missing At Random (MAR)
arXiv Detail & Related papers (2024-03-20T08:45:31Z) - Federated Learning with Instance-Dependent Noisy Label [6.093214616626228]
FedBeat aims to build a global statistically consistent classifier using the IDN transition matrix (IDNTM)
Experiments conducted on CIFAR-10 and SVHN verify that the proposed method significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-16T05:08:02Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs)
PFNs leverage in-context learning in large-scale machine learning to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z) - Mutual-Information Based Few-Shot Classification [34.95314059362982]
We introduce Transductive Information Maximization (TIM) for few-shot learning.
Our method maximizes the mutual information between the query features and their label predictions for a given few-shot task.
We propose a new alternating-direction solver, which speeds up transductive inference over gradient-based optimization.
arXiv Detail & Related papers (2021-06-23T09:17:23Z) - Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z) - Meta Transition Adaptation for Robust Deep Learning with Noisy Labels [61.8970957519509]
This study proposes a new meta-transition-learning strategy for the task.
Specifically, through the sound guidance of a small set of meta data with clean labels, the noise transition matrix and the classifier parameters can be mutually ameliorated.
Our method can more accurately extract the transition matrix, which naturally leads to more robust performance than prior art.
arXiv Detail & Related papers (2020-06-10T07:27:25Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)