Data Cleansing with Contrastive Learning for Vocal Note Event Annotations
- URL: http://arxiv.org/abs/2008.02069v3
- Date: Tue, 27 Apr 2021 10:19:17 GMT
- Title: Data Cleansing with Contrastive Learning for Vocal Note Event Annotations
- Authors: Gabriel Meseguer-Brocal, Rachel Bittner, Simon Durand and Brian Brost
- Abstract summary: We propose a novel data cleansing model for time-varying, structured labels.
Our model is trained in a contrastive learning manner by automatically creating local deformations of likely correct labels.
We demonstrate that the accuracy of a transcription model improves greatly when trained using our proposed strategy.
- Score: 1.859931123372708
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data cleansing is a well-studied strategy for cleaning erroneous labels in
datasets, but it has not yet been widely adopted in Music Information Retrieval.
Previously proposed data cleansing models do not consider structured (e.g.,
time-varying) labels, such as those common to music data. We propose a novel data
cleansing model for time-varying, structured labels which exploits the local
structure of the labels, and demonstrate its usefulness for vocal note event
annotations in music. Our model is trained in a contrastive learning manner by
automatically contrasting likely correct label pairs against local deformations of
them. We demonstrate that the accuracy of a transcription model improves greatly
when trained using our proposed strategy compared with the accuracy when trained
using the original dataset. Additionally, we use our model to estimate the
annotation error rates in the DALI dataset, and highlight other potential uses for
this type of model.
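The abstract describes the training strategy only at a high level, so the following is a minimal, hypothetical Python sketch of the core idea: likely correct note event annotations are contrasted against automatically generated local deformations of themselves, yielding positive and negative examples for a label-correctness classifier. The deformation types, magnitudes, and function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the contrastive data-cleansing idea: likely correct
# note event annotations are paired against locally deformed copies of
# themselves. Deformation types, magnitudes, and names are illustrative
# assumptions, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def deform_notes(notes, max_time_shift=0.1, max_pitch_shift=2):
    """Return a locally deformed ('likely incorrect') copy of note events.

    notes: array of shape (n, 3) with columns (onset_sec, offset_sec, midi_pitch).
    """
    deformed = notes.copy()
    n = len(deformed)

    # Shift the onsets and offsets of a random subset of notes.
    idx = rng.random(n) < 0.5
    deformed[idx, 0] += rng.uniform(-max_time_shift, max_time_shift, idx.sum())
    deformed[idx, 1] += rng.uniform(-max_time_shift, max_time_shift, idx.sum())

    # Perturb the pitch of a few notes by up to +/- max_pitch_shift semitones.
    idx = rng.random(n) < 0.3
    deformed[idx, 2] += rng.integers(-max_pitch_shift, max_pitch_shift + 1, idx.sum())
    return deformed

def make_training_pairs(audio_features, notes):
    """Build (features, annotation, is_correct) examples for a correctness classifier."""
    positive = (audio_features, notes, 1)                  # likely correct annotation
    negative = (audio_features, deform_notes(notes), 0)    # local deformation of it
    return [positive, negative]

# Toy excerpt: four note events and a stand-in feature matrix (e.g. a log-mel patch).
notes = np.array([[0.0, 0.5, 60], [0.5, 1.0, 62], [1.0, 1.4, 64], [1.5, 2.0, 65]], float)
audio_features = rng.standard_normal((20, 40))
pairs = make_training_pairs(audio_features, notes)
print(f"{len(pairs)} training examples (1 positive, 1 negative)")
```

In practice the classifier would consume both the audio (or features derived from it) and the candidate annotation; the stand-in feature matrix above only marks where that input would go.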
Related papers
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- Automated Labeling of German Chest X-Ray Radiology Reports using Deep Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z)
- Learning with Noisy Labels by Adaptive Gradient-Based Outlier Removal [4.71154003227418]
We propose AGRA: a new method for learning with noisy labels by using Adaptive GRAdient-based outlier removal.
By comparing the aggregated gradient of a batch of samples and an individual example gradient, our method dynamically decides whether a corresponding example is helpful for the model.
Extensive evaluation on several datasets demonstrates AGRA's effectiveness.
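As a rough illustration of the gradient-comparison rule summarised above, the sketch below keeps an example only when its own gradient has positive cosine similarity with the aggregated batch gradient. The logistic-regression model, random weights, and fixed zero threshold are simplifying assumptions, not AGRA's actual procedure.

```python
# Minimal numpy sketch of a gradient-comparison rule: keep an example only if
# its own gradient has positive cosine similarity with the aggregated batch
# gradient. The logistic model, random weights, and zero threshold are
# simplifying assumptions, not AGRA's actual procedure.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_gradients(w, X, y):
    """Per-example gradients of the logistic loss with respect to the weights w."""
    p = sigmoid(X @ w)                 # predicted probabilities, shape (n,)
    return (p - y)[:, None] * X        # shape (n, d): one gradient row per example

def select_clean_examples(w, X, y):
    """Boolean mask of examples whose gradient agrees with the batch gradient."""
    grads = per_example_gradients(w, X, y)
    batch_grad = grads.mean(axis=0)
    cos = grads @ batch_grad / (
        np.linalg.norm(grads, axis=1) * np.linalg.norm(batch_grad) + 1e-12
    )
    return cos > 0.0

# Toy batch with a few flipped labels.
rng = np.random.default_rng(1)
X = rng.standard_normal((32, 5))
w_true = rng.standard_normal(5)
y = (X @ w_true > 0).astype(float)
y[:4] = 1.0 - y[:4]                        # corrupt the first four labels
w_current = 0.1 * rng.standard_normal(5)   # stand-in for the current model weights
keep = select_clean_examples(w_current, X, y)
print("kept", int(keep.sum()), "of", len(y), "examples")
```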
arXiv Detail & Related papers (2023-06-07T15:10:01Z)
- Learning to Detect Noisy Labels Using Model-Based Features [16.681748918518075]
We propose Selection-Enhanced Noisy label Training (SENT).
SENT does not rely on meta learning while having the flexibility of being data-driven.
It improves performance over strong baselines under the settings of self-training and label corruption.
arXiv Detail & Related papers (2022-12-28T10:12:13Z)
- Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z)
- Active label cleaning: Improving dataset quality under resource constraints [13.716577886649018]
Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models.
This work advocates for a data-driven approach to prioritising samples for re-annotation.
We rank instances according to estimated label correctness and labelling difficulty of each sample, and introduce a simulation framework to evaluate relabelling efficacy.
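A small illustrative sketch of such a ranking, assuming a trained model's class probabilities are available; the particular score below (disagreement with the given label plus predictive entropy) is a stand-in, not the paper's exact criterion.

```python
# Illustrative sketch of ranking samples for re-annotation: samples whose given
# label disagrees with the model and whose prediction is uncertain are sent to
# annotators first. The combined score is a stand-in, not the paper's exact
# criterion.
import numpy as np

def relabelling_priority(pred_probs, given_labels):
    """pred_probs: (n, k) model class probabilities; given_labels: (n,) integer labels."""
    n = len(given_labels)
    p_given = pred_probs[np.arange(n), given_labels]
    mislabel_score = 1.0 - p_given                                        # estimated label incorrectness
    difficulty = -(pred_probs * np.log(pred_probs + 1e-12)).sum(axis=1)   # predictive entropy
    return mislabel_score + difficulty                                    # higher = relabel sooner

rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(3), size=10)     # stand-in for model predictions
labels = rng.integers(0, 3, size=10)           # given (possibly noisy) labels
order = np.argsort(-relabelling_priority(probs, labels))
print("re-annotation order:", order)
```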
arXiv Detail & Related papers (2021-09-01T19:03:57Z)
- Instance Correction for Learning with Open-set Noisy Labels [145.06552420999986]
We use a sample selection approach to handle open-set noisy labels.
Data discarded by sample selection are treated as mislabeled and do not participate in training.
We instead modify the discarded instances so that the model's predictions on them become consistent with their given labels.
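A rough sketch of that instance-correction idea, under the simplifying assumption of a fixed logistic model: each discarded input is nudged by gradient descent on the input until the model's prediction agrees with its given label. The model and step rule are illustrative only.

```python
# Rough numpy sketch of instance correction under a fixed logistic model:
# discarded inputs are perturbed by gradient descent until the model's
# prediction agrees with the given label, instead of being thrown away.
# The model and step rule are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def correct_instances(w, X, y, step=0.5, n_steps=20):
    """Move each input x so that sigmoid(w @ x) approaches its given label y."""
    X = X.copy()
    for _ in range(n_steps):
        p = sigmoid(X @ w)
        # Gradient of the logistic loss with respect to the input: (p - y) * w.
        X -= step * (p - y)[:, None] * w[None, :]
    return X

rng = np.random.default_rng(3)
w = rng.standard_normal(4)                   # stand-in for a trained model
X_discarded = rng.standard_normal((5, 4))    # instances flagged as noisy
y_given = rng.integers(0, 2, size=5).astype(float)
X_fixed = correct_instances(w, X_discarded, y_given)
print("before:", np.round(sigmoid(X_discarded @ w), 2))
print("after: ", np.round(sigmoid(X_fixed @ w), 2))
```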
arXiv Detail & Related papers (2021-06-01T13:05:55Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
- Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory [21.06607915149245]
We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property.
We evaluate our methods on paraphrase models trained using these datasets starting from a pretrained BERT model, and find that the automatically-enhanced training sets result in more accurate models.
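The transitivity-based augmentation can be illustrated with a small hypothetical sketch: positive (paraphrase) pairs are treated as edges of a graph, and new positive pairs are inferred within each connected component. The union-find helper and the augmentation policy are assumptions for illustration, not the paper's code.

```python
# Hypothetical sketch of transitivity-based augmentation: paraphrase pairs are
# treated as edges of a graph and new positive pairs are inferred inside each
# connected component. The union-find helper and the augmentation policy are
# assumptions for illustration, not the paper's code.
from itertools import combinations

def augment_paraphrase_pairs(labeled_pairs):
    """labeled_pairs: iterable of (sent_a, sent_b, label), where label 1 = paraphrase."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b, label in labeled_pairs:
        if label == 1:
            union(a, b)

    # Collect the sentences in each connected component of the paraphrase graph.
    components = {}
    for a, b, _ in labeled_pairs:
        for s in (a, b):
            components.setdefault(find(s), set()).add(s)

    seen = {(a, b) for a, b, _ in labeled_pairs} | {(b, a) for a, b, _ in labeled_pairs}
    new_pairs = []
    for members in components.values():
        for x, y in combinations(sorted(members), 2):
            if (x, y) not in seen:
                new_pairs.append((x, y, 1))   # positive pair inferred via transitivity
    return new_pairs

pairs = [("s1", "s2", 1), ("s2", "s3", 1), ("s3", "s4", 0)]
print(augment_paraphrase_pairs(pairs))   # [('s1', 's3', 1)]
```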
arXiv Detail & Related papers (2020-11-03T17:18:03Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.