Disjoint Contrastive Regression Learning for Multi-Sourced Annotations
- URL: http://arxiv.org/abs/2112.15411v2
- Date: Tue, 12 Mar 2024 16:16:00 GMT
- Title: Disjoint Contrastive Regression Learning for Multi-Sourced Annotations
- Authors: Xiaoqian Ruan, Gaoang Wang
- Abstract summary: Large-scale datasets are important for the development of deep learning models.
Multiple annotators may be employed to label different subsets of the data.
Inconsistency and bias among different annotators are harmful to model training.
- Score: 10.159313152511919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale datasets are important for the development of deep learning
models. Such datasets usually require a heavy workload of annotations, which
are extremely time-consuming and expensive. To accelerate the annotation
procedure, multiple annotators may be employed to label different subsets of
the data. However, the inconsistency and bias among different annotators are
harmful to model training, especially for qualitative and subjective tasks.
To address this challenge, in this paper we propose a novel contrastive
regression framework for the disjoint annotation problem, where each sample is
labeled by only one annotator and multiple annotators work on disjoint subsets
of the data. To account for both intra-annotator consistency and
inter-annotator inconsistency, two strategies are employed. First, a
contrastive loss is applied to learn the relative ranking among different
samples from the same annotator, under the assumption that the ranking of
samples from the same annotator is consistent. Second, we apply
the gradient reversal layer to learn robust representations that are invariant
to different annotators. Experiments on the facial expression prediction task,
as well as the image quality assessment task, verify the effectiveness of our
proposed framework.
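
As a rough illustration of the two strategies described above, the following PyTorch sketch pairs an intra-annotator margin ranking loss with a gradient reversal layer. It is a minimal reading of the abstract rather than the authors' implementation; the module and argument names (DisjointContrastiveRegressor, grad_reverse, alpha, beta) and the MSE fitting term are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): intra-annotator ranking loss +
# gradient reversal layer (GRL) for annotator-invariant features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda backwards."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)


def intra_annotator_ranking_loss(scores, labels, annot_ids, margin=0.1):
    """Margin ranking loss over pairs of samples that share an annotator.

    Assumes the relative order assigned by a single annotator is reliable,
    even when absolute label values are biased across annotators.
    """
    n = scores.shape[0]
    i, j = torch.triu_indices(n, n, offset=1, device=scores.device)
    # Keep only pairs from the same annotator with distinct labels.
    keep = (annot_ids[i] == annot_ids[j]) & (labels[i] != labels[j])
    if keep.sum() == 0:
        return scores.new_zeros(())
    i, j = i[keep], j[keep]
    sign = torch.sign(labels[i] - labels[j])  # +1 if sample i was rated higher than j
    return F.margin_ranking_loss(scores[i], scores[j], sign, margin=margin)


class DisjointContrastiveRegressor(nn.Module):
    def __init__(self, backbone, feat_dim, num_annotators):
        super().__init__()
        self.backbone = backbone                        # e.g. a CNN feature extractor
        self.regressor = nn.Linear(feat_dim, 1)         # predicts the quality / expression score
        self.annot_clf = nn.Linear(feat_dim, num_annotators)  # adversarial annotator head

    def forward(self, x, grl_lambda=1.0):
        feat = self.backbone(x)
        score = self.regressor(feat).squeeze(-1)
        # Annotator classifier sees gradient-reversed features.
        annot_logits = self.annot_clf(grad_reverse(feat, grl_lambda))
        return score, annot_logits


def training_loss(score, annot_logits, labels, annot_ids, alpha=1.0, beta=0.1):
    # Regression fit + intra-annotator ranking + annotator-invariance (via GRL).
    # annot_ids is an integer annotator index per sample.
    mse = F.mse_loss(score, labels)
    rank = intra_annotator_ranking_loss(score, labels, annot_ids)
    adv = F.cross_entropy(annot_logits, annot_ids)
    return mse + alpha * rank + beta * adv
```

In practice the ranking weight alpha, the adversarial weight beta, and the GRL strength grl_lambda would need tuning (or annealing) on a validation split; the abstract does not specify these details.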
Related papers
- Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery [56.172872410834664]
Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning.
We propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL)
Our method outperforms state-of-the-art models by a large margin on both seen and unseen classes in generic image recognition.
arXiv Detail & Related papers (2024-01-24T09:39:45Z)
- Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks [9.110872603799839]
Supervised classification heavily depends on datasets annotated by humans.
In subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters.
In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks.
arXiv Detail & Related papers (2023-11-16T10:18:32Z)
- ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation [35.10805667891489]
Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement.
We show that, in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation (a toy sketch of this multi-head idea appears after this list).
arXiv Detail & Related papers (2023-10-23T14:26:43Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Multi-annotator Deep Learning: A Probabilistic Framework for Classification [2.445702550853822]
Training standard deep neural networks leads to subpar performances in multi-annotator supervised learning settings.
We address this issue by presenting a probabilistic training framework named multi-annotator deep learning (MaDL)
A modular network architecture enables us to make varying assumptions regarding annotators' performances.
Our findings show MaDL's state-of-the-art performance and robustness against many correlated, spamming annotators.
arXiv Detail & Related papers (2023-04-05T16:00:42Z)
- Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations [6.546195629698355]
We investigate the efficacy of multi-annotator models for subjective tasks.
We show that this approach yields the same or better performance than aggregating labels in the data prior to training.
Our approach also provides a way to estimate uncertainty in predictions, which we demonstrate correlates better with annotation disagreements than traditional methods (see the multi-head sketch after this list).
arXiv Detail & Related papers (2021-10-12T03:12:34Z)
- Instance-Level Relative Saliency Ranking with Graph Reasoning [126.09138829920627]
We present a novel unified model to segment salient instances and infer relative saliency rank order.
A novel loss function is also proposed to effectively train the saliency ranking branch.
Experimental results demonstrate that our proposed model is more effective than previous methods.
arXiv Detail & Related papers (2021-07-08T13:10:42Z)
- Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- Single versus Multiple Annotation for Named Entity Recognition of Mutations [4.213427823201119]
We address the impact of using a single annotator vs. two annotators, in order to measure whether multiple annotators are required.
Once we evaluate the performance loss when using a single annotator, we apply different methods to sample the training data for a second round of annotation.
We use held-out double-annotated data to build two scenarios with different types of rankings: similarity-based and confidence-based.
We evaluate both approaches on (i) their ability to identify erroneous training instances, and (ii) Mutation NER performance for state-of-the-art
arXiv Detail & Related papers (2021-01-19T03:54:17Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
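
Several of the related entries above (notably ACTOR and "Dealing with Disagreements") revolve around multi-head models with one classification head per annotator, where disagreement between heads doubles as an uncertainty signal. The toy sketch below illustrates that general idea only; the names (MultiHeadAnnotatorModel, predict_with_uncertainty) and the entropy-based uncertainty score are assumptions, not either paper's exact formulation.

```python
# Toy sketch of a multi-head, per-annotator classifier with a simple
# disagreement-aware uncertainty estimate (illustrative, see note above).
import torch
import torch.nn as nn


class MultiHeadAnnotatorModel(nn.Module):
    def __init__(self, encoder, feat_dim, num_classes, num_annotators):
        super().__init__()
        self.encoder = encoder  # shared backbone, e.g. a text or image feature extractor
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_annotators)]
        )

    def forward(self, x):
        feat = self.encoder(x)
        # One set of logits per annotator head: (num_annotators, batch, num_classes)
        return torch.stack([head(feat) for head in self.heads])

    @torch.no_grad()
    def predict_with_uncertainty(self, x):
        probs = self.forward(x).softmax(dim=-1)   # per-head class probabilities
        mean_probs = probs.mean(dim=0)            # average over annotator heads
        # Entropy of the averaged distribution: higher when heads disagree
        # or when every head is itself unsure.
        uncertainty = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum(dim=-1)
        return mean_probs.argmax(dim=-1), uncertainty
```

During training, each head would be supervised only with labels produced by its own annotator, so disagreement between heads directly reflects inter-annotator variation.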
This list is automatically generated from the titles and abstracts of the papers on this site.