Improving Label Quality by Jointly Modeling Items and Annotators
- URL: http://arxiv.org/abs/2106.10600v1
- Date: Sun, 20 Jun 2021 02:15:20 GMT
- Title: Improving Label Quality by Jointly Modeling Items and Annotators
- Authors: Tharindu Cyril Weerasooriya, Alexander G. Ororbia, Christopher M.
Homan
- Abstract summary: We propose a fully Bayesian framework for learning ground truth labels from noisy annotators.
Our framework ensures scalability by factoring a generative, Bayesian soft clustering model over label distributions into the classic Dawid and Skene joint annotator-data model.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a fully Bayesian framework for learning ground truth labels from
noisy annotators.
Our framework ensures scalability by factoring a generative, Bayesian soft
clustering model over label distributions into the classic Dawid and Skene
joint annotator-data model. Earlier research along these lines has neither
fully incorporated label distributions nor explored clustering by annotators
only or data only. Our framework incorporates all of these properties as:
(1) a graphical model designed to provide better ground truth estimates of
annotator responses as input to \emph{any} black box supervised learning
algorithm, and
(2) a standalone neural model whose internal structure captures many of the
properties of the graphical model.
We conduct supervised learning experiments using both models and compare their
performance against one baseline and a state-of-the-art model.
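The Dawid and Skene model referenced above estimates per-annotator confusion matrices and per-item label posteriors via expectation-maximization. The sketch below is a minimal classical EM version for illustration only, not the paper's fully Bayesian implementation; the function name and the `-1` missing-label convention are assumptions.

```python
import numpy as np

def dawid_skene(ann, n_classes, n_iter=50):
    """Classical Dawid-Skene EM.
    ann: (n_items, n_annotators) int array of labels; -1 marks a missing label.
    Returns per-item label posteriors, class priors, and confusion matrices."""
    n_items, n_annot = ann.shape
    mask = ann >= 0
    # Initialize posteriors with per-item vote counts.
    T = np.zeros((n_items, n_classes))
    for i in range(n_items):
        for l in ann[i][mask[i]]:
            T[i, l] += 1
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices.
        pi = T.mean(axis=0)
        conf = np.full((n_annot, n_classes, n_classes), 1e-8)  # smoothing
        for j in range(n_annot):
            for i in range(n_items):
                if mask[i, j]:
                    conf[j, :, ann[i, j]] += T[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over true labels for each item.
        logT = np.log(pi)[None, :].repeat(n_items, axis=0)
        for j in range(n_annot):
            for i in range(n_items):
                if mask[i, j]:
                    logT[i] += np.log(conf[j, :, ann[i, j]])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T, pi, conf
```

The proposed framework extends this joint annotator-data structure with Bayesian soft clustering over label distributions; the sketch only shows the base model being factored into.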
Related papers
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- On the Role of Edge Dependency in Graph Generative Models [28.203109773986167]
We introduce a novel evaluation framework for generative models of graphs.
We focus on the importance of model-generated graph overlap to ensure both accuracy and edge-diversity.
Our results indicate that our simple, interpretable models provide competitive baselines to popular generative models.
arXiv Detail & Related papers (2023-12-06T18:54:27Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Universal Semi-supervised Model Adaptation via Collaborative Consistency Training [92.52892510093037]
We introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA).
We propose a collaborative consistency training framework that regularizes the prediction consistency between two models.
Experimental results demonstrate the effectiveness of our method on several benchmark datasets.
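Prediction-consistency regularization of the kind described can be sketched as a symmetric KL penalty between two models' softmax outputs. This is an illustrative formulation under assumed inputs, not necessarily the paper's exact loss:

```python
import numpy as np

def symmetric_kl_consistency(p, q, eps=1e-12):
    """Symmetric KL divergence between two models' softmax predictions,
    averaged over the batch -- a common consistency regularizer.
    p, q: (batch, n_classes) arrays of probabilities."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    kl_pq = np.sum(p * np.log(p / q), axis=1)
    kl_qp = np.sum(q * np.log(q / p), axis=1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))
```

Minimizing such a term pushes the two models toward agreeing predictions while each retains its own supervised objective.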
arXiv Detail & Related papers (2023-07-07T08:19:40Z)
- Deep incremental learning models for financial temporal tabular datasets with distribution shifts [0.9790236766474201]
The framework uses a simple basic building block (decision trees) to build self-similar models of any required complexity.
We demonstrate our scheme using XGBoost models trained on the Numerai dataset and show that a two layer deep ensemble of XGBoost models over different model snapshots delivers high quality predictions.
arXiv Detail & Related papers (2023-03-14T14:10:37Z)
- Raw waveform speaker verification for supervised and self-supervised learning [30.08242210230669]
This paper proposes a new raw waveform speaker verification model that incorporates techniques proven effective for speaker verification.
Under the best performing configuration, the model shows an equal error rate of 0.89%, competitive with state-of-the-art models.
We also explore the proposed model with a self-supervised learning framework and show the state-of-the-art performance in this line of research.
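Equal error rate, the metric quoted above, is the operating point where the false-accept and false-reject rates coincide. A minimal sketch of computing it from raw trial scores via a simple threshold sweep (production toolkits typically interpolate between thresholds; the function name is illustrative):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: the point where false-accept rate equals false-reject rate.
    scores: similarity scores; labels: 1 = same speaker, 0 = different."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = labels == 1, labels == 0
    best = (1.0, 0.0)  # (|FAR - FRR|, candidate EER)
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[neg] >= t)  # different-speaker trials accepted
        frr = np.mean(scores[pos] < t)   # same-speaker trials rejected
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]
```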
arXiv Detail & Related papers (2022-03-16T09:28:03Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.