Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model
- URL: http://arxiv.org/abs/2405.12081v2
- Date: Sun, 22 Sep 2024 11:18:59 GMT
- Title: Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model
- Authors: Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan
- Abstract summary: We propose a selective annotation framework called SANT.
It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms.
Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers.
- Score: 42.70608373297776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To obtain high-quality annotations under a limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotation to improve the model's predictive ability (i.e., triage-to-human data), while the rest of the data is indiscriminately assigned to model annotation (i.e., triage-to-model data). This may lead to inefficiencies in budget allocation for annotations, as easy data that the model could accurately annotate may be unnecessarily assigned to the expert, and hard data may be misclassified by the model. As a result, the overall annotation quality may be compromised. To address this issue, we propose a selective annotation framework called SANT. It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms. As such, informative or hard data is assigned to the expert for annotation, while easy data is handled by the model. Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers. We provide pioneering work on data annotation within budget constraints, establishing a landmark for future triage-based annotation studies.
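Read operationally, the abstract implies a simple allocation rule: score each unlabeled example by how likely the model is to annotate it incorrectly, send the riskiest examples to the expert up to the budget, and let the model label the rest. Below is a minimal Python sketch of that idea, assuming a scikit-learn-style classifier; the names `triage_annotate` and `uncertainty_risk`, and the risk proxy (one minus the max class probability), are illustrative assumptions, not the authors' SANT implementation, and the paper's bi-weighting mechanism is omitted.

```python
# Minimal sketch of budget-constrained, error-aware triage.
# Hypothetical names; not the authors' SANT implementation.
import numpy as np


def uncertainty_risk(model):
    """Simple risk proxy: one minus the model's max class probability."""
    return lambda X: 1.0 - model.predict_proba(X).max(axis=1)


def triage_annotate(pool, model, budget, risk_fn):
    """Split an unlabeled pool between expert and model annotation.

    pool    : (n, d) feature array of unlabeled examples
    model   : fitted classifier exposing predict / predict_proba
    budget  : number of examples the expert can afford to label
    risk_fn : callable mapping features to estimated annotation risk
    """
    risk = risk_fn(pool)

    # Triage-to-human: the `budget` riskiest (hard/informative) examples.
    expert_idx = np.argsort(risk)[::-1][:budget]

    # Triage-to-model: the rest is assumed easy enough to auto-label.
    model_idx = np.setdiff1d(np.arange(len(pool)), expert_idx)
    model_labels = model.predict(pool[model_idx])

    return expert_idx, model_idx, model_labels
```

In SANT itself, the error-aware triage and bi-weighting mechanisms replace this naive uncertainty cutoff, but the overall split of the budget between expert and model workers follows the same shape.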
Related papers
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z) - From Random to Informed Data Selection: A Diversity-Based Approach to
Optimize Human Annotation and Few-Shot Learning [38.30983556062276]
A major challenge in Natural Language Processing is obtaining annotated data for supervised learning.
Crowdsourcing introduces issues related to the annotator's experience, consistency, and biases.
This paper contributes an automatic and informed data selection architecture to build a small dataset for few-shot learning.
arXiv Detail & Related papers (2024-01-24T04:57:32Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification.
XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations.
Experiments on six datasets show that XAL achieves consistent improvement over nine strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - GPT Self-Supervision for a Better Data Annotator [22.598300095822026]
We propose a Generative Pretrained Transformer (GPT) self-supervision annotation method.
The proposed approach comprises a one-shot tuning phase followed by a generation phase.
The alignment score between the recovered and original data serves as a self-supervision navigator to refine the process.
arXiv Detail & Related papers (2023-06-07T11:33:14Z) - Full or Weak annotations? An adaptive strategy for budget-constrained
annotation campaigns [3.1318537187387787]
We propose a novel approach to determine annotation strategies for segmentation datasets.
Our method sequentially determines the proportions of segmentation and classification annotations to collect for each budget fraction.
We show in our experiments that our approach yields annotations that perform very close to the optimal for a number of different annotation budgets and datasets.
arXiv Detail & Related papers (2023-03-21T08:41:54Z) - Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation [107.72926721837726]
Coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models.
We propose a coarse-to-fine self-training framework that generates pseudo labels for unlabeled regions of coarsely annotated data.
Our method achieves a significantly better performance-to-annotation-cost tradeoff, yielding performance comparable to fully annotated data with only a small fraction of the annotation budget.
arXiv Detail & Related papers (2022-12-15T15:43:42Z) - Annotation Error Detection: Analyzing the Past and Present for a More
Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use, open-source software package.
arXiv Detail & Related papers (2022-06-05T22:31:45Z) - Re-Examining Human Annotations for Interpretable NLP [80.81532239566992]
We conduct controlled experiments on crowdsourcing websites using two widely used datasets in Interpretable NLP.
We compare the annotation results obtained from recruiting workers satisfying different levels of qualification.
Our results reveal that annotation quality is highly dependent on the workers' qualifications, and that workers can be guided by the instructions to provide certain annotations.
arXiv Detail & Related papers (2022-04-10T02:27:30Z)