Annotation Curricula to Implicitly Train Non-Expert Annotators
- URL: http://arxiv.org/abs/2106.02382v1
- Date: Fri, 4 Jun 2021 09:48:28 GMT
- Title: Annotation Curricula to Implicitly Train Non-Expert Annotators
- Authors: Ji-Ung Lee and Jan-Christoph Klie and Iryna Gurevych
- Abstract summary: Annotation studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
- Score: 56.67768938052715
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Annotation studies often require annotators to familiarize themselves with
the task, its annotation scheme, and the data domain. This can be overwhelming
in the beginning, mentally taxing, and induce errors into the resulting
annotations, especially in citizen science or crowdsourcing scenarios where
domain expertise is not required and only annotation guidelines are provided.
To alleviate these issues, we propose annotation curricula, a novel approach to
implicitly train annotators. Our goal is to gradually introduce annotators to
the task by ordering the instances to be annotated according to a learning
curriculum. To do so, we first formalize annotation curricula for sentence- and
paragraph-level annotation tasks, define an ordering strategy, and identify
well-performing heuristics and interactively trained models on three existing
English datasets. We then conduct a user study with 40 voluntary participants
who are asked to identify the most fitting misconception for English tweets
about the Covid-19 pandemic. Our results show that using a simple heuristic to
order instances can already significantly reduce the total annotation time
while preserving high annotation quality. Annotation curricula can thus
provide a novel way to improve data collection. To facilitate future research,
we further share our code and data consisting of 2,400 annotations.
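As a rough illustration of the ordering idea, a curriculum can be built from any cheap difficulty proxy. The sketch below sorts instances by token count; the proxy, the `Instance` class, and the example tweets are illustrative assumptions, not the heuristics or data used in the paper.

```python
# Minimal sketch: order annotation instances from easy to hard using a
# simple difficulty proxy (token count here; the paper's heuristics and
# interactively trained models may differ).

from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    text: str
    # Proxy difficulty: fewer tokens ~ easier to annotate (an assumption).
    difficulty: float = field(init=False)

    def __post_init__(self) -> None:
        self.difficulty = float(len(self.text.split()))


def annotation_curriculum(instances: List[Instance]) -> List[Instance]:
    """Return instances ordered from easiest to hardest."""
    return sorted(instances, key=lambda inst: inst.difficulty)


if __name__ == "__main__":
    pool = [
        Instance("Masks do not cause oxygen deficiency."),
        Instance("A long, multi-clause tweet mixing several claims about "
                 "vaccines, 5G towers, and alleged cover-ups."),
        Instance("Vitamin C cures Covid-19."),
    ]
    for inst in annotation_curriculum(pool):
        print(f"{inst.difficulty:>5.1f}  {inst.text}")
```

Any other proxy (reading ease, model uncertainty from an interactively trained classifier) can be dropped into the `difficulty` field without changing the ordering code.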
Related papers
- On-the-Fly Point Annotation for Fast Medical Video Labeling [1.890063512530524]
In medical research, deep learning models rely on high-quality annotated data.
The need to adjust two corners of each bounding box makes the process inherently frame-by-frame.
We propose an on-the-fly method for live video annotation to enhance annotation efficiency.
arXiv Detail & Related papers (2024-04-22T16:59:43Z)
- From Categories to Classifiers: Name-Only Continual Learning by Exploring the Web [118.67589717634281]
Continual learning often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice.
We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation.
Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification.
arXiv Detail & Related papers (2023-11-19T10:43:43Z)
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
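The summary above does not give the P2C objective; purely as a generic illustration of pairing a classification loss with an auxiliary pairwise preference loss, a multi-task loss could look like the following sketch (the Bradley-Terry preference term, the `alpha` weight, and all names are assumptions, not the paper's formulation).

```python
# Generic sketch (an assumption, not the exact P2C objective): combine a
# standard classification loss with a Bradley-Terry style pairwise
# preference loss computed from a scalar "preference head".

import torch
import torch.nn.functional as F


def multitask_loss(class_logits: torch.Tensor,
                   labels: torch.Tensor,
                   pref_score_a: torch.Tensor,
                   pref_score_b: torch.Tensor,
                   a_preferred: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of a classification loss and a pairwise preference loss.

    class_logits:   (batch, num_classes) logits for the main task
    labels:         (batch,) gold class indices
    pref_score_a/b: (batch,) scalar scores for the two texts in each pair
    a_preferred:    (batch,) 1.0 if text A is preferred over text B, else 0.0
    """
    cls_loss = F.cross_entropy(class_logits, labels)
    # Bradley-Terry: P(A preferred over B) = sigmoid(score_A - score_B)
    pref_logit = pref_score_a - pref_score_b
    pref_loss = F.binary_cross_entropy_with_logits(pref_logit, a_preferred)
    return cls_loss + alpha * pref_loss
```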
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
- Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions [0.0]
We have investigated the use of advanced machine learning methods for pre-annotating data for a lexical extension task.
We have examined the correlation of the automatic scores with the human annotation.
While the correlation turned out to be strong, its influence on the annotation proper is modest due to its near linearity.
arXiv Detail & Related papers (2023-06-03T14:57:47Z)
- Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future [63.99570204416711]
We reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets.
We define a uniform evaluation setup including a new formalization of the annotation error detection task.
We release our datasets and implementations in an easy-to-use and open source software package.
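One common detector family, shown here only as an illustrative sketch and not necessarily one of the 18 reimplemented methods, flags items where an out-of-fold classifier confidently disagrees with the gold label (the TF-IDF model, the threshold, and the function name are assumptions).

```python
# Illustrative annotation-error-detection sketch: flag instances whose gold
# label is contradicted by a confident out-of-fold prediction.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline


def flag_possible_errors(texts, labels, threshold=0.9, cv=5):
    """Return indices of instances whose annotated label looks suspicious."""
    labels = np.asarray(labels)
    classes = np.unique(labels)  # column order of the probability matrix
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    # Out-of-fold probabilities: each item is scored by a model that never saw it.
    proba = cross_val_predict(model, texts, labels, cv=cv, method="predict_proba")
    predicted = classes[proba.argmax(axis=1)]
    confidence = proba.max(axis=1)
    # Suspicious: a confident prediction that contradicts the annotated label.
    return np.where((predicted != labels) & (confidence >= threshold))[0]
```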
arXiv Detail & Related papers (2022-06-05T22:31:45Z)
- Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort [9.379650501033465]
We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations.
We combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories.
Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even complex and subtle frames.
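A minimal uncertainty-sampling loop conveys the general active-learning idea; the sketch below is generic and not the tool's actual pipeline (a TF-IDF + logistic regression model stands in for the pre-trained language model, and `ask_human` is a hypothetical annotation callback).

```python
# Generic active-learning sketch: repeatedly query a human annotator for the
# unlabeled texts the current model is least confident about.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def active_learning_loop(pool_texts, ask_human, seed_texts, seed_labels,
                         rounds=10, batch_size=5):
    """Iteratively label the most uncertain texts from the pool."""
    texts, labels = list(seed_texts), list(seed_labels)
    pool = list(pool_texts)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    for _ in range(rounds):
        if not pool:
            break
        model.fit(texts, labels)
        proba = model.predict_proba(pool)
        # Least-confidence sampling: lowest top-class probability first.
        uncertainty = 1.0 - proba.max(axis=1)
        query_idx = np.argsort(uncertainty)[::-1][:batch_size]
        for i in sorted(query_idx, reverse=True):  # pop from the back first
            text = pool.pop(int(i))
            texts.append(text)
            labels.append(ask_human(text))  # manual annotation step
    return model.fit(texts, labels)
```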
arXiv Detail & Related papers (2021-12-15T13:14:58Z)
- Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks [17.033055327465238]
We propose two contrasting paradigms for data annotation.
The descriptive paradigm encourages annotator subjectivity, whereas the prescriptive paradigm discourages it.
We argue that dataset creators should explicitly aim for one or the other to facilitate the intended use of their dataset.
arXiv Detail & Related papers (2021-12-14T15:38:22Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset 'ApartmenTour' that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- Active Learning for Coreference Resolution using Discrete Annotation [76.36423696634584]
We improve upon pairwise annotation for active learning in coreference resolution.
We ask annotators to identify mention antecedents if a presented mention pair is deemed not coreferent.
In experiments with existing benchmark coreference datasets, we show that the signal from this additional question leads to significant performance gains per human-annotation hour.
arXiv Detail & Related papers (2020-04-28T17:17:11Z)
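The question flow described in the last entry can be sketched as follows; this is an illustration of the interaction only, not the authors' implementation (`ask` is a hypothetical prompt callback, e.g. `input`).

```python
# Sketch of the discrete-annotation interaction: ask about one mention pair,
# and on a "no" ask the follow-up question that yields the extra signal.

from typing import List, Optional, Tuple


def annotate_pair(mention: str,
                  candidate_antecedent: str,
                  earlier_mentions: List[str],
                  ask) -> Tuple[bool, Optional[str]]:
    """Ask about one mention pair; if not coreferent, ask for the antecedent."""
    coreferent = ask(
        f"Do '{mention}' and '{candidate_antecedent}' refer to the same entity? (y/n) "
    ).strip().lower().startswith("y")
    if coreferent:
        return True, candidate_antecedent

    # The additional question that improves signal per human-annotation hour:
    options = "\n".join(f"  [{i}] {m}" for i, m in enumerate(earlier_mentions))
    answer = ask(
        f"Which earlier mention is the antecedent of '{mention}'?\n"
        f"{options}\n  [-1] none of these\n> "
    )
    choice = int(answer)
    return False, earlier_mentions[choice] if choice >= 0 else None
```

For example, `annotate_pair("he", "the doctor", ["the doctor", "a nurse"], ask=input)` would pose both questions interactively on the command line.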