Measuring Annotator Agreement Generally across Complex Structured,
Multi-object, and Free-text Annotation Tasks
- URL: http://arxiv.org/abs/2212.09503v1
- Date: Thu, 15 Dec 2022 20:12:48 GMT
- Title: Measuring Annotator Agreement Generally across Complex Structured,
Multi-object, and Free-text Annotation Tasks
- Authors: Alexander Braylan, Omar Alonso, Matthew Lease
- Abstract summary: Inter-annotator agreement (IAA) is a key metric for quality assurance.
Measures exist for simple categorical and ordinal labeling tasks, but little work has considered more complex labeling tasks.
Krippendorff's alpha, best known for use with simpler labeling tasks, does have a distance-based formulation with broader applicability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When annotators label data, a key metric for quality assurance is
inter-annotator agreement (IAA): the extent to which annotators agree on their
labels. Though many IAA measures exist for simple categorical and ordinal
labeling tasks, relatively little work has considered more complex labeling
tasks, such as structured, multi-object, and free-text annotations.
Krippendorff's alpha, best known for use with simpler labeling tasks, does have
a distance-based formulation with broader applicability, but little work has
studied its efficacy and consistency across complex annotation tasks.
We investigate the design and evaluation of IAA measures for complex
annotation tasks, with evaluation spanning seven diverse tasks: image bounding
boxes, image keypoints, text sequence tagging, ranked lists, free text
translations, numeric vectors, and syntax trees. We identify the difficulty of
interpretability and the complexity of choosing a distance function as key
obstacles in applying Krippendorff's alpha generally across these tasks. We
propose two novel, more interpretable measures, showing they yield more
consistent IAA measures across tasks and annotation distance functions.
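
The distance-based formulation of Krippendorff's alpha mentioned above can be sketched in a few lines of Python: alpha = 1 - D_o/D_e, where D_o is the average pairwise distance between labels given to the same item and D_e is the expected distance under pooling. The function name, the toy ratings, and the choice of distance below are illustrative assumptions, not the authors' code; the pluggable distance function is precisely the degree of freedom the paper identifies as hard to choose.

```python
from itertools import combinations

def krippendorff_alpha(units, distance):
    """Distance-based Krippendorff's alpha (simplified sketch, not the paper's code).

    units:    list of lists; units[u] holds the labels different annotators
              assigned to item u (missing labels are simply omitted).
    distance: callable d(a, b) -> float, e.g. squared difference for numbers,
              1 - IoU for bounding boxes, edit distance for text.
    """
    # Only items labeled by at least two annotators contribute pairable values.
    units = [vals for vals in units if len(vals) >= 2]
    if not units:
        return float("nan")
    n = sum(len(vals) for vals in units)

    # Observed disagreement: pairwise distances within each item,
    # each item weighted by 1 / (m_u - 1) as in the coincidence-matrix form.
    d_obs = 0.0
    for vals in units:
        within = sum(distance(a, b) for a, b in combinations(vals, 2))
        d_obs += 2.0 * within / (len(vals) - 1)   # count ordered pairs
    d_obs /= n

    # Expected disagreement: pairwise distances over all labels pooled together.
    pooled = [v for vals in units for v in vals]
    d_exp = sum(distance(a, b) for a, b in combinations(pooled, 2))
    d_exp = 2.0 * d_exp / (n * (n - 1))           # count ordered pairs

    return 1.0 - d_obs / d_exp if d_exp > 0 else 1.0

# Toy usage with numeric ratings and a squared-difference distance
# (data and distance choice are assumptions for illustration only).
ratings = [[1, 2, 2], [4, 4, 4], [3, 3, 2], [1, 1]]
print(krippendorff_alpha(ratings, lambda a, b: (a - b) ** 2))  # ~0.88
```

Swapping the lambda for a structured distance (e.g., over bounding boxes or trees) changes the resulting alpha, which is the consistency problem across distance functions that the proposed measures aim to address.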
Related papers
- Distribution Matching for Multi-Task Learning of Classification Tasks: a
Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful on classification tasks with little, or even non-overlapping, annotation.
We propose a novel approach in which knowledge is exchanged between tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z) - A General Model for Aggregating Annotations Across Simple, Complex, and
Multi-Object Annotation Tasks [51.14185612418977]
A strategy to improve label quality is to ask multiple annotators to label the same item and aggregate their labels.
While a variety of bespoke models have been proposed for specific tasks, our work is the first to introduce aggregation methods that generalize across many diverse complex tasks.
This article extends our prior work by investigating three new research questions.
arXiv Detail & Related papers (2023-12-20T21:28:35Z) - Exploring Structured Semantic Prior for Multi Label Recognition with
Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model CLIP to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z) - PIZZA: A new benchmark for complex end-to-end task-oriented parsing [3.5106870325869886]
This paper introduces a new dataset for parsing pizza and drink orders, whose semantics cannot be captured by flat slots and intents.
We perform an evaluation of deep-learning techniques for task-oriented parsing on this dataset, including different flavors of seq2seq models and RNNGs.
arXiv Detail & Related papers (2022-12-01T04:20:07Z) - Frustratingly Easy Label Projection for Cross-lingual Transfer [25.398772204761215]
A few efforts have utilized a simple mark-then-translate method to jointly perform translation and projection.
We present an empirical study across 57 languages and three tasks (QA, NER, and Event Extraction) to evaluate the effectiveness and limitations of both methods.
Our optimized version of mark-then-translate, which we call EasyProject, is easily applied to many languages and works surprisingly well, outperforming the more complex word alignment-based methods.
arXiv Detail & Related papers (2022-11-28T18:11:48Z) - Pseudo-Labels Are All You Need [3.52359746858894]
We present our submission to the Text Complexity DE Challenge 2022.
The goal is to predict the complexity of a German sentence for German learners at level B.
We find that the pseudo-label-based approach gives impressive results yet requires little to no adjustment to the specific task.
arXiv Detail & Related papers (2022-08-19T09:52:41Z) - Dynamic Semantic Matching and Aggregation Network for Few-shot Intent
Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z) - Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.