A General Model for Aggregating Annotations Across Simple, Complex, and
Multi-Object Annotation Tasks
- URL: http://arxiv.org/abs/2312.13437v1
- Date: Wed, 20 Dec 2023 21:28:35 GMT
- Title: A General Model for Aggregating Annotations Across Simple, Complex, and
Multi-Object Annotation Tasks
- Authors: Alexander Braylan, Madalyn Marabella, Omar Alonso, Matthew Lease
- Abstract summary: A strategy to improve label quality is to ask multiple annotators to label the same item and aggregate their labels.
While a variety of bespoke models have been proposed for specific tasks, our work is the first to introduce aggregation methods that generalize across many diverse complex tasks.
This article extends our prior work with an investigation of three new research questions.
- Score: 51.14185612418977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human annotations are vital to supervised learning, yet annotators often
disagree on the correct label, especially as annotation tasks increase in
complexity. A strategy to improve label quality is to ask multiple annotators
to label the same item and aggregate their labels. Many aggregation models have
been proposed for categorical or numerical annotation tasks, but far less work
has considered more complex annotation tasks involving open-ended,
multivariate, or structured responses. While a variety of bespoke models have
been proposed for specific tasks, our work is the first to introduce
aggregation methods that generalize across many diverse complex tasks,
including sequence labeling, translation, syntactic parsing, ranking, bounding
boxes, and keypoints. This generality is achieved by devising a task-agnostic
method to model distances between labels rather than the labels themselves.
This article extends our prior work with an investigation of three new research
questions. First, how do complex annotation properties impact aggregation
accuracy? Second, how should a task owner navigate the many modeling choices to
maximize aggregation accuracy? Finally, what diagnoses can verify that
aggregation models are specified correctly for the given data? To understand
how various factors impact accuracy and to inform model selection, we conduct
simulation studies and experiments on real, complex datasets. Regarding
testing, we introduce unit tests for aggregation models and present a suite of
such tests to ensure that a given model is not mis-specified and exhibits
expected behavior.
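To make the distance-based idea and the unit-testing idea above concrete, the following is a minimal sketch, assuming a simple medoid-selection rule over a pluggable, task-specific distance function; the function names and the two behavioral tests are illustrative assumptions, not the article's actual model or test suite.

```python
# Simplified sketch (not the article's model): aggregate complex labels by
# comparing them only through a task-specific distance function, then pick
# the "medoid" -- the annotation with the smallest total distance to the
# others. Two behavioral unit tests of the kind the article advocates follow.
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def aggregate_by_distance(labels: Sequence[T],
                          dist: Callable[[T, T], float]) -> T:
    """Return the annotation with the smallest total distance to all others."""
    return min(labels, key=lambda a: sum(dist(a, b) for b in labels))

def token_edit_distance(a: str, b: str) -> float:
    """Toy word-level edit distance, usable for free-text or sequence labels."""
    xs, ys = a.split(), b.split()
    dp = [[i + j if i * j == 0 else 0 for j in range(len(ys) + 1)]
          for i in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,            # delete a word
                           dp[i][j - 1] + 1,            # insert a word
                           dp[i - 1][j - 1] + (xs[i - 1] != ys[j - 1]))
    return float(dp[-1][-1])

def test_unanimous_annotators_are_recovered():
    # If every annotator gives the same label, aggregation must return it.
    labels = ["the cat sat on the mat"] * 3
    assert aggregate_by_distance(labels, token_edit_distance) == labels[0]

def test_single_outlier_is_ignored():
    # One noisy annotator should not change the aggregate.
    labels = ["the cat sat on the mat",
              "the cat sat on the mat",
              "completely unrelated text here"]
    assert aggregate_by_distance(labels, token_edit_distance) == labels[0]

if __name__ == "__main__":
    test_unanimous_annotators_are_recovered()
    test_single_outlier_is_ignored()
    print("distance-based aggregation sketch passes both behavioral tests")
```

Because labels enter only through `dist`, swapping in an IoU-based distance for bounding boxes or a tree-edit distance for parses would leave the aggregation code unchanged, which is the sense in which such a method is task-agnostic.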
Beyond investigating the research questions above, we discuss the
foundational concept of annotation complexity, present a new aggregation model
as a bridge between traditional models and our own, and contribute a new
semi-supervised learning method for complex label aggregation that outperforms
prior work.
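The abstract does not describe the new semi-supervised method in detail; as a minimal sketch under assumptions, a small set of gold-labeled items could be used to estimate per-annotator reliability and then weight the distance-based selection above. The names and the weighting rule below are hypothetical illustrations, not the contribution reported in the article.

```python
# Generic illustration (not the authors' method): use a few gold labels to
# estimate annotator reliability, then weight annotators when picking the
# aggregate label. Reuses a task-specific distance function as above.
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")

def annotator_reliability(
    annotations: Dict[Hashable, Dict[str, T]],  # item -> {annotator: label}
    gold: Dict[Hashable, T],                     # item -> gold label (few items)
    dist: Callable[[T, T], float],
) -> Dict[str, float]:
    """Reliability = 1 / (1 + mean distance to gold on the gold-labeled items)."""
    per_annotator: Dict[str, List[float]] = {}
    for item, true_label in gold.items():
        for annotator, label in annotations.get(item, {}).items():
            per_annotator.setdefault(annotator, []).append(dist(label, true_label))
    return {a: 1.0 / (1.0 + sum(ds) / len(ds)) for a, ds in per_annotator.items()}

def weighted_medoid(labels: Dict[str, T],
                    weights: Dict[str, float],
                    dist: Callable[[T, T], float]) -> T:
    """Pick the annotation minimizing the reliability-weighted distance sum."""
    def score(candidate: T) -> float:
        return sum(weights.get(a, 1.0) * dist(candidate, other)
                   for a, other in labels.items())
    return min(labels.values(), key=score)
```

Annotators who track the gold labels closely receive weights near 1 and dominate the selection; unseen annotators fall back to a neutral weight.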
Related papers
- LaSagnA: Language-based Segmentation Assistant for Complex Queries [39.620806493454616]
Large Language Models for Vision (vLLMs) generate detailed perceptual outcomes, including bounding boxes and masks.
In this study, we identify the insufficient complexity of training queries as the main cause of these shortcomings in handling complex queries.
We present three novel strategies to effectively handle the challenges arising from the direct integration of the proposed format.
arXiv Detail & Related papers (2024-04-12T14:40:45Z)
- Meta Operator for Complex Query Answering on Knowledge Graphs [58.340159346749964]
We argue that different logical operator types, rather than the different complex query types, are the key to improving generalizability.
We propose a meta-learning algorithm to learn the meta-operators with limited data and adapt them to different instances of operators under various complex queries.
Empirical results show that learning meta-operators is more effective than learning original CQA or meta-CQA models.
arXiv Detail & Related papers (2024-03-15T08:54:25Z)
- ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation [35.10805667891489]
Active learning, as an annotation cost-saving strategy, has not been fully explored in the context of learning from disagreement.
We show that in the active learning setting, a multi-head model performs significantly better than a single-head model in terms of uncertainty estimation.
arXiv Detail & Related papers (2023-10-23T14:26:43Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task, which unifies retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
- Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning [54.66399120084227]
Recent state-of-the-art neural text matching models built on pre-trained language models (PLMs) struggle to generalize to different tasks.
We adopt a specialization-generalization training strategy and refer to it as Match-Prompt.
In the specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens.
In the generalization stage, the text matching model learns essential matching signals by being trained on diverse matching tasks.
arXiv Detail & Related papers (2022-04-06T11:01:08Z)
- Fuzzy Simplicial Networks: A Topology-Inspired Model to Improve Task Generalization in Few-shot Learning [1.0062040918634414]
Few-shot learning algorithms are designed to generalize well to new tasks with limited data.
We introduce a new few-shot model called Fuzzy Simplicial Networks (FSN) which leverages a construction from topology to more flexibly represent each class from limited data.
arXiv Detail & Related papers (2020-09-23T17:01:09Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- Joint Multi-Dimensional Model for Global and Time-Series Annotations [48.159050222769494]
Crowdsourcing is a popular approach to collect annotations for unlabeled data instances.
It involves collecting, for each data instance, a large number of annotations from several (often naive, untrained) annotators, which are then combined to estimate the ground truth.
Most annotation fusion schemes, however, ignore the dependencies among annotation dimensions and model each dimension separately.
We propose a generative model for multi-dimensional annotation fusion, which models the dimensions jointly leading to more accurate ground truth estimates.
arXiv Detail & Related papers (2020-05-06T20:08:46Z)
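As a rough illustration of the joint-versus-independent contrast described in that last entry, the sketch below fuses multi-dimensional numeric annotations once per dimension and once using the full covariance across dimensions; the array shapes and the Mahalanobis-style weighting are assumptions for demonstration, not the paper's generative model.

```python
# Toy contrast (not the paper's model): fuse multi-dimensional annotations for
# one item, either treating each dimension independently or jointly via the
# full covariance across dimensions, so correlated dimensions inform each other.
import numpy as np

def fuse_independently(annotations: np.ndarray) -> np.ndarray:
    """Per-dimension fusion: each dimension is simply averaged on its own."""
    return annotations.mean(axis=0)

def fuse_jointly(annotations: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Down-weight annotators whose whole annotation vector is far (in a
    Mahalanobis sense) from the consensus, using the full covariance."""
    mean = annotations.mean(axis=0)
    cov = np.cov(annotations, rowvar=False) + ridge * np.eye(annotations.shape[1])
    prec = np.linalg.inv(cov)
    diffs = annotations - mean
    d2 = np.einsum("ij,jk,ik->i", diffs, prec, diffs)  # squared joint distances
    weights = np.exp(-0.5 * d2)
    weights /= weights.sum()
    return weights @ annotations

# Three annotators rating two correlated numeric dimensions; the third
# annotator is an outlier whose two dimensions disagree with each other.
ratings = np.array([[0.8, 0.7],
                    [0.7, 0.6],
                    [0.1, 0.9]])
print(fuse_independently(ratings))
print(fuse_jointly(ratings))
```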
This list is automatically generated from the titles and abstracts of the papers on this site.