Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts
- URL: http://arxiv.org/abs/2505.17222v1
- Date: Thu, 22 May 2025 18:55:22 GMT
- Title: Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts
- Authors: Georgios Chochlakis, Peter Wu, Arjun Bedi, Marcus Ma, Kristina Lerman, Shrikanth Narayanan
- Abstract summary: We explore label verification in these contexts using Large Language Models (LLMs). We propose the Label-in-a-Haystack Rectification (LiaHR) framework for subjective label correction. This approach can be integrated into annotation pipelines to enhance signal-to-noise ratios.
- Score: 26.415262737856967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling complex subjective tasks in Natural Language Processing, such as recognizing emotion and morality, is considerably challenging due to significant variation in human annotations. This variation often reflects reasonable differences in semantic interpretations rather than mere noise, necessitating methods to distinguish between legitimate subjectivity and error. We address this challenge by exploring label verification in these contexts using Large Language Models (LLMs). First, we propose a simple In-Context Learning binary filtering baseline that estimates the reasonableness of a document-label pair. We then introduce the Label-in-a-Haystack setting: the query and its label(s) are included in the demonstrations shown to LLMs, which are prompted to predict the label(s) again, while receiving task-specific instructions (e.g., emotion recognition) rather than label copying. We show that failures to copy the label(s) to the output of the LLM are task-relevant and informative. Building on this, we propose the Label-in-a-Haystack Rectification (LiaHR) framework for subjective label correction: when the model outputs diverge from the reference gold labels, we assign the generated labels to the example instead of discarding it. This approach can be integrated into annotation pipelines to enhance signal-to-noise ratios. Comprehensive analyses, human evaluations, and ecological validity studies verify the utility of LiaHR for label correction. Code is available at https://github.com/gchochla/LiaHR.
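The abstract describes LiaHR operationally, so a minimal sketch may help make the loop concrete. Everything below is an illustration under assumptions: `build_prompt`, `liahr_rectify`, and the `query_llm` callable are hypothetical names, and the prompt template is ours, not the authors' (see the linked repository for the actual implementation).

```python
def build_prompt(demos, query_text, query_labels, task_instruction):
    """Assemble an in-context prompt. Hypothetical format, not the paper's template."""
    lines = [task_instruction]  # task-specific instruction, e.g., emotion recognition
    # Ordinary demonstrations: (text, labels) pairs.
    for text, labels in demos:
        lines.append(f"Text: {text}\nLabels: {', '.join(sorted(labels))}")
    # Label-in-a-Haystack: the query appears WITH its current label(s)
    # among the demonstrations...
    lines.append(f"Text: {query_text}\nLabels: {', '.join(sorted(query_labels))}")
    # ...and is then queried again; the model is instructed to do the task,
    # not to copy labels.
    lines.append(f"Text: {query_text}\nLabels:")
    return "\n\n".join(lines)


def liahr_rectify(example, demos, task_instruction, query_llm):
    """Return (possibly corrected) labels for one annotated example.

    `query_llm` is any function that sends the prompt to an LLM and parses
    the completion into a set of label strings.
    """
    prompt = build_prompt(demos, example["text"], example["labels"], task_instruction)
    predicted = query_llm(prompt)
    if predicted == set(example["labels"]):
        return example["labels"]      # model copied the label(s): keep them
    return sorted(predicted)          # divergence: adopt the generated labels
```

The key design choice, per the abstract, is that divergent examples are relabeled rather than discarded, which is what distinguishes LiaHR from the binary filtering baseline.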
Related papers
- Diffusion Disambiguation Models for Partial Label Learning [7.869159280247732]
Learning from ambiguous labels is a long-standing problem in practical machine learning applications. Inspired by the remarkable performance of diffusion models in various generation tasks, this paper explores their potential to denoise ambiguous labels. It proposes a diffusion disambiguation model for partial label learning (DDMP), which first uses the potential complementary information between instances and labels.
arXiv Detail & Related papers (2025-07-01T03:53:45Z)
- Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification [11.19022605804112]
This paper introduces RR2QC, a novel Retrieval Reranking method for multi-label Question Classification that leverages label semantics and meta-label refinement. Experimental results show that RR2QC outperforms existing methods in Precision@K and F1 scores across multiple educational datasets.
arXiv Detail & Related papers (2024-11-04T06:27:14Z)
- Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels [7.396461226948109]
We address the limitations of the common data annotation and training methods for objective single-label classification tasks.
Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels.
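The summary does not spell out the aggregation, so the following is one plausible scheme under stated assumptions: each annotator's confidence weights their primary label, any remaining mass goes to their secondary label, and disagreement enters by averaging across annotators. `soft_label` and the example values are illustrative, not the paper's recipe.

```python
import numpy as np

def soft_label(primary, secondary, confidence, n_classes):
    """One hypothetical way to turn (primary, secondary, confidence) into a
    per-annotator distribution; not the paper's published formula."""
    dist = np.zeros(n_classes)
    if secondary is None:
        dist[primary] = 1.0               # no secondary choice: all mass on primary
    else:
        dist[primary] = confidence        # stated confidence weights the primary label
        dist[secondary] = 1.0 - confidence
    return dist

# Averaging across annotators lets disagreement spread probability mass.
annotations = [(2, 0, 0.8), (2, None, 1.0), (0, 2, 0.6)]  # (primary, secondary, confidence)
target = np.mean([soft_label(p, s, c, n_classes=4) for p, s, c in annotations], axis=0)
print(target)  # [0.267, 0., 0.733, 0.] -> a soft training target
```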
arXiv Detail & Related papers (2023-11-09T10:47:39Z)
- Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation.
We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- BERT-Assisted Semantic Annotation Correction for Emotion-Related Questions [0.0]
We use the BERT neural language model to feed information back into an annotation task in a question-asking game called Emotion Twenty Questions (EMO20Q).
We show this method to be an effective way to assess and revise annotations of textual user data with complex, utterance-level semantic labels.
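As a rough illustration of this assess-and-revise loop (the function name and threshold below are hypothetical, not from the paper): a trained classifier's confidence in the current label can be used to queue utterances for re-annotation.

```python
def flag_for_review(examples, predict_proba, threshold=0.5):
    """Queue annotations that a trained classifier finds implausible.

    `predict_proba(text)` is assumed to return a dict mapping candidate
    labels to probabilities (e.g., from a fine-tuned BERT classifier);
    the 0.5 threshold is an arbitrary illustration.
    """
    flagged = []
    for text, label in examples:
        if predict_proba(text).get(label, 0.0) < threshold:
            flagged.append((text, label))  # low-confidence pair: revisit it
    return flagged
```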
arXiv Detail & Related papers (2022-04-02T18:00:49Z)
- Acknowledging the Unknown for Multi-label Learning with Single Positive Labels [65.5889334964149]
Traditionally, all unannotated labels are assumed to be negative labels in single positive multi-label learning (SPML).
We propose entropy-maximization (EM) loss to maximize the entropy of predicted probabilities for all unannotated labels.
Considering the positive-negative label imbalance of unannotated labels, we propose asymmetric pseudo-labeling (APL) with asymmetric-tolerance strategies and a self-paced procedure to provide more precise supervision.
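The EM loss lends itself to a short sketch. The form below is one plausible reading of "maximize the entropy of predicted probabilities for all unannotated labels" with an assumed unit weight on the entropy term; it omits the APL component entirely and is not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def em_spml_loss(logits, positive_idx):
    """Sketch of an entropy-maximization (EM) loss for single-positive
    multi-label learning.

    logits: (n_classes,) raw scores; positive_idx: the single annotated label.
    """
    probs = torch.sigmoid(logits)
    # Standard positive term: push the one annotated label toward 1.
    pos_loss = F.binary_cross_entropy(probs[positive_idx], torch.tensor(1.0))
    # EM term: instead of treating unannotated labels as negatives,
    # maximize the entropy of their predicted probabilities.
    mask = torch.ones_like(probs, dtype=torch.bool)
    mask[positive_idx] = False
    p = probs[mask].clamp(1e-6, 1 - 1e-6)
    entropy = -(p * p.log() + (1 - p) * (1 - p).log())
    return pos_loss - entropy.mean()  # minimizing this maximizes entropy

# Usage: a single example with 5 candidate labels, label 3 annotated positive.
logits = torch.randn(5, requires_grad=True)
loss = em_spml_loss(logits, positive_idx=3)
loss.backward()
```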
arXiv Detail & Related papers (2022-03-30T11:43:59Z)
- A Label Dependence-aware Sequence Generation Model for Multi-level Implicit Discourse Relation Recognition [31.179555215952306]
Implicit discourse relation recognition is a challenging but crucial task in discourse analysis.
We propose a Label Dependence-aware Sequence Generation Model (LDSGM) for it.
We develop a mutual learning enhanced training method to exploit the label dependence in a bottom-up direction.
arXiv Detail & Related papers (2021-12-22T09:14:03Z)
- Instance-Dependent Partial Label Learning [69.49681837908511]
Partial label learning is a typical weakly supervised learning problem.
Most existing approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels.
In this paper, we consider instance-dependent partial label learning and assume that each example is associated with a latent label distribution constituted by the real number of each label.
arXiv Detail & Related papers (2021-10-25T12:50:26Z)
- Exploiting Context for Robustness to Label Noise in Active Learning [47.341705184013804]
We address the problems of how a system can identify which of the queried labels are wrong and how a multi-class active learning system can be adapted to minimize the negative impact of label noise.
We construct a graphical representation of the unlabeled data to encode relationships between examples, and obtain new beliefs on the graph when noisy labels are available.
This is demonstrated in three different applications: scene classification, activity classification, and document classification.
arXiv Detail & Related papers (2020-10-18T18:59:44Z)
- Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network [61.94394163309688]
We propose a Label-enhanced Task-Adaptive Projection Network (L-TapNet) based on the state-of-the-art few-shot classification model -- TapNet.
Experimental results show that our model significantly outperforms the strongest few-shot learning baseline by 14.64 F1 points in the one-shot setting.
arXiv Detail & Related papers (2020-06-10T07:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.