Leveraging Annotator Disagreement for Text Classification
- URL: http://arxiv.org/abs/2409.17577v1
- Date: Thu, 26 Sep 2024 06:46:53 GMT
- Title: Leveraging Annotator Disagreement for Text Classification
- Authors: Jin Xu, Mariët Theune, Daniel Braun
- Abstract summary: It is common practice in text classification to use only the majority label for model training, even if a dataset has been annotated by multiple annotators.
This paper proposes three strategies to leverage annotator disagreement for text classification: a probability-based multi-label method, an ensemble system, and instruction tuning.
- Score: 3.6625157427847963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is common practice in text classification to use only the majority label for model training, even if a dataset has been annotated by multiple annotators. Doing so can remove valuable nuances and diverse perspectives inherent in the annotators' assessments. This paper proposes and compares three strategies to leverage annotator disagreement for text classification: a probability-based multi-label method, an ensemble system, and instruction tuning. All three approaches are evaluated on hate speech and abusive conversation detection, tasks that inherently entail a high degree of subjectivity. Moreover, to evaluate the effectiveness of embracing annotation disagreement in model training, we conduct an online survey that compares the performance of the multi-label model against a baseline model trained with the majority label. The results show that in hate speech detection the multi-label method outperforms the other two approaches, while in abusive conversation detection instruction tuning achieves the best performance. The survey results also show that the outputs of the multi-label model are considered a better representation of the texts than those of the single-label model.
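As a concrete illustration of the first strategy, below is a minimal sketch of probability-based multi-label training: annotator votes become per-label soft targets instead of a single majority label. The vote encoding, the binary cross-entropy objective, and all names are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def soft_targets(votes_per_example, num_labels):
    """Turn raw annotator votes into per-label probabilities.

    votes_per_example: list of lists, one inner list of label ids per text.
    """
    targets = torch.zeros(len(votes_per_example), num_labels)
    for i, votes in enumerate(votes_per_example):
        for label in votes:
            targets[i, label] += 1.0
        targets[i] /= len(votes)
    return targets

# Example: three annotators voted [hate, hate, not-hate] on the first text
# and unanimously not-hate on the second.
targets = soft_targets([[1, 1, 0], [0, 0, 0]], num_labels=2)  # [[1/3, 2/3], [1, 0]]
logits = torch.randn(2, 2, requires_grad=True)  # stand-in classifier outputs
loss = F.binary_cross_entropy_with_logits(logits, targets)  # soft multi-label loss
loss.backward()
```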
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems in domains with limited amounts of annotated data.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
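As a hedged illustration of the prompt-based setup in the entry above, the helper below composes a zero-shot classification prompt; the wording, labels, and function name are placeholders, not tied to any particular model's API.

```python
def build_prompt(text: str, labels: list[str]) -> str:
    """Compose a zero-shot classification instruction for an LLM."""
    return (
        f"Classify the following text as one of: {', '.join(labels)}.\n"
        f"Text: {text}\n"
        "Label:"
    )

# The returned string would be sent to an instruction-following model.
print(build_prompt("That referee decision was outrageous.", ["abusive", "not abusive"]))
```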
- Capturing Perspectives of Crowdsourced Annotators in Subjective Learning Tasks [9.110872603799839]
Supervised classification heavily depends on datasets annotated by humans.
In subjective tasks such as toxicity classification, these annotations often exhibit low agreement among raters.
In this work, we propose Annotator Aware Representations for Texts (AART) for subjective classification tasks.
arXiv Detail & Related papers (2023-11-16T10:18:32Z)
- AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection [18.823219608659986]
AnnoBERT is a first-of-its-kind architecture integrating annotator characteristics and label text to detect hate speech.
During training, the model associates annotators with their label choices given a piece of text.
During evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators.
arXiv Detail & Related papers (2022-12-20T16:30:11Z)
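A minimal sketch of the training-time idea described above: condition the classifier on an annotator embedding, and average over all annotators when none is specified at evaluation. The dimensions, fusion by concatenation, and mean aggregation are assumptions, not AnnoBERT's exact design.

```python
import torch
import torch.nn as nn

class AnnotatorConditionedClassifier(nn.Module):
    def __init__(self, num_annotators, text_dim=768, ann_dim=32, num_labels=2):
        super().__init__()
        self.ann_embed = nn.Embedding(num_annotators, ann_dim)
        self.head = nn.Linear(text_dim + ann_dim, num_labels)

    def forward(self, text_vec, annotator_ids):
        # Associate each text with the annotator who labelled it.
        joint = torch.cat([text_vec, self.ann_embed(annotator_ids)], dim=-1)
        return self.head(joint)

    @torch.no_grad()
    def predict_aggregate(self, text_vec, annotator_ids):
        # No annotator is known at evaluation, so average per-annotator logits.
        all_logits = [self(text_vec, ids.expand(text_vec.size(0)))
                      for ids in annotator_ids.split(1)]
        return torch.stack(all_logits).mean(dim=0)

model = AnnotatorConditionedClassifier(num_annotators=5)
x = torch.randn(4, 768)                          # stand-in encoder outputs
agg = model.predict_aggregate(x, torch.arange(5))
```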
- Text2Model: Text-based Model Induction for Zero-shot Image Classification [38.704831945753284]
We address the challenge of building task-agnostic classifiers using only text descriptions.
We generate zero-shot classifiers using a hypernetwork that receives class descriptions and outputs a multi-class model.
We evaluate this approach in a series of zero-shot classification tasks, for image, point-cloud, and action recognition, using a range of text descriptions.
arXiv Detail & Related papers (2022-10-27T05:19:55Z)
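To illustrate the hypernetwork idea in the entry above, here is a hedged sketch in which class-description embeddings are mapped to classifier weights; the architecture and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class HyperClassifier(nn.Module):
    """Generate a zero-shot classifier head from class description embeddings."""

    def __init__(self, desc_dim=384, feat_dim=512):
        super().__init__()
        self.weight_gen = nn.Sequential(  # the hypernetwork
            nn.Linear(desc_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, feat_dim)
        )

    def forward(self, features, class_desc_emb):
        # features: (batch, feat_dim); class_desc_emb: (num_classes, desc_dim)
        class_weights = self.weight_gen(class_desc_emb)  # (num_classes, feat_dim)
        return features @ class_weights.T                # (batch, num_classes)

model = HyperClassifier()
logits = model(torch.randn(8, 512), torch.randn(3, 384))  # 3 unseen classes
```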
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a three-stage training framework in which the noise level of the finetuning data decreases from stage to stage.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- Are We Really Making Much Progress in Text Classification? A Comparative Review [2.579878570919875]
This study reviews and compares methods for single-label and multi-label text classification.
Results reveal that all recently proposed graph-based and hierarchy-based methods fail to outperform pre-trained language models.
arXiv Detail & Related papers (2022-04-08T09:28:20Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- Not All Negatives are Equal: Label-Aware Contrastive Loss for Fine-grained Text Classification [0.0]
We analyse the contrastive fine-tuning of pre-trained language models on two fine-grained text classification tasks.
We adaptively embed class relationships into a contrastive objective function so that positives and negatives are weighted differently.
We find that Label-aware Contrastive Loss outperforms previous contrastive methods.
arXiv Detail & Related papers (2021-09-12T04:19:17Z)
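The sketch below shows one way to fold class relationships into a supervised contrastive objective, weighting negative pairs by an assumed class-distance matrix; the actual weighting scheme in the paper may differ.

```python
import torch
import torch.nn.functional as F

def label_aware_contrastive_loss(z, labels, class_dist, temp=0.1):
    """Supervised contrastive loss with negatives weighted by class distance.

    Assumes every example has at least one other example of its class in the
    batch. `class_dist` is an assumed (num_classes, num_classes) matrix of
    pairwise class distances, larger for less related classes.
    """
    z = F.normalize(z, dim=-1)
    sim = torch.exp(z @ z.T / temp)                    # pairwise similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class mask
    eye = torch.eye(len(z), dtype=torch.bool)
    weights = class_dist[labels][:, labels]            # per-pair neg. weights
    weights = weights.masked_fill(same, 1.0)           # positives unweighted
    denom = (sim * weights * (~eye).float()).sum(dim=1)
    pos = (sim * (same & ~eye).float()).sum(dim=1)
    return -torch.log(pos / denom).mean()

# Toy usage: 4 embeddings, 2 classes, and an assumed class-distance matrix.
z = torch.randn(4, 16)
labels = torch.tensor([0, 0, 1, 1])
class_dist = torch.tensor([[1.0, 2.0], [2.0, 1.0]])
print(label_aware_contrastive_loss(z, labels, class_dist))
```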
- Few-shot Learning for Multi-label Intent Detection [59.66787898744991]
State-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels.
Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
arXiv Detail & Related papers (2020-10-11T14:42:18Z)
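As a small hedged sketch of the relevance-score-plus-threshold step described above; the fixed cutoff here stands in for whatever calibration the paper uses.

```python
import torch

def select_intents(logits, threshold=0.5):
    """Keep every intent whose relevance score clears the cutoff."""
    scores = torch.sigmoid(logits)  # label-instance relevance scores
    return [row.nonzero().flatten().tolist() for row in scores > threshold]

# Scores ~ [0.88, 0.27, 0.57] -> intents 0 and 2 are selected.
print(select_intents(torch.tensor([[2.0, -1.0, 0.3]])))  # [[0, 2]]
```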
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
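A hedged sketch of the min-max training step the last entry describes: a single FGSM-style inner maximization on the input embeddings, followed by a model update on the perturbed loss. The one-step inner max and the epsilon are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_step(model, embeddings, labels, optimizer, eps=0.01):
    """One min-max update: perturb inputs to raise the loss, then train on them."""
    embeddings = embeddings.detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeddings), labels)
    grad, = torch.autograd.grad(loss, embeddings)
    adv = (embeddings + eps * grad.sign()).detach()  # inner max (label-preserving)
    optimizer.zero_grad()
    F.cross_entropy(model(adv), labels).backward()   # outer min
    optimizer.step()

# Toy usage with a linear probe over stand-in sentence embeddings.
model = torch.nn.Linear(32, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
adversarial_step(model, torch.randn(8, 32), torch.randint(0, 2, (8,)), opt)
```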