Ground-Truth, Whose Truth? -- Examining the Challenges with Annotating
Toxic Text Datasets
- URL: http://arxiv.org/abs/2112.03529v1
- Date: Tue, 7 Dec 2021 06:58:22 GMT
- Title: Ground-Truth, Whose Truth? -- Examining the Challenges with Annotating
Toxic Text Datasets
- Authors: Kofi Arhin, Ioana Baldini, Dennis Wei, Karthikeyan Natesan Ramamurthy,
Moninder Singh
- Abstract summary: This study examines selected toxic text datasets with the goal of shedding light on some of the inherent issues.
We re-annotate samples from three toxic text datasets and find that a multi-label approach to annotating toxic text samples can help to improve dataset quality.
- Score: 26.486492641924226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of machine learning (ML)-based language models (LMs) to monitor
content online is on the rise. For toxic text identification, task-specific
fine-tuning of these models are performed using datasets labeled by annotators
who provide ground-truth labels in an effort to distinguish between offensive
and normal content. These projects have led to the development, improvement,
and expansion of large datasets over time, and have contributed immensely to
research on natural language. Despite the achievements, existing evidence
suggests that ML models built on these datasets do not always result in
desirable outcomes. Therefore, using a design science research (DSR) approach,
this study examines selected toxic text datasets with the goal of shedding
light on some of the inherent issues and contributing to discussions on
navigating these challenges for existing and future projects. To achieve the
goal of the study, we re-annotate samples from three toxic text datasets and
find that a multi-label approach to annotating toxic text samples can help to
improve dataset quality. While this approach may not improve the traditional
metric of inter-annotator agreement, it may better capture dependence on
context and diversity in annotators. We discuss the implications of these
results for both theory and practice.
Related papers
- Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.
We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement.
We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
arXiv Detail & Related papers (2024-07-04T16:41:08Z) - Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey [17.19337964440007]
There is currently a lack of comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field.
arXiv Detail & Related papers (2024-02-27T23:59:01Z) - Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z) - Data Distribution Bottlenecks in Grounding Language Models to Knowledge
Bases [9.610231090476857]
Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language.
This paper is an experimental investigation aimed at uncovering the challenges that LMs encounter when tasked with knowledge base question answering (KBQA)
Our experiments reveal that even when employed with our proposed data augmentation techniques, advanced small and large language models exhibit poor performance in various dimensions.
arXiv Detail & Related papers (2023-09-15T12:06:45Z) - Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study
on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
arXiv Detail & Related papers (2023-06-23T15:15:13Z) - DATED: Guidelines for Creating Synthetic Datasets for Engineering Design
Applications [3.463438487417909]
This study proposes comprehensive guidelines for generating, annotating, and validating synthetic datasets.
The study underscores the importance of thoughtful sampling methods to ensure the appropriate size, diversity, utility, and realism of a dataset.
Overall, this paper offers valuable insights for researchers intending to create and publish synthetic datasets for engineering design.
arXiv Detail & Related papers (2023-05-15T21:00:09Z) - An Empirical Investigation of Commonsense Self-Supervision with
Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - Agreeing to Disagree: Annotating Offensive Language Datasets with
Annotators' Disagreement [7.288480094345606]
We focus on the level of agreement among annotators while selecting data to create offensive language datasets.
Our study comprises the creation of three novel datasets of English tweets covering different topics.
We show that such hard cases, where low agreement is present, are not necessarily due to poor-quality annotation.
arXiv Detail & Related papers (2021-09-28T08:55:04Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - Towards Understanding Sample Variance in Visually Grounded Language
Generation: Evaluations and Observations [67.4375210552593]
We design experiments to understand an important but often ignored problem in visually grounded language generation.
Given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance?
We show that it is of paramount importance to report variance in experiments; that human-generated references could vary drastically in different datasets/tasks, revealing the nature of each task.
arXiv Detail & Related papers (2020-10-07T20:45:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.