How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments
- URL: http://arxiv.org/abs/2309.15827v1
- Date: Tue, 12 Sep 2023 19:23:40 GMT
- Title: How We Define Harm Impacts Data Annotations: Explaining How Annotators Distinguish Hateful, Offensive, and Toxic Comments
- Authors: Angela Schöpke-Gonzalez, Siqi Wu, Sagar Kumar, Paul J. Resnick, Libby Hemphill
- Abstract summary: We study whether the way that researchers define 'harm' affects annotation outcomes.
We identify that features of harm definitions and annotators' individual characteristics explain much of how annotators use these terms differently.
- Score: 3.8021618306213094
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Computational social science research has made advances in machine learning
and natural language processing that support content moderators in detecting
harmful content. These advances often rely on training datasets annotated by
crowdworkers for harmful content. In designing instructions for annotation
tasks to generate training data for these algorithms, researchers often treat
the harm concepts that we train algorithms to detect - 'hateful', 'offensive',
'toxic', 'racist', 'sexist', etc. - as interchangeable. In this work, we
studied whether the way that researchers define 'harm' affects annotation
outcomes. Using Venn diagrams, information gain comparisons, and content
analyses, we reveal that annotators do not use the concepts 'hateful',
'offensive', and 'toxic' interchangeably. We identify that features of harm
definitions and annotators' individual characteristics explain much of how
annotators use these terms differently. Our results offer empirical evidence
discouraging the common practice of using harm concepts interchangeably in
content moderation research. Instead, researchers should make specific choices
about which harm concepts to analyze based on their research goals. Recognizing
that researchers are often resource constrained, we also encourage researchers
to provide information to bound their findings when their concepts of interest
differ from concepts that off-the-shelf harmful content detection algorithms
identify. Finally, we encourage algorithm providers to ensure their instruments
can adapt to contextually-specific content detection goals (e.g., soliciting
instrument users' feedback).
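
A minimal sketch of the kind of comparison the abstract describes: checking how often the same comments receive the 'hateful', 'offensive', and 'toxic' labels via Venn-style overlap counts and an information-gain (mutual information) measure. This is not the authors' code; the column names and toy data below are hypothetical, and only the general technique is illustrated.

```python
# Hedged sketch: compare pairwise label overlap (Venn-style) and mutual
# information between hypothetical binary annotations for three harm concepts.
from itertools import combinations

import pandas as pd
from sklearn.metrics import mutual_info_score

# Hypothetical annotation table: one row per comment, one binary column per concept.
df = pd.DataFrame({
    "hateful":   [1, 0, 1, 0, 0, 1, 0, 0],
    "offensive": [1, 1, 1, 0, 1, 1, 0, 0],
    "toxic":     [1, 1, 0, 0, 1, 1, 1, 0],
})

for a, b in combinations(df.columns, 2):
    both = int(((df[a] == 1) & (df[b] == 1)).sum())    # Venn intersection
    either = int(((df[a] == 1) | (df[b] == 1)).sum())  # Venn union
    jaccard = both / either if either else 0.0
    mi = mutual_info_score(df[a], df[b])               # information gain, in nats
    print(f"{a} vs {b}: overlap {both}/{either} (Jaccard={jaccard:.2f}), MI={mi:.3f}")
```

If annotators used the three concepts interchangeably, the pairwise overlaps would be near-total and the mutual information close to its maximum; lower values indicate the terms are being distinguished, which is the pattern the paper reports.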
Related papers
- Decoding the Narratives: Analyzing Personal Drug Experiences Shared on Reddit [1.080878521069079]
This study aims to develop a multi-level, multi-label classification model to analyze online user-generated texts about substance use experiences.
Using various multi-label classification algorithms on a set of annotated data, we show that GPT-4, when prompted with instructions, definitions, and examples, outperformed all other models.
arXiv Detail & Related papers (2024-06-17T21:56:57Z)
- Beyond Behaviorist Representational Harms: A Plan for Measurement and Mitigation [1.7355698649527407]
This study focuses on an examination of current definitions of representational harms to discern what is included and what is not.
Our work highlights the unique vulnerabilities of large language models to perpetrating representational harms.
The overarching aim of this research is to establish a framework for broadening the definition of representational harms.
arXiv Detail & Related papers (2024-01-25T00:54:10Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information [7.022948483613112]
This work is a step towards evaluating procedural fairness, where unfair processes lead to unfair outcomes.
The produced knowledge can guide debiasing techniques to ensure that important concepts besides identity terms are well-represented in training datasets.
arXiv Detail & Related papers (2022-10-19T16:03:25Z)
- Rumor Detection with Self-supervised Learning on Texts and Social Graph [101.94546286960642]
We propose contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better.
We term this framework Self-supervised Rumor Detection (SRD).
Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
arXiv Detail & Related papers (2022-04-19T12:10:03Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning [91.58529629419135]
We consider how to characterise visual groupings discovered automatically by deep neural networks.
We introduce two concepts, visual learnability and describability, that can be used to quantify the interpretability of arbitrary image groupings.
arXiv Detail & Related papers (2020-10-27T18:41:49Z)
- Machine Learning Explanations to Prevent Overtrust in Fake News Detection [64.46876057393703]
This research investigates the effects of an Explainable AI assistant embedded in news review platforms for combating the propagation of fake news.
We design a news reviewing and sharing interface, create a dataset of news stories, and train four interpretable fake news detection algorithms.
For a deeper understanding of Explainable AI systems, we discuss interactions between user engagement, mental model, trust, and performance measures in the process of explaining.
arXiv Detail & Related papers (2020-07-24T05:42:29Z)
- Natural language technology and query expansion: issues, state-of-the-art and perspectives [0.0]
Linguistic characteristics that cause ambiguity and misinterpretation of queries, as well as additional factors, affect users' ability to accurately represent their information needs.
We lay down the anatomy of a generic linguistic based query expansion framework and propose its module-based decomposition.
For each module, we review the state-of-the-art solutions in the literature and categorize them according to the techniques used.
arXiv Detail & Related papers (2020-04-23T11:39:07Z)
- A Survey of Adversarial Learning on Graphs [59.21341359399431]
We investigate and summarize the existing works on graph adversarial learning tasks.
Specifically, we survey and unify the existing works w.r.t. attack and defense in graph analysis tasks.
We emphasize the importance of related evaluation metrics and investigate and summarize them comprehensively.
arXiv Detail & Related papers (2020-03-10T12:48:00Z)
- Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations [16.304516254043865]
We study bias mitigation from unstructured text data for hate speech detection.
We propose novel methods leveraging knowledge-based generalizations for bias-free learning.
Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset and a Twitter dataset, show that the use of knowledge-based generalizations results in better performance.
arXiv Detail & Related papers (2020-01-15T18:17:36Z)