Handling and Presenting Harmful Text
- URL: http://arxiv.org/abs/2204.14256v1
- Date: Fri, 29 Apr 2022 17:34:12 GMT
- Title: Handling and Presenting Harmful Text
- Authors: Leon Derczynski, Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen
- Abstract summary: Textual data can pose a risk of serious harm.
These harms can be categorised along three axes: the type of harm (e.g. misinformation, hate speech, or racial stereotypes), whether it is elicited by or spuriously invoked in the research design, and who it affects.
It is an unsolved problem in NLP as to how textual harms should be handled, presented, and discussed.
We provide practical advice and introduce HarmCheck, a resource for reflecting on research into textual harms.
- Score: 10.359716317114815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Textual data can pose a risk of serious harm. These harms can be categorised
along three axes: (1) the harm type (e.g. misinformation, hate speech, or racial
stereotypes); (2) whether it is elicited as a feature of the research
design from directly studying harmful content (e.g. training a hate speech
classifier or auditing unfiltered large-scale datasets) versus
spuriously invoked from working on unrelated problems (e.g. language
generation or part of speech tagging) but with datasets that nonetheless
contain harmful content, and (3) who it affects, from the humans
(mis)represented in the data to those handling or labelling the data to readers
and reviewers of publications produced from the data. It is an unsolved problem
in NLP as to how textual harms should be handled, presented, and discussed;
but, stopping work on content which poses a risk of harm is untenable.
Accordingly, we provide practical advice and introduce HarmCheck, a
resource for reflecting on research into textual harms. We hope our work
encourages ethical, responsible, and respectful research in the NLP community.
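The three axes lend themselves to a simple structured record when reflecting on a dataset or study. The Python sketch below is only an illustration of that categorisation; the `HarmRecord` type and its field names are hypothetical and are not part of HarmCheck, which the paper presents as a reflective resource rather than code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Exposure(Enum):
    """Axis 2: how the harmful content enters the research."""
    ELICITED = "elicited"   # harm is studied directly (e.g. training a hate speech classifier)
    SPURIOUS = "spurious"   # harm surfaces incidentally (e.g. POS tagging on an unfiltered corpus)


@dataclass
class HarmRecord:
    """Hypothetical record covering the paper's three axes of textual harm."""
    harm_type: str                       # Axis 1: e.g. "misinformation", "hate speech", "racial stereotypes"
    exposure: Exposure                   # Axis 2: elicited vs. spuriously invoked
    affected_parties: List[str] = field(default_factory=list)  # Axis 3: who is affected


# Example: auditing an unfiltered web-scale corpus reused for language generation.
record = HarmRecord(
    harm_type="hate speech",
    exposure=Exposure.SPURIOUS,
    affected_parties=[
        "people (mis)represented in the data",
        "annotators handling the data",
        "readers and reviewers of the resulting publications",
    ],
)
print(record)
```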
Related papers
- Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey [7.945893812374361]
This paper aims to clarify some common concerns about the attack setting and to formally establish the research problem.
Specifically, we first present the threat model of the problem, and introduce the harmful fine-tuning attack and its variants.
Finally, we outline future research directions that might contribute to the development of the field.
arXiv Detail & Related papers (2024-09-26T17:55:22Z) - What Evidence Do Language Models Find Convincing? [94.90663008214918]
We build a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts.
We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions.
Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important.
arXiv Detail & Related papers (2024-02-19T02:15:34Z) - Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents using Not Safe For Work (NSFW) scores computed from images alone does not exclude all of the harmful content in alt-text; a sketch of a two-sided filter motivated by this finding follows the list below.
arXiv Detail & Related papers (2023-11-06T19:00:05Z) - How We Define Harm Impacts Data Annotations: Explaining How Annotators
Distinguish Hateful, Offensive, and Toxic Comments [3.8021618306213094]
We study whether the way that researchers define 'harm' affects annotation outcomes.
We find that features of the harm definitions and annotators' individual characteristics explain much of the variation in how annotators apply these terms.
arXiv Detail & Related papers (2023-09-12T19:23:40Z) - Synthetically generated text for supervised text analysis [5.71097144710995]
I provide a conceptual overview of text generation, guidance on when researchers should prefer different techniques for generating synthetic text, a discussion of ethics, and a simple technique for improving the quality of synthetic text.
I demonstrate the usefulness of synthetic text with three applications: generating synthetic tweets describing the fighting in Ukraine, synthetic news articles describing specified political events for training an event detection system, and a multilingual corpus of populist manifesto statements for training a sentence-level populism classifier.
arXiv Detail & Related papers (2023-03-28T14:55:13Z) - Constructing Highly Inductive Contexts for Dialogue Safety through
Controllable Reverse Generation [65.48908724440047]
We propose a method called reverse generation to construct adversarial contexts conditioned on a given response.
We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+ can largely expose their safety problems.
arXiv Detail & Related papers (2022-12-04T12:23:41Z) - Hope Speech Detection on Social Media Platforms [1.2561455657923906]
This paper discusses various machine learning approaches to identifying a sentence as Hope Speech, Non-Hope Speech, or Neutral; a minimal baseline in this spirit is sketched after the list below.
The dataset used in the study contains English YouTube comments.
arXiv Detail & Related papers (2022-11-14T10:58:22Z) - Mitigating Covertly Unsafe Text within Natural Language Systems [55.26364166702625]
Uncontrolled systems may generate recommendations that lead to injury or life-threatening consequences.
In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text.
arXiv Detail & Related papers (2022-10-17T17:59:49Z) - Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable
Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topic.
To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z) - Detecting Inappropriate Messages on Sensitive Topics that Could Harm a
Company's Reputation [64.22895450493729]
A calm discussion of turtles or fishing fuels inappropriate, toxic dialogue less often than a discussion of politics or sexual minorities.
We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness.
arXiv Detail & Related papers (2021-03-09T10:50:30Z)
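The LAION audit's observation above, that image-only NSFW scores miss harmful alt-text, implies a filter that checks both modalities. The sketch below assumes placeholder scoring functions (`image_nsfw_score`, `text_is_harmful`) standing in for whatever scorers a practitioner actually uses; it is not the audit's own pipeline.

```python
from typing import Callable, Iterable, List, Tuple

# An image-text pair from a web-scraped multimodal dataset: (image_path, alt_text).
Pair = Tuple[str, str]


def filter_pairs(
    pairs: Iterable[Pair],
    image_nsfw_score: Callable[[str], float],   # placeholder: any image NSFW scorer returning 0..1
    text_is_harmful: Callable[[str], bool],     # placeholder: any text-side harm/toxicity check
    nsfw_threshold: float = 0.5,
) -> List[Pair]:
    """Keep a pair only if the image score is low AND the alt-text passes a text check.

    The audit's point: the image-only condition (first clause) lets harmful
    alt-text through, so the text-side clause is also needed.
    """
    kept = []
    for image_path, alt_text in pairs:
        if image_nsfw_score(image_path) < nsfw_threshold and not text_is_harmful(alt_text):
            kept.append((image_path, alt_text))
    return kept


# Example usage with trivial stand-in scorers:
if __name__ == "__main__":
    pairs = [("img_001.jpg", "a dog playing in the park"), ("img_002.jpg", "a slur-laden caption")]
    kept = filter_pairs(
        pairs,
        image_nsfw_score=lambda path: 0.1,            # stand-in: every image scores "safe"
        text_is_harmful=lambda text: "slur" in text,  # stand-in: naive keyword check
    )
    print(kept)  # only the first pair survives the text-side check
```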
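For the hope speech paper, a minimal three-class baseline in the spirit of the classical machine learning approaches it surveys could look like the following; the toy comments and labels are invented for the example, and the paper's own dataset and models are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for labelled English YouTube comments; the real dataset is not reproduced.
comments = [
    "We can get through this together, stay strong",
    "Nothing will ever get better here",
    "The video starts at the two minute mark",
]
labels = ["hope_speech", "non_hope_speech", "neutral"]

# TF-IDF features + multinomial logistic regression: a common baseline for this kind of task.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(comments, labels)

print(model.predict(["Keep your head up, things will improve"]))
```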