Mitigating Covertly Unsafe Text within Natural Language Systems
- URL: http://arxiv.org/abs/2210.09306v2
- Date: Mon, 20 Mar 2023 21:33:41 GMT
- Title: Mitigating Covertly Unsafe Text within Natural Language Systems
- Authors: Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway,
John Judge, Desmond Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang
- Abstract summary: Uncontrolled systems may generate recommendations that lead to injury or life-threatening consequences.
In this paper, we distinguish types of text that can lead to physical harm and establish one particularly underexplored category: covertly unsafe text.
- Score: 55.26364166702625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasingly prevalent problem for intelligent technologies is text
safety, as uncontrolled systems may generate recommendations to their users
that lead to injury or life-threatening consequences. However, the degree of
explicitness of a generated statement that can cause physical harm varies. In
this paper, we distinguish types of text that can lead to physical harm and
establish one particularly underexplored category: covertly unsafe text. Then,
we further break down this category with respect to the system's information
and discuss solutions to mitigate the generation of text in each of these
subcategories. Ultimately, our work defines the problem of covertly unsafe
language that causes physical harm and argues that this subtle yet dangerous
issue needs to be prioritized by stakeholders and regulators. We highlight
mitigation strategies to inspire future researchers to tackle this challenging
problem and help improve safety within smart systems.
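To make the abstract's mitigation framing concrete, here is a minimal, hypothetical sketch of one gating pattern a deployed system could adopt: generated advice passes through a safety scorer before being shown to users. The function names, threshold, and fallback message are illustrative assumptions, not the paper's method; the paper contributes a taxonomy and a discussion of mitigation strategies rather than a specific pipeline.

```python
# Hypothetical mitigation sketch: gate generated advice behind a safety score
# before surfacing it to users. The classifier interface, threshold, and
# fallback message are illustrative assumptions, not the paper's method.
from typing import Callable

def safe_respond(
    generate: Callable[[str], str],
    unsafe_score: Callable[[str], float],
    prompt: str,
    threshold: float = 0.5,
) -> str:
    """Return generated text only when its unsafe score falls below the threshold."""
    text = generate(prompt)
    if unsafe_score(text) >= threshold:
        # Covertly unsafe advice often looks innocuous, so fall back to a
        # refusal rather than surfacing a borderline recommendation.
        return "I can't help with that; the suggestion may be physically unsafe."
    return text
```

In practice the scorer would need to be trained to recognize covertly unsafe advice specifically, since such text rarely contains overtly harmful keywords.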
Related papers
- Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders [5.070104802923903]
Unsafe prompts pose a significant threat to Large Language Models (LLMs).
This paper investigates the potential of sentence encoders to distinguish safe from unsafe prompts.
We introduce new pairwise datasets and the Categorical Purity metric to measure this capability (a minimal embedding-similarity sketch follows after this list).
arXiv Detail & Related papers (2024-07-09T13:35:54Z)
- The Alignment Problem in Context [0.05657375260432172]
I assess whether we are on track to solve the alignment problem for large language models.
I argue that existing strategies for alignment are insufficient, because large language models remain vulnerable to adversarial attacks.
It follows that the alignment problem is not only unsolved for current AI systems, but may be intrinsically difficult to solve without severely undermining their capabilities.
arXiv Detail & Related papers (2023-11-03T17:57:55Z)
- Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI [76.28956947107372]
Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful.
We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety.
Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, improving safety classification accuracy by 5.9% absolute.
arXiv Detail & Related papers (2022-12-19T17:51:47Z)
- SafeText: A Benchmark for Exploring Physical Safety in Language Models [62.810902375154136]
We study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks.
We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice.
arXiv Detail & Related papers (2022-10-18T17:59:31Z)
- Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey [50.58063811745676]
This work provides a survey of practical methods for addressing potential threats and societal harms from language generation models.
We draw on several prior works on language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks and harms of language generators.
arXiv Detail & Related papers (2022-10-14T10:43:39Z)
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
- Overcoming Failures of Imagination in AI Infused System Development and Deployment [71.9309995623067]
NeurIPS 2020 requested that research paper submissions include impact statements on "potential nefarious uses and the consequences of failure."
We argue that frameworks of harms must be context-aware and consider a wider range of potential stakeholders and system affordances, as well as viable proxies for assessing harms in the widest sense.
arXiv Detail & Related papers (2020-11-26T18:09:52Z)
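As a companion to the Safe-Embed entry above, the following is a hedged sketch of the general idea of separating safe from unsafe prompts with a sentence encoder: score a new prompt by its similarity to small labeled exemplar sets. It assumes the sentence-transformers package and an off-the-shelf checkpoint; the exemplars, model choice, and margin score are illustrative and are not the paper's datasets or its Categorical Purity metric.

```python
# Hedged sketch: separate safe from unsafe prompts with a sentence encoder by
# comparing a new prompt against small labeled exemplar sets. The exemplars,
# model choice, and margin score are illustrative assumptions only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

safe_exemplars = [
    "What is a good way to store fresh herbs?",
    "How do I label jars in my pantry?",
]
unsafe_exemplars = [
    "Can I mix bleach and ammonia to make a stronger cleaner?",
    "Is it fine to run a generator indoors during a storm?",
]

safe_emb = model.encode(safe_exemplars, convert_to_tensor=True)
unsafe_emb = model.encode(unsafe_exemplars, convert_to_tensor=True)

def unsafe_margin(prompt: str) -> float:
    """Positive values mean the prompt embeds closer to the unsafe exemplars."""
    emb = model.encode(prompt, convert_to_tensor=True)
    return float(util.cos_sim(emb, unsafe_emb).max() - util.cos_sim(emb, safe_emb).max())

if __name__ == "__main__":
    print(unsafe_margin("Is it okay to heat a sealed can directly on the stove?"))
```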
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.