Language Generation Models Can Cause Harm: So What Can We Do About It?
An Actionable Survey
- URL: http://arxiv.org/abs/2210.07700v1
- Date: Fri, 14 Oct 2022 10:43:39 GMT
- Title: Language Generation Models Can Cause Harm: So What Can We Do About It?
An Actionable Survey
- Authors: Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios
Anastasopoulos, Yulia Tsvetkov
- Abstract summary: This work provides a survey of practical methods for addressing potential threats and societal harms from language generation models.
We draw on several prior works' taxonomies of language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks and harms of language generators.
- Score: 50.58063811745676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in the capacity of large language models to generate
human-like text have resulted in their increased adoption in user-facing
settings. In parallel, these improvements have prompted a heated discourse
around the risks of societal harms they introduce, whether inadvertent or
malicious. Several studies have identified potential causes of these harms and
called for their mitigation via development of safer and fairer models. Going
beyond enumerating the risks of harms, this work provides a survey of practical
methods for addressing potential threats and societal harms from language
generation models. We draw on several prior works' taxonomies of language model
risks to present a structured overview of strategies for detecting and
ameliorating different kinds of risks/harms of language generators. Bridging
diverse strands of research, this survey aims to serve as a practical guide for
both LM researchers and practitioners with explanations of motivations behind
different mitigation strategies, their limitations, and open problems for
future research.
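One family of mitigations the survey covers is detecting harmful outputs and suppressing them at generation time. A minimal rejection-sampling sketch is below; the blocklist, the `generate_fn` callable, and the keyword detector are hypothetical stand-ins (a real deployment would use a learned toxicity classifier, not keyword matching).

```python
import re

# Hypothetical blocklist for illustration only; a production detector
# would be a trained classifier scoring the whole generation.
BLOCKLIST = {"slur_a", "slur_b", "threat_phrase"}

def is_flagged(text: str) -> bool:
    """Return True if any blocklisted token appears in the text."""
    tokens = set(re.findall(r"[a-z_]+", text.lower()))
    return bool(tokens & BLOCKLIST)

def safe_generate(generate_fn, prompt: str, max_tries: int = 5) -> str:
    """Rejection sampling: resample until an output passes the detector,
    falling back to a refusal message if none does."""
    for _ in range(max_tries):
        candidate = generate_fn(prompt)
        if not is_flagged(candidate):
            return candidate
    return "[output withheld by safety filter]"
```

The same wrapper pattern applies regardless of which detector is plugged in, which is why detection and amelioration are treated as separable strategies in the survey's framing.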
Related papers
- A Survey on Natural Language Counterfactual Generation [7.022371235308068]
Natural language counterfactual generation aims to minimally modify a given text so that the modified text is classified into a different class.
We propose a new taxonomy that categorizes the generation methods into four groups and systematically summarize the metrics for evaluating the generation quality.
arXiv Detail & Related papers (2024-07-04T15:13:59Z)
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on this taxonomy, we create a small-scale dataset for evaluating current LMMs' capability to detect these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
- Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey [46.19229410404056]
Large language models (LLMs) have made remarkable advancements in natural language processing.
These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities.
Privacy and security issues have been revealed throughout their life cycle.
arXiv Detail & Related papers (2024-06-12T07:55:32Z)
- BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models [19.446333438385153]
We propose a new methodology for attacking language models with knowledge graph augmented generation.
We encode natural language stereotypes into a knowledge graph and apply adversarial attack strategies.
We find our method increases bias in all models, even those trained with safety guardrails.
arXiv Detail & Related papers (2024-05-08T01:51:29Z)
- Against The Achilles' Heel: A Survey on Red Teaming for Generative Models [60.21722603260243]
The field of red teaming is experiencing fast-paced growth, which highlights the need for a comprehensive organization covering the entire pipeline.
Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models.
We have developed the searcher framework that unifies various automatic red teaming approaches.
arXiv Detail & Related papers (2024-03-31T09:50:39Z)
- Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts [2.406214748890827]
This paper proposes an approach to detecting online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model.
We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu, and Urdu).
Experimental results show a strong performance of the proposed approach, which performs proficiently and consistently across three distinct datasets.
arXiv Detail & Related papers (2023-08-28T16:18:50Z)
- Typology of Risks of Generative Text-to-Image Models [1.933681537640272]
This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney.
Our review reveals significant knowledge gaps concerning the understanding and treatment of these risks despite some already being addressed.
We identify 22 distinct risk types, spanning issues from data bias to malicious use.
arXiv Detail & Related papers (2023-07-08T20:33:30Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach can be applied directly to existing black-box models and uses the same procedure and prompts for all tasks we investigate.
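The round-based procedure summarized above can be sketched as follows. The `agents` callables (taking a question and the peers' previous answers) and the final majority vote are illustrative assumptions, not the paper's exact prompting setup.

```python
def debate(agents, question, rounds=2):
    """Multiagent debate (sketch): each agent answers independently,
    then revises its answer over several rounds after seeing the other
    agents' previous-round answers."""
    # Round 0: independent answers, no peer context.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent revises given the others' answers from the last round.
        answers = [
            agent(question, [a for j, a in enumerate(answers) if j != i])
            for i, agent in enumerate(agents)
        ]
    # Aggregate the final round by simple majority vote.
    return max(set(answers), key=answers.count)
```

In practice each agent would be a prompted LLM call; the loop structure (propose, exchange, revise, aggregate) is the part the abstract describes.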
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.