Language Generation Models Can Cause Harm: So What Can We Do About It?
An Actionable Survey
- URL: http://arxiv.org/abs/2210.07700v1
- Date: Fri, 14 Oct 2022 10:43:39 GMT
- Title: Language Generation Models Can Cause Harm: So What Can We Do About It?
An Actionable Survey
- Authors: Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios
Anastasopoulos, Yulia Tsvetkov
- Abstract summary: This work provides a survey of practical methods for addressing potential threats and societal harms from language generation models.
We draw on several prior works' taxonomies of language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks/harms of language generators.
- Score: 50.58063811745676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in the capacity of large language models to generate
human-like text have resulted in their increased adoption in user-facing
settings. In parallel, these improvements have prompted a heated discourse
around the risks of societal harms they introduce, whether inadvertent or
malicious. Several studies have identified potential causes of these harms and
called for their mitigation via development of safer and fairer models. Going
beyond enumerating the risks of harms, this work provides a survey of practical
methods for addressing potential threats and societal harms from language
generation models. We draw on several prior works' taxonomies of language model
risks to present a structured overview of strategies for detecting and
ameliorating different kinds of risks/harms of language generators. Bridging
diverse strands of research, this survey aims to serve as a practical guide for
both LM researchers and practitioners with explanations of motivations behind
different mitigation strategies, their limitations, and open problems for
future research.
Related papers
- Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
Hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations.
arXiv Detail & Related papers (2024-08-20T02:19:35Z)
- Risks and NLP Design: A Case Study on Procedural Document QA [52.557503571760215]
We argue that clearer assessments of risks and harms to users will be possible when we specialize the analysis to more concrete applications and their plausible users.
We conduct a risk-oriented error analysis that could then inform the design of a future system to be deployed with lower risk of harm and better performance.
arXiv Detail & Related papers (2024-08-16T17:23:43Z)
- A Survey on Natural Language Counterfactual Generation [7.022371235308068]
Natural language counterfactual generation aims to minimally modify a given text such that the modified text will be classified into a different class.
We propose a new taxonomy that systematically categorizes the generation methods into four groups and summarizes the metrics for evaluating the generation quality.
arXiv Detail & Related papers (2024-07-04T15:13:59Z)
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on this taxonomy, we create a small-scale dataset for evaluating current LMMs' capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
- Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey [46.19229410404056]
Large language models (LLMs) have made remarkable advancements in natural language processing.
These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities.
However, privacy and security issues have been revealed throughout their life cycle.
arXiv Detail & Related papers (2024-06-12T07:55:32Z)
- BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models [19.446333438385153]
We propose a new methodology for attacking language models with knowledge graph augmented generation.
We encode natural language stereotypes into a knowledge graph and apply adversarial attack strategies to elicit biased responses.
We find our method increases bias in all models, even those trained with safety guardrails.
arXiv Detail & Related papers (2024-05-08T01:51:29Z)
- Against The Achilles' Heel: A Survey on Red Teaming for Generative Models [60.21722603260243]
The field of red teaming is experiencing fast-paced growth, which highlights the need for a comprehensive organization covering the entire pipeline.
Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models.
We have developed a "searcher" framework that unifies various automatic red teaming approaches.
arXiv Detail & Related papers (2024-03-31T09:50:39Z)
- Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts [2.406214748890827]
This paper proposes an approach to detecting online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model.
We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu, and Urdu).
Experimental results show a strong performance of the proposed approach, which performs proficiently and consistently across three distinct datasets.
arXiv Detail & Related papers (2023-08-28T16:18:50Z)
- Typology of Risks of Generative Text-to-Image Models [1.933681537640272]
This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney.
Our review reveals significant knowledge gaps concerning the understanding and treatment of these risks despite some already being addressed.
We identify 22 distinct risk types, spanning issues from data bias to malicious use.
arXiv Detail & Related papers (2023-07-08T20:33:30Z)