Language Generation Models Can Cause Harm: So What Can We Do About It?
An Actionable Survey
- URL: http://arxiv.org/abs/2210.07700v1
- Date: Fri, 14 Oct 2022 10:43:39 GMT
- Title: Language Generation Models Can Cause Harm: So What Can We Do About It?
An Actionable Survey
- Authors: Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios
Anastasopoulos, Yulia Tsvetkov
- Abstract summary: This work provides a survey of practical methods for addressing potential threats and societal harms from language generation models.
We draw on several prior works' taxonomies of language model risks to present a structured overview of strategies for detecting and ameliorating different kinds of risks and harms of language generators.
- Score: 50.58063811745676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in the capacity of large language models to generate
human-like text have resulted in their increased adoption in user-facing
settings. In parallel, these improvements have prompted a heated discourse
around the risks of societal harms they introduce, whether inadvertent or
malicious. Several studies have identified potential causes of these harms and
called for their mitigation via development of safer and fairer models. Going
beyond enumerating the risks of harms, this work provides a survey of practical
methods for addressing potential threats and societal harms from language
generation models. We draw on several prior works' taxonomies of language model
risks to present a structured overview of strategies for detecting and
ameliorating different kinds of risks/harms of language generators. Bridging
diverse strands of research, this survey aims to serve as a practical guide for
both LM researchers and practitioners with explanations of motivations behind
different mitigation strategies, their limitations, and open problems for
future research.
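One family of mitigations the survey covers is detecting harmful outputs and suppressing them at generation time. A minimal rejection-sampling sketch is below; the blocklist, the `generate_fn` callable, and the keyword detector are hypothetical stand-ins (a real deployment would use a learned toxicity classifier, not keyword matching).

```python
import re

# Hypothetical blocklist for illustration only; a production detector
# would be a trained classifier scoring the whole generation.
BLOCKLIST = {"slur_a", "slur_b", "threat_phrase"}

def is_flagged(text: str) -> bool:
    """Return True if any blocklisted token appears in the text."""
    tokens = set(re.findall(r"[a-z_]+", text.lower()))
    return bool(tokens & BLOCKLIST)

def safe_generate(generate_fn, prompt: str, max_tries: int = 5) -> str:
    """Rejection sampling: resample until an output passes the detector,
    falling back to a refusal message if none does."""
    for _ in range(max_tries):
        candidate = generate_fn(prompt)
        if not is_flagged(candidate):
            return candidate
    return "[output withheld by safety filter]"
```

The same wrapper pattern applies regardless of which detector is plugged in, which is why detection and amelioration are treated as separable strategies in the survey's framing.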
Related papers
- A Survey on Natural Language Counterfactual Generation [7.022371235308068]
Natural language counterfactual generation aims to minimally modify a given text so that the modified text is classified into a different class.
We propose a new taxonomy that categorizes the generation methods into four groups and systematically summarize the metrics for evaluating the generation quality.
arXiv Detail & Related papers (2024-07-04T15:13:59Z)
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on this taxonomy, we create a small-scale dataset for evaluating current LMMs' capability to detect these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
- Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey [46.19229410404056]
Large language models (LLMs) have made remarkable advancements in natural language processing.
These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities.
Privacy and security issues have been revealed throughout their life cycle.
arXiv Detail & Related papers (2024-06-12T07:55:32Z)
- BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models [19.446333438385153]
We propose a new methodology for attacking language models with knowledge graph augmented generation.
We encode natural language stereotypes into a knowledge graph and apply adversarial attack strategies.
We find our method increases bias in all models, even those trained with safety guardrails.
arXiv Detail & Related papers (2024-05-08T01:51:29Z)
- Against The Achilles' Heel: A Survey on Red Teaming for Generative Models [60.21722603260243]
The field of red teaming is experiencing fast-paced growth, which highlights the need for a comprehensive organization covering the entire pipeline.
Our extensive survey, which examines over 120 papers, introduces a taxonomy of fine-grained attack strategies grounded in the inherent capabilities of language models.
We have developed the searcher framework that unifies various automatic red teaming approaches.
arXiv Detail & Related papers (2024-03-31T09:50:39Z)
- Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts [2.406214748890827]
This paper proposes an approach to detecting online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model.
We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu, and Urdu).
Experimental results show a strong performance of the proposed approach, which performs proficiently and consistently across three distinct datasets.
arXiv Detail & Related papers (2023-08-28T16:18:50Z)
- Typology of Risks of Generative Text-to-Image Models [1.933681537640272]
This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney.
Our review reveals significant knowledge gaps concerning the understanding and treatment of these risks despite some already being addressed.
We identify 22 distinct risk types, spanning issues from data bias to malicious use.
arXiv Detail & Related papers (2023-07-08T20:33:30Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach can be applied directly to existing black-box models and uses the same procedure and prompts for all tasks we investigate.
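The round-based procedure summarized above can be sketched as follows. The `agents` callables (taking a question and the peers' previous answers) and the final majority vote are illustrative assumptions, not the paper's exact prompting setup.

```python
def debate(agents, question, rounds=2):
    """Multiagent debate (sketch): each agent answers independently,
    then revises its answer over several rounds after seeing the other
    agents' previous-round answers."""
    # Round 0: independent answers, no peer context.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent revises given the others' answers from the last round.
        answers = [
            agent(question, [a for j, a in enumerate(answers) if j != i])
            for i, agent in enumerate(agents)
        ]
    # Aggregate the final round by simple majority vote.
    return max(set(answers), key=answers.count)
```

In practice each agent would be a prompted LLM call; the loop structure (propose, exchange, revise, aggregate) is the part the abstract describes.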
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.