Ethical and social risks of harm from Language Models
- URL: http://arxiv.org/abs/2112.04359v1
- Date: Wed, 8 Dec 2021 16:09:48 GMT
- Title: Ethical and social risks of harm from Language Models
- Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan
Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh,
Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba
Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean
Legassick, Geoffrey Irving, Iason Gabriel
- Abstract summary: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs).
A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences.
- Score: 22.964941107198023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to help structure the risk landscape associated with
large-scale Language Models (LMs). In order to foster advances in responsible
innovation, an in-depth understanding of the potential risks posed by these
models is needed. A wide range of established and anticipated risks are
analysed in detail, drawing on multidisciplinary expertise and literature from
computer science, linguistics, and social sciences.
We outline six specific risk areas: I. Discrimination, Exclusion and
Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious
Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and
Environmental Harms. The first area concerns the perpetuation of stereotypes,
unfair discrimination, exclusionary norms, toxic language, and lower
performance by social group for LMs. The second focuses on risks from private
data leaks or LMs correctly inferring sensitive information. The third
addresses risks arising from poor, false or misleading information including in
sensitive domains, and knock-on risks such as the erosion of trust in shared
information. The fourth considers risks from actors who try to use LMs to cause
harm. The fifth focuses on risks specific to LLMs used to underpin
conversational agents that interact with human users, including unsafe use,
manipulation or deception. The sixth discusses the risk of environmental harm,
job automation, and other challenges that may have a disparate effect on
different social groups or communities.
In total, we review 21 risks in-depth. We discuss the points of origin of
different risks and point to potential mitigation approaches. Lastly, we
discuss organisational responsibilities in implementing mitigations, and the
role of collaboration and participation. We highlight directions for further
research, particularly on expanding the toolkit for assessing and evaluating
the outlined risks in LMs.
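As a hedged illustration of what such an evaluation toolkit might look like, the sketch below aggregates per-category risk flags over sampled LM outputs. The category names mirror the paper's six risk areas, but `generate_samples`, `classify_risk`, and the 0.5 threshold are hypothetical placeholders, not components described in the paper.

```python
# Minimal sketch of a per-category risk evaluation harness for LM outputs.
# The category names follow the paper's taxonomy; `generate_samples` and
# `classify_risk` are hypothetical stand-ins, not part of the paper.
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

RISK_CATEGORIES = [
    "discrimination_exclusion_toxicity",
    "information_hazards",
    "misinformation_harms",
    "malicious_uses",
    "human_computer_interaction_harms",
    "automation_access_environmental_harms",
]

def evaluate_risks(
    prompts: Iterable[str],
    generate_samples: Callable[[str], List[str]],      # assumed LM sampling function
    classify_risk: Callable[[str], Dict[str, float]],  # assumed per-category scorer in [0, 1]
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Return the fraction of sampled outputs flagged for each risk category."""
    flags = defaultdict(int)
    total = 0
    for prompt in prompts:
        for output in generate_samples(prompt):
            total += 1
            scores = classify_risk(output)
            for category in RISK_CATEGORIES:
                if scores.get(category, 0.0) >= threshold:
                    flags[category] += 1
    return {c: flags[c] / max(total, 1) for c in RISK_CATEGORIES}
```

In practice the single placeholder classifier would be replaced by task-specific evaluators (for example toxicity classifiers, leakage detectors, or factuality checkers), one per risk area.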
Related papers
- Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
- "It's a conversation, not a quiz": A Risk Taxonomy and Reflection Tool for LLM Adoption in Public Health [16.418366314356184]
We conduct focus groups with health professionals and health issue experiencers to unpack their concerns.
We synthesize participants' perspectives into a risk taxonomy.
This taxonomy highlights four dimensions of risk in individual behaviors, human-centered care, information ecosystem, and technology accountability.
arXiv Detail & Related papers (2024-11-04T20:35:10Z)
- Risks and NLP Design: A Case Study on Procedural Document QA [52.557503571760215]
We argue that clearer assessments of risks and harms to users will be possible when we specialize the analysis to more concrete applications and their plausible users.
We conduct a risk-oriented error analysis that could then inform the design of a future system to be deployed with lower risk of harm and better performance.
arXiv Detail & Related papers (2024-08-16T17:23:43Z)
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy, covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on the taxonomy, we create a small-scale dataset for evaluating the capability of current LMMs to detect these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
- CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models [46.93425758722059]
CRiskEval is a Chinese dataset meticulously designed for gauging the risk proclivities inherent in large language models (LLMs).
We define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral, and safe.
The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks.
arXiv Detail & Related papers (2024-06-07T08:52:24Z)
- Risk and Response in Large Language Models: Evaluating Key Threat Categories [6.436286493151731]
This paper explores the pressing issue of risk assessment in Large Language Models (LLMs).
By utilizing the Anthropic Red-team dataset, we analyze major risk categories, including Information Hazards, Malicious Uses, and Discrimination/Hateful content.
Our findings indicate that LLMs tend to consider Information Hazards less harmful, a finding confirmed by a specially developed regression model.
arXiv Detail & Related papers (2024-03-22T06:46:40Z)
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning [87.1610740406279]
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons.
Current evaluations are private, preventing further research into mitigating risk.
We publicly release the Weapons of Mass Destruction Proxy benchmark, a dataset of 3,668 multiple-choice questions; a minimal scoring sketch for such multiple-choice benchmarks appears after this list.
arXiv Detail & Related papers (2024-03-05T18:59:35Z)
- Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
- The Reasoning Under Uncertainty Trap: A Structural AI Risk [0.0]
The report provides an exposition of what makes reasoning under uncertainty (RUU) so challenging for both humans and machines.
We detail how this misuse risk connects to a wider network of underlying structural risks.
arXiv Detail & Related papers (2024-01-29T17:16:57Z)
- A Framework for Institutional Risk Identification using Knowledge Graphs and Automated News Profiling [5.631924211771643]
Organizations around the world face an array of risks impacting their operations globally.
It is imperative to have a robust risk identification process to detect and evaluate the impact of potential risks before they materialize.
arXiv Detail & Related papers (2021-09-19T11:06:12Z)
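Several of the entries above (WMDP, CRiskEval) evaluate models against fixed multiple-choice question sets. The sketch below shows one minimal way such a benchmark could be scored; the item fields and the `answer_fn` wrapper are assumptions about the data format, not the benchmarks' actual APIs.

```python
# Hedged sketch of scoring a model on a WMDP/CRiskEval-style multiple-choice
# benchmark. The item layout and `answer_fn` are assumptions, not the
# benchmarks' published interfaces.
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class MultipleChoiceItem:
    question: str
    choices: List[str]   # e.g. four answer options
    answer_index: int    # index of the reference (correct or safest) choice

def score_benchmark(
    items: Iterable[MultipleChoiceItem],
    answer_fn: Callable[[str, List[str]], int],  # assumed model wrapper returning a choice index
) -> float:
    """Fraction of items on which the model picks the reference choice."""
    correct = 0
    total = 0
    for item in items:
        total += 1
        if answer_fn(item.question, item.choices) == item.answer_index:
            correct += 1
    return correct / max(total, 1)
```

Here `answer_fn` stands in for any model wrapper that maps a question and its options to a chosen index, for example by comparing per-option log-likelihoods.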
This list is automatically generated from the titles and abstracts of the papers on this site.