Ethical and social risks of harm from Language Models
- URL: http://arxiv.org/abs/2112.04359v1
- Date: Wed, 8 Dec 2021 16:09:48 GMT
- Title: Ethical and social risks of harm from Language Models
- Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan
Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh,
Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba
Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean
Legassick, Geoffrey Irving, Iason Gabriel
- Abstract summary: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs).
A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences.
- Score: 22.964941107198023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims to help structure the risk landscape associated with
large-scale Language Models (LMs). In order to foster advances in responsible
innovation, an in-depth understanding of the potential risks posed by these
models is needed. A wide range of established and anticipated risks are
analysed in detail, drawing on multidisciplinary expertise and literature from
computer science, linguistics, and social sciences.
We outline six specific risk areas: I. Discrimination, Exclusion and
Toxicity, II. Information Hazards, III. Misinformation Harms, IV. Malicious
Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and
Environmental Harms. The first area concerns the perpetuation of stereotypes,
unfair discrimination, exclusionary norms, toxic language, and LM performance
that is lower for some social groups. The second focuses on risks from private
data leaks or LMs correctly inferring sensitive information. The third
addresses risks arising from poor, false or misleading information including in
sensitive domains, and knock-on risks such as the erosion of trust in shared
information. The fourth considers risks from actors who try to use LMs to cause
harm. The fifth focuses on risks specific to LMs used to underpin
conversational agents that interact with human users, including unsafe use,
manipulation or deception. The sixth discusses the risk of environmental harm,
job automation, and other challenges that may have a disparate effect on
different social groups or communities.
In total, we review 21 risks in depth. We discuss the points of origin of
different risks and point to potential mitigation approaches. Lastly, we
discuss organisational responsibilities in implementing mitigations, and the
role of collaboration and participation. We highlight directions for further
research, particularly on expanding the toolkit for assessing and evaluating
the outlined risks in LMs.
Related papers
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy, covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on the taxonomy, we create a small-scale dataset for evaluating current LMMs' capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
- CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models [46.93425758722059]
CRiskEval is a Chinese dataset meticulously designed for gauging the risk proclivities inherent in large language models (LLMs).
We define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral, and safe.
The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks.
arXiv Detail & Related papers (2024-06-07T08:52:24Z)
- Risk and Response in Large Language Models: Evaluating Key Threat Categories [6.436286493151731]
This paper explores the pressing issue of risk assessment in Large Language Models (LLMs).
By utilizing the Anthropic Red-team dataset, we analyze major risk categories, including Information Hazards, Malicious Uses, and Discrimination/Hateful content.
Our findings indicate that LLMs tend to consider Information Hazards less harmful, a finding confirmed by a specially developed regression model.
arXiv Detail & Related papers (2024-03-22T06:46:40Z)
- The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning [87.1610740406279]
The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons.
Current evaluations are private, preventing further research into mitigating risk.
We publicly release the Weapons of Mass Destruction Proxy benchmark, a dataset of 3,668 multiple-choice questions.
arXiv Detail & Related papers (2024-03-05T18:59:35Z)
- Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z)
- The Reasoning Under Uncertainty Trap: A Structural AI Risk [0.0]
This report provides an exposition of what makes reasoning under uncertainty (RUU) so challenging for both humans and machines.
We detail how this misuse risk connects to a wider network of underlying structural risks.
arXiv Detail & Related papers (2024-01-29T17:16:57Z)
- On the Risk of Misinformation Pollution with Large Language Models [127.1107824751703]
We investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation.
Our study reveals that LLMs can act as effective misinformation generators, leading to a significant degradation in the performance of Open-Domain Question Answering (ODQA) systems.
arXiv Detail & Related papers (2023-05-23T04:10:26Z)
- Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI [76.28956947107372]
Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful.
We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety.
Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, with an absolute improvement of 5.9% in safety classification accuracy.
arXiv Detail & Related papers (2022-12-19T17:51:47Z)
- The Risks of Machine Learning Systems [11.105884571838818]
A system's overall risk is influenced by its direct and indirect effects.
Existing frameworks for ML risk/impact assessment often address an abstract notion of risk or do not concretize this dependence.
First-order risks stem from aspects of the ML system, while second-order risks stem from the consequences of first-order risks.
arXiv Detail & Related papers (2022-04-21T02:42:10Z)
- A Framework for Institutional Risk Identification using Knowledge Graphs and Automated News Profiling [5.631924211771643]
Organizations around the world face an array of risks impacting their operations globally.
It is imperative to have a robust risk identification process to detect and evaluate the impact of potential risks before they materialize.
arXiv Detail & Related papers (2021-09-19T11:06:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.