Related papers: Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play

Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play

URL: http://arxiv.org/abs/2411.08884v1
Date: Sat, 26 Oct 2024 15:55:21 GMT
Title: Quantifying Risk Propensities of Large Language Models: Ethical Focus and Bias Detection through Role-Play
Authors: Yifan Zeng,
Abstract summary: As Large Language Models (LLMs) become more prevalent, concerns about their safety, ethics, and potential biases have risen. This study innovatively applies the Domain-Specific Risk-Taking (DOSPERT) scale from cognitive science to LLMs. We propose a novel Ethical Decision-Making Risk Attitude Scale (EDRAS) to assess LLMs' ethical risk attitudes in depth.
Score: 0.43512163406552007
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) become more prevalent, concerns about their safety, ethics, and potential biases have risen. Systematically evaluating LLMs' risk decision-making tendencies and attitudes, particularly in the ethical domain, has become crucial. This study innovatively applies the Domain-Specific Risk-Taking (DOSPERT) scale from cognitive science to LLMs and proposes a novel Ethical Decision-Making Risk Attitude Scale (EDRAS) to assess LLMs' ethical risk attitudes in depth. We further propose a novel approach integrating risk scales and role-playing to quantitatively evaluate systematic biases in LLMs. Through systematic evaluation and analysis of multiple mainstream LLMs, we assessed the "risk personalities" of LLMs across multiple domains, with a particular focus on the ethical domain, and revealed and quantified LLMs' systematic biases towards different groups. This research helps understand LLMs' risk decision-making and ensure their safe and reliable application. Our approach provides a tool for identifying and mitigating biases, contributing to fairer and more trustworthy AI systems. The code and data are available.

Related papers

Evaluating and Aligning Human Economic Risk Preferences in LLMs [19.574432889355627]
We investigate whether Large Language Models (LLMs) exhibit risk preferences consistent with human expectations across different personas. Our results reveal that while LLMs make reasonable decisions in simplified, personalized risk contexts, their performance declines in more complex economic decision-making tasks. Our approach improves the economic rationality of LLMs in risk-related applications, offering a step toward more human-aligned AI decision-making.
arXiv Detail & Related papers (2025-03-09T14:47:31Z)
Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents [67.07177243654485]
This survey collects and analyzes the different threats faced by large language models-based agents. We identify six key features of LLM-based agents, based on which we summarize the current research progress. We select four representative agents as case studies to analyze the risks they may face in practical use.
arXiv Detail & Related papers (2024-11-14T15:40:04Z)
Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication. In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness. Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose toolns, a comprehensive framework designed for conducting safety evaluations of MLLMs. Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol. Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z)
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge [84.34545223897578]
Despite their excellence in many domains, potential issues are under-explored, undermining their reliability and the scope of their utility. We identify 12 key potential biases and propose a new automated bias quantification framework-CALM- which quantifies and analyzes each type of bias in LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and remind users to exercise caution in LLM-as-a-Judge applications.
arXiv Detail & Related papers (2024-10-03T17:53:30Z)
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context [5.361970694197912]
This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of large language models (LLMs) We estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro. Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities.
arXiv Detail & Related papers (2024-06-10T02:14:19Z)
How Ethical Should AI Be? How AI Alignment Shapes the Risk Preferences of LLMs [0.0]
This study examines the risk preferences of Large Language Models (LLMs) and how aligning them with human ethical standards affects their economic decision-making. We find that aligning LLMs with human values, focusing on harmlessness, helpfulness, and honesty, shifts them towards risk aversion.
arXiv Detail & Related papers (2024-06-03T10:05:25Z)
Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch. Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs. We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z)
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. Our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
Risk and Response in Large Language Models: Evaluating Key Threat Categories [6.436286493151731]
This paper explores the pressing issue of risk assessment in Large Language Models (LLMs) By utilizing the Anthropic Red-team dataset, we analyze major risk categories, including Information Hazards, Malicious Uses, and Discrimination/Hateful content. Our findings indicate that LLMs tend to consider Information Hazards less harmful, a finding confirmed by a specially developed regression model.
arXiv Detail & Related papers (2024-03-22T06:46:40Z)
Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains [15.320563604087246]
High-risk domains pose unique challenges that require language models to provide accurate and safe responses. Despite the great success of large language models (LLMs), their performance in high-risk domains remains unclear.
arXiv Detail & Related papers (2023-11-25T08:58:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.