Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude
- URL: http://arxiv.org/abs/2501.10484v5
- Date: Thu, 30 Oct 2025 17:48:05 GMT
- Title: Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude
- Authors: Wentao Xu, Yile Yan, Yuqi Zhu
- Abstract summary: This study systematically evaluates how nine popular Large Language Models respond to ethical dilemmas involving protected attributes. Across 50,400 trials spanning single and intersectional attribute combinations, we assess models' ethical preferences, sensitivity, stability, and clustering patterns. Results reveal significant biases in protected attributes in all models, with differing preferences depending on model type and dilemma context.
- Score: 8.959468665453286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in Large Language Models (LLMs) have enabled human-like responses across various tasks, raising questions about their ethical decision-making capabilities and potential biases. This study systematically evaluates how nine popular LLMs (both open-source and closed-source) respond to ethical dilemmas involving protected attributes. Across 50,400 trials spanning single and intersectional attribute combinations in four dilemma scenarios (protective vs. harmful), we assess models' ethical preferences, sensitivity, stability, and clustering patterns. Results reveal significant biases in protected attributes in all models, with differing preferences depending on model type and dilemma context. Notably, open-source LLMs show stronger preferences for marginalized groups and greater sensitivity in harmful scenarios, while closed-source models are more selective in protective situations and tend to favor mainstream groups. We also find that ethical behavior varies across dilemma types: LLMs maintain consistent patterns in protective scenarios but respond with more diverse and cognitively demanding decisions in harmful ones. Furthermore, models display more pronounced ethical tendencies under intersectional conditions than in single-attribute settings, suggesting that complex inputs reveal deeper biases. These findings highlight the need for multi-dimensional, context-aware evaluation of LLMs' ethical behavior and offer a systematic evaluation and approach to understanding and addressing fairness in LLM decision-making.
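As a rough illustration of the trial structure described above (pairwise dilemmas over single and intersectional protected attributes, with preferences tallied per group), a sketch like the following could drive such an evaluation; the attribute values, scenario handling, and model interface are placeholders, not the paper's materials.
```python
from itertools import combinations, product

# Illustrative attribute values; the paper's exact attributes, prompt
# templates, and 50,400-trial protocol are not reproduced here.
ATTRIBUTES = {
    "age": ["young", "elderly"],
    "gender": ["man", "woman"],
    "nationality": ["citizen", "immigrant"],
}

def build_trials(scenario):
    """Yield pairwise dilemmas for single and intersectional settings."""
    # Single-attribute trials: contrast two values of one attribute.
    for values in ATTRIBUTES.values():
        for a, b in combinations(values, 2):
            yield {"scenario": scenario, "groups": (a, b), "setting": "single"}
    # Intersectional trials: contrast combined identities, e.g.
    # "young man" vs. "elderly woman".
    for (_, v1), (_, v2) in combinations(ATTRIBUTES.items(), 2):
        identities = [" ".join(pair) for pair in product(v1, v2)]
        for a, b in combinations(identities, 2):
            yield {"scenario": scenario, "groups": (a, b),
                   "setting": "intersectional"}

def tally_preferences(trials, ask_model):
    """Count how often the model favors each group; `ask_model` is a
    stub for the LLM under test, returning one of trial["groups"]."""
    counts = {}
    for trial in trials:
        choice = ask_model(trial)
        counts[choice] = counts.get(choice, 0) + 1
    return counts
```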
Related papers
- Adaptive Generation of Bias-Eliciting Questions for LLMs [18.608477560948003]
Large language models (LLMs) are now widely deployed in user-facing applications, reaching hundreds of millions of users worldwide. We introduce a counterfactual bias evaluation framework that automatically generates realistic, open-ended questions over sensitive attributes such as sex, race, or religion. We also capture distinct response dimensions that are increasingly relevant in user interactions, such as asymmetric refusals and explicit acknowledgment of bias.
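A minimal counterfactual probe in the spirit of this framework might look as follows; the paper generates realistic questions adaptively, which this fixed template does not reproduce, and `query` stands in for whatever chat model is under evaluation.
```python
# Refusal detection via marker strings is a crude placeholder for the
# paper's response analysis.
TEMPLATE = ("A {group} colleague asks you for feedback on their "
            "leadership style. What would you say?")
GROUPS = ["male", "female"]  # one sensitive attribute, swapped counterfactually
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")

def counterfactual_probe(query):
    answers = {g: query(TEMPLATE.format(group=g)) for g in GROUPS}
    refused = {g: answers[g].strip().lower().startswith(REFUSAL_MARKERS)
               for g in GROUPS}
    # Asymmetric refusal: the model refuses one variant but not the other.
    asymmetric = len(set(refused.values())) > 1
    return answers, refused, asymmetric
```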
arXiv Detail & Related papers (2025-10-14T13:08:10Z) - Towards Safer AI Moderation: Evaluating LLM Moderators Through a Unified Benchmark Dataset and Advocating a Human-First Approach [0.9147875523270338]
Large Language Models (LLMs) have demonstrated remarkable capabilities, surpassing earlier models in complexity and performance. Yet they struggle with detecting implicit hate, offensive language, and gender biases due to the subjective and context-dependent nature of these issues. We develop an experimental framework based on state-of-the-art (SOTA) models to assess human emotions and offensive behaviors.
arXiv Detail & Related papers (2025-08-09T18:00:27Z) - Decoding the Mind of Large Language Models: A Quantitative Evaluation of Ideology and Biases [0.276240219662896]
We propose a novel framework for evaluating Large Language Models (LLMs). Applying the framework to ChatGPT and Gemini revealed that while LLMs generally maintain consistent opinions on many topics, their ideologies differ across models and languages. Both models also exhibited problematic biases and unethical or unfair claims, which might have negative societal impacts.
arXiv Detail & Related papers (2025-05-18T00:52:06Z) - Behind the Screens: Uncovering Bias in AI-Driven Video Interview Assessments Using Counterfactuals [0.0]
We introduce a counterfactual-based framework to evaluate and quantify bias in AI-driven personality assessments. Our approach employs generative adversarial networks (GANs) to generate counterfactual representations of job applicants. This work provides a scalable tool for fairness auditing of commercial AI hiring platforms.
arXiv Detail & Related papers (2025-05-17T18:46:14Z) - Ethical AI in the Healthcare Sector: Investigating Key Drivers of Adoption through the Multi-Dimensional Ethical AI Adoption Model (MEAAM) [1.5458951336481048]
This study introduces the Multi-Dimensional Ethical AI Adoption Model (MEAAM). It categorizes 13 critical ethical variables across four foundational dimensions of Ethical AI: Fair AI, Responsible AI, Explainable AI, and Sustainable AI. It investigates the influence of these ethical constructs on two outcomes: Operational AI Adoption and Systemic AI Adoption.
arXiv Detail & Related papers (2025-05-04T10:40:05Z) - Analyzing Feedback Mechanisms in AI-Generated MCQs: Insights into Readability, Lexical Properties, and Levels of Challenge [0.0]
This study delves into the linguistic and structural attributes of feedback generated by Google's Gemini 1.5-flash text model for computer science multiple-choice questions (MCQs). Key linguistic metrics, such as length, readability scores (Flesch-Kincaid Grade Level), vocabulary richness, and lexical density, were computed and examined. The findings reveal significant interaction effects between feedback tone and question difficulty, demonstrating the dynamic adaptation of AI-generated feedback within diverse educational contexts.
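Of the metrics named above, the Flesch-Kincaid Grade Level is fully specified by a standard formula (0.39 · words/sentences + 11.8 · syllables/words − 15.59); a minimal sketch, with a crude syllable heuristic rather than a dictionary:
```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups; fine for a sketch, not for
    # production readability tooling.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59
```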
arXiv Detail & Related papers (2025-04-19T09:20:52Z) - CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives [3.7731230532888036]
CLASH (Character perspective-based LLM Assessments in Situations with High-stakes) is a dataset consisting of 345 high-impact dilemmas along with 3,795 individual perspectives of diverse values.
Even the strongest models, such as GPT-4o and Claude-Sonnet, achieve less than 50% accuracy in identifying situations where the decision should be ambivalent.
arXiv Detail & Related papers (2025-04-15T02:54:16Z) - Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making [71.71796367760112]
Large language models (LLMs) have shown potential in supporting decision-making applications. We propose a cognitive debiasing approach, self-adaptive cognitive debiasing (SACD). We evaluate SACD on finance, healthcare, and legal decision-making tasks using both open-weight and closed-weight LLMs.
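The abstract does not spell out SACD's procedure; as a purely speculative sketch of what a self-adaptive debiasing loop could look like (detect a cognitive bias in the prompt, then rewrite and re-check), one might write:
```python
# Speculative sketch only; SACD's actual steps are defined in the paper.
def debias_prompt(prompt, llm, max_rounds=3):
    for _ in range(max_rounds):
        # Assumed step 1: ask the model whether the prompt carries a
        # cognitive bias (e.g., anchoring, framing).
        verdict = llm("Does this prompt contain a cognitive bias? "
                      "Answer 'yes: <bias name>' or 'no'.\n\n" + prompt)
        if verdict.strip().lower().startswith("no"):
            return prompt  # judged bias-free; stop adapting
        # Assumed step 2: rewrite the prompt to remove the detected bias
        # while preserving the underlying task.
        prompt = llm("Rewrite this prompt to remove the bias noted ("
                     + verdict.strip() + "), keeping the task intact:\n\n"
                     + prompt)
    return prompt
```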
arXiv Detail & Related papers (2025-04-05T11:23:05Z) - Human Decision-making is Susceptible to AI-driven Manipulation [87.24007555151452]
AI systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes.
This study examined human susceptibility to such manipulation in financial and emotional decision-making contexts.
arXiv Detail & Related papers (2025-02-11T15:56:22Z) - On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [68.62012304574012]
Multimodal generative models have sparked critical discussions on their reliability, fairness, and potential for misuse. We propose an evaluation framework to assess model reliability by analyzing responses to global and local perturbations in the embedding space. Our method lays the groundwork for detecting unreliable, bias-injected models and tracing the provenance of embedded biases.
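A hedged sketch of the perturbation idea: embed a prompt, add small local noise in embedding space, and measure how far the generated output drifts from the unperturbed reference. `embed`, `generate`, and `similarity` are placeholder callables, not the paper's implementation.
```python
import numpy as np

def reliability_score(prompt, embed, generate, similarity,
                      n_trials=10, sigma=0.05, seed=0):
    """Mean similarity between the reference output and outputs generated
    from locally perturbed embeddings; higher means more reliable."""
    rng = np.random.default_rng(seed)
    z = embed(prompt)                      # prompt embedding (np.ndarray)
    reference = generate(z)                # unperturbed output
    scores = []
    for _ in range(n_trials):
        z_noisy = z + rng.normal(0.0, sigma, size=z.shape)  # local perturbation
        scores.append(similarity(reference, generate(z_noisy)))
    return float(np.mean(scores))
```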
arXiv Detail & Related papers (2024-11-21T09:46:55Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Large-scale moral machine experiment on large language models [0.0]
We evaluate moral judgments across 52 different Large Language Models (LLMs) in autonomous driving scenarios. Proprietary models and open-source models exceeding 10 billion parameters demonstrated relatively close alignment with human judgments. However, model updates did not consistently improve alignment with human preferences, and many LLMs showed excessive emphasis on specific ethical principles.
arXiv Detail & Related papers (2024-11-11T08:36:49Z) - The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [91.86718720024825]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are at odds with standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
arXiv Detail & Related papers (2024-10-18T17:32:22Z) - Heuristics and Biases in AI Decision-Making: Implications for Responsible AGI [0.0]
We investigate the presence of cognitive biases in three large language models (LLMs): GPT-4o, Gemma 2, and Llama 3.1.
The study uses 1,500 experiments across nine established cognitive biases to evaluate the models' responses and consistency.
arXiv Detail & Related papers (2024-09-26T05:34:00Z) - Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context [5.361970694197912]
This paper proposes a framework, grounded in behavioral economics, to evaluate the decision-making behaviors of large language models (LLMs).
We estimate the degree of risk preference, probability weighting, and loss aversion in a context-free setting for three commercial LLMs: ChatGPT-4.0-Turbo, Claude-3-Opus, and Gemini-1.0-pro.
Our results reveal that LLMs generally exhibit patterns similar to humans, such as risk aversion and loss aversion, with a tendency to overweight small probabilities.
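The quantities estimated here have standard prospect-theory forms (Tversky & Kahneman, 1992): a value function that is concave for gains, convex and steeper for losses, and an inverse-S probability weighting function that overweights small probabilities. The parameter values below are the commonly cited human estimates, used purely for illustration, not the paper's fitted values for any LLM.
```python
# Tversky & Kahneman (1992) functional forms with their commonly cited
# human parameter estimates.
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Concave for gains, convex and steeper for losses (loss aversion)."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

def prob_weight(p, gamma=0.61):
    """Inverse-S weighting: small probabilities are overweighted."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def prospect_value(outcomes):
    """outcomes: iterable of (probability, payoff) pairs."""
    return sum(prob_weight(p) * value(x) for p, x in outcomes)

# e.g. a 1% chance of winning 100 is overweighted under these parameters:
# prospect_value([(0.01, 100)]) exceeds 0.01 * value(100).
```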
arXiv Detail & Related papers (2024-06-10T02:14:19Z) - Exploring and steering the moral compass of Large Language Models [55.2480439325792]
Large Language Models (LLMs) have become central to advancing automation and decision-making across various sectors.
This study proposes a comprehensive comparative analysis of the most advanced LLMs to assess their moral profiles.
arXiv Detail & Related papers (2024-05-27T16:49:22Z) - Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
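As one concrete instance of the contrastive family mentioned above, the Direct Preference Optimization (DPO) loss can be written compactly; this sketch assumes per-example response log-probabilities (summed token log-probs under the policy and a frozen reference model) are precomputed as tensors.
```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Inputs are 1-D tensors of summed token log-probs per example."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Maximize the implicit-reward margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```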
arXiv Detail & Related papers (2024-04-22T17:20:18Z) - What Hides behind Unfairness? Exploring Dynamics Fairness in Reinforcement Learning [52.51430732904994]
In reinforcement learning problems, agents must consider long-term fairness while maximizing returns.
Recent works have proposed many different types of fairness notions, but how unfairness arises in RL problems remains unclear.
We introduce a novel notion called dynamics fairness, which explicitly captures the inequality stemming from environmental dynamics.
arXiv Detail & Related papers (2024-04-16T22:47:59Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mediating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z) - Evaluating Psychological Safety of Large Language Models [72.88260608425949]
We designed unbiased prompts to evaluate the psychological safety of large language models (LLMs).
We tested five different LLMs by using two personality tests: Short Dark Triad (SD-3) and Big Five Inventory (BFI).
Despite being instruction fine-tuned with safety metrics to reduce toxicity, InstructGPT, GPT-3.5, and GPT-4 still showed dark personality patterns.
Fine-tuning Llama-2-chat-7B with responses from BFI using direct preference optimization could effectively reduce the psychological toxicity of the model.
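Scoring such questionnaires typically reduces to averaging Likert ratings per trait, with some items reverse-keyed; a minimal sketch, with a hypothetical item-to-trait mapping rather than the actual BFI key:
```python
# Hypothetical item key; the real BFI mapping and item count differ.
REVERSE_KEYED = {2, 5}                    # reverse-scored item ids (assumed)
TRAIT_ITEMS = {"agreeableness": [1, 2], "neuroticism": [4, 5]}

def score_traits(responses, scale_max=5):
    """responses: dict of item_id -> rating on a 1..scale_max Likert scale."""
    adjusted = {i: (scale_max + 1 - r) if i in REVERSE_KEYED else r
                for i, r in responses.items()}
    return {trait: sum(adjusted[i] for i in items) / len(items)
            for trait, items in TRAIT_ITEMS.items()}
```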
arXiv Detail & Related papers (2022-12-20T18:45:07Z) - AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect, with high accuracy, samples that cause oversensitivity and overstability.
arXiv Detail & Related papers (2021-09-24T03:49:38Z) - AI-Ethics by Design. Evaluating Public Perception on the Importance of Ethical Design Principles of AI [0.0]
We investigate how ethical principles are weighted in comparison to each other.
We show that different preference models for ethically designed systems exist among the German population.
arXiv Detail & Related papers (2021-06-01T09:01:14Z) - Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach [54.32266555843765]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
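The Lagrangian-dual idea can be sketched as a primal-dual training loop: the fairness constraint enters the loss through a multiplier that is updated by dual ascent. This sketch omits the differential-privacy component and uses a demographic-parity-style gap as a placeholder constraint; it is not the paper's algorithm.
```python
import torch
import torch.nn.functional as F

def fair_train_step(model, batch, optimizer, lam, eta=0.01):
    """One primal-dual step; `model` is assumed to output probabilities."""
    x, y, group = batch                   # features, labels, group indicator
    pred = model(x).squeeze(-1)
    task_loss = F.binary_cross_entropy(pred, y)
    # Constraint violation: gap in mean prediction across the two groups.
    violation = (pred[group == 0].mean() - pred[group == 1].mean()).abs()
    loss = task_loss + lam * violation    # primal step on model weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    lam = max(0.0, lam + eta * violation.item())  # dual ascent on the multiplier
    return lam
```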
arXiv Detail & Related papers (2020-09-26T10:50:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.