Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas
- URL: http://arxiv.org/abs/2508.11278v1
- Date: Fri, 15 Aug 2025 07:29:46 GMT
- Title: Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas
- Authors: Francesco Sovrano, Gabriele Dominici, Rita Sevastjanova, Alessandra Stramiglio, Alberto Bacchelli
- Abstract summary: General-purpose AI (GPAI) systems may help mitigate human cognitive biases due to their non-human nature. But their training on human-generated data raises a critical question: Do GPAI systems themselves exhibit cognitive biases? We present the first benchmarking framework to evaluate data-induced cognitive biases in GPAI within software engineering.
- Score: 47.582118202259394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human cognitive biases in software engineering can lead to costly errors. While general-purpose AI (GPAI) systems may help mitigate these biases due to their non-human nature, their training on human-generated data raises a critical question: Do GPAI systems themselves exhibit cognitive biases? To investigate this, we present the first dynamic benchmarking framework to evaluate data-induced cognitive biases in GPAI within software engineering workflows. Starting with a seed set of 16 hand-crafted realistic tasks, each featuring one of 8 cognitive biases (e.g., anchoring, framing) and corresponding unbiased variants, we test whether bias-inducing linguistic cues unrelated to task logic can lead GPAI systems from correct to incorrect conclusions. To scale the benchmark and ensure realism, we develop an on-demand augmentation pipeline relying on GPAI systems to generate task variants that preserve bias-inducing cues while varying surface details. This pipeline ensures correctness (88--99% on average, according to human evaluation), promotes diversity, and controls reasoning complexity by leveraging Prolog-based reasoning and LLM-as-a-judge validation. It also verifies that the embedded biases are both harmful and undetectable by logic-based, unbiased reasoners. We evaluate leading GPAI systems (GPT, LLaMA, DeepSeek) and find a consistent tendency to rely on shallow linguistic heuristics over deep reasoning. All systems exhibit cognitive biases (ranging from 5.9% to 35% across types), with bias sensitivity increasing sharply with task complexity (up to 49%), highlighting critical risks in real-world software engineering deployments.
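The core measurement the abstract describes, pairing each task with a biased variant and checking whether a bias-inducing cue flips a correct answer, can be sketched as below. This is an illustrative toy, not the authors' pipeline: `shallow_model` is a hypothetical stand-in for a GPAI system that follows an anchoring cue, and the task strings are invented.

```python
def shallow_model(task: str) -> str:
    """Toy stand-in for a GPAI system that relies on a shallow linguistic
    heuristic: it follows the anchoring cue whenever one is present."""
    if "senior engineers usually choose a" in task.lower():  # anchoring cue
        return "A"
    return "B"  # the logically correct answer for these toy tasks

def bias_sensitivity(pairs):
    """Fraction of (unbiased, biased, correct) task pairs where the model is
    correct on the unbiased variant but flips to a wrong answer on the
    biased variant -- the kind of per-bias rate the paper reports."""
    flips = sum(
        1 for unbiased, biased, correct in pairs
        if shallow_model(unbiased) == correct and shallow_model(biased) != correct
    )
    return flips / len(pairs)

pairs = [
    ("Which option satisfies the spec? Option A fails test 3; "
     "Option B passes all tests.",
     "Senior engineers usually choose A. Which option satisfies the spec? "
     "Option A fails test 3; Option B passes all tests.",
     "B"),
]
print(bias_sensitivity(pairs))  # 1.0: the cue flips this toy model on every pair
```

The cue is irrelevant to the task logic (Option B still passes all tests), which is exactly the property the benchmark's Prolog-based validation is said to enforce: an unbiased logical reasoner would be unaffected.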
Related papers
- Making Bias Non-Predictive: Training Robust LLM Judges via Reinforcement Learning [91.8584139564909]
Large language models (LLMs) increasingly serve as automated judges, yet they remain susceptible to cognitive biases. We propose Epistemic Independence Training (EIT), a reinforcement learning framework grounded in a key principle. EIT operationalizes this through a balanced conflict strategy where bias signals are equally likely to support correct and incorrect answers.
arXiv Detail & Related papers (2026-02-02T01:43:48Z) - Cognitive Biases in LLM-Assisted Software Development [4.020789871097349]
This paper presents the first comprehensive study of cognitive biases in Large Language Model (LLM)-assisted development. Through a systematic analysis of 90 cognitive biases specific to developer-LLM interactions, we develop a taxonomy of 15 bias categories validated by cognitive psychologists.
arXiv Detail & Related papers (2026-01-12T22:32:21Z) - Large Language Newsvendor: Decision Biases and Cognitive Mechanisms [2.7070404673380817]
Large language models (LLMs) are increasingly integrated into business decision making, yet they replicate and amplify human cognitive biases. This is particularly critical in high-stakes operational contexts like supply chain management.
arXiv Detail & Related papers (2025-12-14T04:51:53Z) - Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation [1.5892420496333068]
ECHO is a novel framework for proactive AI harm anticipation. It enables early-stage detection of bias-to-harm pathways. We validate ECHO in two high-stakes domains (disease diagnosis and hiring).
arXiv Detail & Related papers (2025-11-27T07:25:21Z) - Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment [49.81946749379338]
This work seeks to analyze the capacity of Transformer-based systems to learn demographic biases present in the data. We propose a privacy-enhancing framework that removes gender information from the learning pipeline as a way to mitigate biased behaviors in the final tools.
arXiv Detail & Related papers (2025-06-13T15:29:43Z) - Cognitive Debiasing Large Language Models for Decision-Making [71.2409973056137]
Large language models (LLMs) have shown potential in supporting decision-making applications. We propose a cognitive debiasing approach, self-adaptive cognitive debiasing (SACD). Our method follows three sequential steps -- bias determination, bias analysis, and cognitive debiasing -- to iteratively mitigate potential cognitive biases in prompts.
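The three sequential steps attributed to SACD suggest an iterative loop over the prompt. A minimal sketch is below, with the caveat that the cue dictionary and the rewrite rule are invented placeholders for illustration, not the published method's detectors:

```python
# Toy cue lexicon standing in for a learned or prompted bias detector.
CUES = {"anchoring": "experts recommend", "framing": "only a 10% failure"}

def determine_bias(prompt: str) -> list[str]:
    """Step 1 (bias determination): name the bias types whose cues appear."""
    return [name for name, cue in CUES.items() if cue in prompt.lower()]

def debias(prompt: str) -> str:
    """Steps 2-3 (analysis + debiasing): locate each cue and drop the
    sentence containing it, leaving the task logic intact."""
    for cue in CUES.values():
        prompt = ". ".join(s for s in prompt.split(". ") if cue not in s.lower())
    return prompt

def sacd_loop(prompt: str, max_iters: int = 3) -> str:
    """Iterate determination -> debiasing until no cue remains."""
    for _ in range(max_iters):
        if not determine_bias(prompt):
            break
        prompt = debias(prompt)
    return prompt

print(sacd_loop("Experts recommend option A. Which option passes?"))
# "Which option passes?"
```

The iteration matters because removing one cue can expose or leave another; the loop re-runs determination until the prompt is clean or the iteration budget is spent.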
arXiv Detail & Related papers (2025-04-05T11:23:05Z) - Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Why we need biased AI -- How including cognitive and ethical machine biases can enhance AI systems [0.0]
We argue for the structurewise implementation of human cognitive biases in learning algorithms.
In order to achieve ethical machine behavior, filter mechanisms have to be applied.
This paper is the first tentative step to explicitly pursue the idea of a re-evaluation of the ethical significance of machine biases.
arXiv Detail & Related papers (2022-03-18T12:39:35Z) - Improving Fairness of AI Systems with Lossless De-biasing [15.039284892391565]
Mitigating bias in AI systems to increase overall fairness has emerged as an important challenge.
We present an information-lossless de-biasing technique that targets the scarcity of data in the disadvantaged group.
arXiv Detail & Related papers (2021-05-10T17:38:38Z) - Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this list (including all information) and is not responsible for any consequences of its use.