Related papers: Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning

URL: http://arxiv.org/abs/2502.15361v2
Date: Sat, 24 May 2025 19:46:14 GMT
Title: Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning
Authors: Xuyang Wu, Jinming Nian, Ting-Ruen Wei, Zhiqiang Tao, Hsin-Tai Wu, Yi Fang
Abstract summary: We present the first systematic evaluation of social bias within large language models (LLMs). We quantify how biased reasoning steps correlate with incorrect predictions and often lead to stereotype expression. We propose Answer Distribution as Bias Proxy (ADBP), a lightweight mitigation method that detects bias by tracking how model predictions change.
Score: 17.86131226260848
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in large language models (LLMs) have enabled automatic generation of chain-of-thought (CoT) reasoning, leading to strong performance on tasks such as math and code. However, when reasoning steps reflect social stereotypes (e.g., those related to gender, race or age), they can reinforce harmful associations and lead to misleading conclusions. We present the first systematic evaluation of social bias within LLM-generated reasoning, using the BBQ dataset to analyze both prediction accuracy and bias. Our study spans a wide range of mainstream reasoning models, including instruction-tuned and CoT-augmented variants of DeepSeek-R1 (8B/32B), ChatGPT, and other open-source LLMs. We quantify how biased reasoning steps correlate with incorrect predictions and often lead to stereotype expression. To mitigate reasoning-induced bias, we propose Answer Distribution as Bias Proxy (ADBP), a lightweight mitigation method that detects bias by tracking how model predictions change across incremental reasoning steps. ADBP outperforms a stereotype-free baseline in most cases, mitigating bias and improving the accuracy of LLM outputs. Code will be released upon paper acceptance.
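
The abstract describes ADBP only at a high level: bias is detected by tracking how predictions change across incremental reasoning steps. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes the answer predicted after each reasoning prefix has already been collected into a list; the function names, the `max_switches` threshold, and the majority-vote fallback are all hypothetical.

```python
from collections import Counter


def answer_distribution(step_answers):
    """Relative frequency of each answer across incremental reasoning steps."""
    counts = Counter(step_answers)
    total = len(step_answers)
    return {answer: n / total for answer, n in counts.items()}


def count_switches(step_answers):
    """Number of times the predicted answer changes between consecutive steps."""
    return sum(a != b for a, b in zip(step_answers, step_answers[1:]))


def select_answer(step_answers, max_switches=1):
    """Hypothetical selection rule (an assumption, not taken from the paper):
    keep the final answer when the trajectory is stable, otherwise fall back
    to the most frequent answer across all steps."""
    if not step_answers:
        raise ValueError("step_answers must not be empty")
    if count_switches(step_answers) <= max_switches:
        return step_answers[-1]
    return Counter(step_answers).most_common(1)[0][0]


# Answers predicted after each successive reasoning step (made-up example).
trajectory = ["unknown", "unknown", "A", "unknown", "A"]
print(answer_distribution(trajectory))
print(count_switches(trajectory))
print(select_answer(trajectory))
```

Here a trajectory that flips between answers is treated as a possible sign that later reasoning steps introduced a stereotype, so the rule prefers the answer that dominates the whole trajectory over the final one.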

Related papers

FairReason: Balancing Reasoning and Social Bias in MLLMs [50.618158642714505]
Multimodal Large Language Models (MLLMs) already achieve state-of-the-art results across a wide range of tasks and modalities. Recent studies explore advanced prompting schemes and post-training fine-tuning to push their reasoning ability further.
arXiv Detail & Related papers (2025-07-30T19:57:22Z)
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training [57.03005244917803]
Large language models (LLMs) often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks.
arXiv Detail & Related papers (2025-06-11T06:30:28Z)
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition [11.422434149376478]
Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. In theory, autoregressive LLMs with Chain-of-Thought (CoT) can perform more serial computations to solve complex reasoning tasks. Recent studies suggest that, despite this capacity, LLMs do not truly learn to reason but instead fit on statistical features.
arXiv Detail & Related papers (2025-04-04T20:57:36Z)
Unveiling the Magic of Code Reasoning through Hypothesis Decomposition and Amendment [54.62926010621013]
We introduce a novel task, code reasoning, to provide a new perspective for the reasoning abilities of large language models. We summarize three meta-benchmarks based on established forms of logical reasoning, and instantiate these into eight specific benchmark tasks. We present a new pathway exploration pipeline inspired by human intricate problem-solving methods.
arXiv Detail & Related papers (2025-02-17T10:39:58Z)
Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection [18.625071242029936]
Large Language Models (LLMs) have been shown to exhibit various biases and stereotypes in their generated content. This paper presents a systematic framework to investigate and compare explicit and implicit biases in LLMs.
arXiv Detail & Related papers (2025-01-04T14:08:52Z)
Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying [0.3659498819753633]
State-of-the-art Large Language Models (LLMs) continue to struggle when performing logical and mathematical reasoning. This paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation. We show that employing these critical questions can improve the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2024-12-19T18:51:30Z)
Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings [13.686732204665738]
We extend the existing BBQ dataset by incorporating fill-in-the-blank and short-answer question types. Our findings reveal that LLMs produce responses that are more biased against certain protected attributes, like age and socio-economic status. Our debiasing approach, which combines zero-shot, few-shot, and chain-of-thought, could significantly reduce the level of bias to almost 0.
arXiv Detail & Related papers (2024-12-09T01:29:47Z)
How far can bias go? -- Tracing bias from pretraining data to alignment [54.51310112013655]
This study examines the correlation between gender-occupation bias in pre-training data and its manifestation in LLMs. Our findings reveal that biases present in pre-training data are amplified in model outputs.
arXiv Detail & Related papers (2024-11-28T16:20:25Z)
Assessing Bias in Metric Models for LLM Open-Ended Generation Bias Benchmarks [3.973239756262797]
This study examines such biases in open-generation benchmarks like BOLD and SAGED. Results reveal unequal treatment of demographic descriptors, calling for more robust bias metric models.
arXiv Detail & Related papers (2024-10-14T20:08:40Z)
Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs [0.0]
Large Language Models (LLMs) are being adopted across a wide range of tasks. Recent research indicates that LLMs can harbor implicit biases even when they pass explicit bias evaluations. This study highlights that newer or larger language models do not automatically exhibit reduced bias.
arXiv Detail & Related papers (2024-10-13T03:43:18Z)
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning [41.9992614617405]
We introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training. We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks.
arXiv Detail & Related papers (2024-10-01T20:05:51Z)
Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences [5.141416267381492]
We consider the case of syllogistic reasoning, an area of deductive reasoning studied extensively in logic and cognitive psychology. We investigate the effects of chain-of-thought reasoning, in-context learning, and supervised fine-tuning on syllogistic reasoning. Our results suggest that the behavior of pre-trained LLMs can be explained by cognitive science.
arXiv Detail & Related papers (2024-06-17T08:59:04Z)
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners [58.15511660018742]
This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities. We develop carefully controlled synthetic datasets, featuring conjunction fallacy and syllogistic problems.
arXiv Detail & Related papers (2024-06-16T19:22:53Z)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But, can they really "reason" over the natural language? This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs). Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models. We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community. The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability. We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning. Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
Concise and Organized Perception Facilitates Reasoning in Large Language Models [32.71672086718057]
We show that large language models (LLMs) exhibit failure patterns akin to human-like cognitive biases when dealing with disordered and irrelevant content in reasoning tasks. We propose a novel reasoning approach named Concise and Organized Perception (COP). COP carefully analyzes the given statements to identify the most pertinent information while eliminating redundancy efficiently.
arXiv Detail & Related papers (2023-10-05T04:47:49Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.