FairReason: Balancing Reasoning and Social Bias in MLLMs
- URL: http://arxiv.org/abs/2507.23067v1
- Date: Wed, 30 Jul 2025 19:57:22 GMT
- Title: FairReason: Balancing Reasoning and Social Bias in MLLMs
- Authors: Zhenyu Pan, Yutong Zhang, Jianshu Zhang, Haoran Lu, Haozheng Luo, Yuwei Han, Philip S. Yu, Manling Li, Han Liu
- Abstract summary: Multimodal Large Language Models (MLLMs) already achieve state-of-the-art results across a wide range of tasks and modalities. Recent studies explore advanced prompting schemes and post-training fine-tuning to push their reasoning ability further.
- Score: 50.618158642714505
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) already achieve state-of-the-art results across a wide range of tasks and modalities. To push their reasoning ability further, recent studies explore advanced prompting schemes and post-training fine-tuning. Although these techniques improve logical accuracy, they frequently leave the models' outputs burdened with pronounced social biases. Clarifying how reasoning gains interact with bias mitigation, and whether the two objectives inherently trade off, therefore remains an open and pressing research problem. Our study begins by benchmarking three bias-mitigation strategies, namely supervised fine-tuning (SFT), knowledge distillation (KD), and rule-based reinforcement learning (RL), under identical conditions, establishing their baseline strengths and weaknesses. Building on these results, we vary the proportion of debias-focused and reasoning-centric samples within each paradigm to chart the reasoning-versus-bias trade-off. Our sweeps reveal a consistent sweet spot: a roughly 1:4 mix trained with reinforcement learning cuts stereotype scores by 10% while retaining 88% of the model's original reasoning accuracy, offering concrete guidance for balancing fairness and capability in MLLMs.
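The abstract does not come with code, so the snippet below is only a minimal Python sketch of the two ingredients it names: a training batch built with a roughly 1:4 ratio of debias-focused to reasoning-centric samples (assuming the ratio is stated in that order) and a rule-based reward of the general kind used in rule-based RL. The pool structure, the field names `gold_answer` and `stereotype_flag`, and the reward values are illustrative assumptions, not the authors' implementation.

```python
import random

def build_mixed_batch(debias_pool, reasoning_pool, batch_size=100, debias_ratio=0.2):
    """Sample one training batch with a roughly 1:4 debias-to-reasoning mix (assumed ordering)."""
    n_debias = round(batch_size * debias_ratio)   # ~20 debias-focused samples per 100
    n_reason = batch_size - n_debias              # ~80 reasoning-centric samples per 100
    batch = random.sample(debias_pool, n_debias) + random.sample(reasoning_pool, n_reason)
    random.shuffle(batch)
    return batch

def rule_based_reward(sample, model_output):
    """Illustrative rule-based reward: reward correct answers, penalize flagged stereotypes.

    `gold_answer` and `stereotype_flag` are hypothetical fields; the paper's actual
    reward design is not reproduced here.
    """
    reward = 0.0
    if model_output.get("answer") == sample["gold_answer"]:
        reward += 1.0
    if model_output.get("stereotype_flag", False):   # e.g. from an external bias classifier
        reward -= 1.0
    return reward
```

In a full pipeline, batches of this kind would feed a PPO/GRPO-style RL trainer, with the reward terms and the mixing ratio tuned against held-out reasoning and bias benchmarks.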
Related papers
- Guiding LLM Decision-Making with Fairness Reward Models [12.32062012708603]
Large language models are increasingly used to support high-stakes decisions. We propose a framework for training a generalizable Fairness Reward Model. We show that our approach consistently improves fairness while matching, or even surpassing, baseline accuracy.
arXiv Detail & Related papers (2025-07-15T14:20:23Z) - Meta-Fair: AI-Assisted Fairness Testing of Large Language Models [2.9632404823837777]
Fairness is a core principle in the development of Artificial Intelligence (AI) systems. Current approaches to fairness testing in large language models (LLMs) often rely on manual evaluation, fixed templates, deterministic heuristics, and curated datasets. This work aims to lay the groundwork for a novel, automated method for testing fairness in LLMs.
arXiv Detail & Related papers (2025-07-03T11:20:59Z) - NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks [65.70224757972068]
We select reasoning traces from a strong teacher model based on a large pool of questions from NaturalReasoning. We find that simply scaling up data size with random sampling is a strong baseline with steady performance gains. We also find that selecting difficult examples that require more diverse reasoning strategies is more sample-efficient for transferring the teacher model's reasoning skills.
arXiv Detail & Related papers (2025-07-02T17:30:24Z) - Lost at the Beginning of Reasoning [82.18834329384514]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps. We introduce a new benchmark specifically constructed with deliberately flawed first reasoning steps to systematically evaluate model self-correction capabilities.
arXiv Detail & Related papers (2025-06-27T09:53:57Z) - Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [82.43575191712726]
We introduce a fine-grained analytic framework to dissect the impact of reinforcement learning on reasoning. Our framework specifically investigates key elements that have been hypothesized to benefit from RL training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z) - Learning to Reason via Mixture-of-Thought for Logical Reasoning [56.24256916896427]
Mixture-of-Thought (MoT) is a framework that enables LLMs to reason across three complementary modalities: natural language, code, and truth-table. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of the three modalities to produce better predictions.
arXiv Detail & Related papers (2025-05-21T17:59:54Z) - Reasoning Towards Fairness: Mitigating Bias in Language Models through Reasoning-Guided Fine-Tuning [12.559028963968247]
We investigate the crucial relationship between a model's reasoning ability and fairness. We find that larger models with stronger reasoning abilities exhibit substantially lower stereotypical bias. We introduce ReGiFT, a novel approach that extracts structured reasoning traces from advanced reasoning models and infuses them into models that lack such capabilities.
arXiv Detail & Related papers (2025-04-08T03:21:51Z) - Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models? [14.29992535286614]
Theory of Mind (ToM) is the ability to attribute mental states to others. Recent advancements in Large Language Models have shown promising performance on ToM benchmarks. Do these benchmarks necessitate explicit human-like reasoning processes, or can models succeed through alternative strategies?
arXiv Detail & Related papers (2025-04-02T12:58:42Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions [0.46873264197900916]
We show that certain cognitive biases can enhance decision-making efficiency through rational deviations and shortcuts. By introducing moderation and an abstention option, we reduce error rates, improve decision accuracy, and optimize decision rates. This approach offers a novel way to leverage cognitive biases to improve the practical utility of large language models.
arXiv Detail & Related papers (2024-06-16T16:25:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.