Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization
- URL: http://arxiv.org/abs/2510.07517v2
- Date: Thu, 16 Oct 2025 00:43:28 GMT
- Title: Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization
- Authors: Hyeong Kyu Choi, Xiaojin Zhu, Sharon Li,
- Abstract summary: Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions.<n>Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias.<n>We present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD.
- Score: 13.569822165805851
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias. Third, we define the Identity Bias Coefficient (IBC), a principled metric that measures how often an agent follows a peer versus itself. Empirical studies across multiple models, datasets and debate rounds confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to "mask" identity to ensure that MAD systems reason based on content rather than source identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.
Related papers
- Multi-Agent Causal Reasoning for Suicide Ideation Detection Through Online Conversations [16.626899117362875]
Suicide remains a pressing global public health concern.<n>Social media platforms offer opportunities for early risk detection through online conversation trees.<n>Existing approaches face two major limitations.
arXiv Detail & Related papers (2026-02-27T01:06:18Z) - Multi-agent Undercover Gaming: Hallucination Removal via Counterfactual Test for Multimodal Reasoning [12.06050648342985]
Hallucination poses a major obstacle in the reasoning capabilities of large language models.<n>We introduce the Multi-agent Undercover Gaming (MUG) protocol, inspired by social deduction games like "Who is Undercover?"<n>MUG reframes MAD as a process of detecting "undercover" agents (those suffering from hallucinations) by employing multimodal counterfactual tests.
arXiv Detail & Related papers (2025-11-14T11:27:55Z) - Beyond Inference Intervention: Identity-Decoupled Diffusion for Face Anonymization [55.29071072675132]
Face anonymization aims to conceal identity information while preserving non-identity attributes.<n>We propose textbfIDsuperscript2Face, a training-centric anonymization framework.<n>We show that IDtextsuperscript2Face outperforms existing methods in visual quality, identity suppression, and utility preservation.
arXiv Detail & Related papers (2025-10-28T09:28:12Z) - Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection [81.52796950244705]
Self-diagnosis is unreliable on complex tasks unless aided by reliable external feedback.<n>We introduce a new collaborative MAD protocol, termed ColMAD, that reframes MAD as a non-zero sum game.<n>We show that ColMAD significantly outperforms previous competitive MAD by 19%.
arXiv Detail & Related papers (2025-10-23T19:46:00Z) - Can an Individual Manipulate the Collective Decisions of Multi-Agents? [53.01767232004823]
M-Spoiler is a framework that simulates agent interactions within a multi-agent system to generate adversarial samples.<n>M-Spoiler introduces a stubborn agent that actively aids in optimizing adversarial samples.<n>Our findings confirm the risks posed by the knowledge of an individual agent in multi-agent systems.
arXiv Detail & Related papers (2025-09-20T01:54:20Z) - Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment [22.305033366660187]
Language Models (LMs) are inconsistent reasoners, often generating contradictory responses to identical prompts.<n>We formalize self-consistency as an intrinsic property of well-aligned reasoning models and introduce Multi-Agent Consensus Alignment (MACA)<n>MACA enables agents to teach themselves to be more decisive and concise, and better leverage peer insights in multi-agent settings without external supervision.
arXiv Detail & Related papers (2025-09-18T17:27:28Z) - SELFI: Selective Fusion of Identity for Generalizable Deepfake Detection [16.500269508552844]
Face identity provides a powerful signal for deepfake detection.<n>Some suppress identity cues to reduce bias, while others rely on them as forensic evidence.<n>We propose textbfSELFI (textbfSELective textbfIdentity), a generalizable detection framework that dynamically modulates identity usage.
arXiv Detail & Related papers (2025-06-21T05:11:35Z) - Judging with Many Minds: Do More Perspectives Mean Less Prejudice? On Bias Amplifications and Resistance in Multi-Agent Based LLM-as-Judge [70.89799989428367]
We conduct a systematic analysis of four diverse bias types: position bias, verbosity bias, chain-of-thought bias, and bandwagon bias.<n>We evaluate these biases across two widely adopted multi-agent LLM-as-Judge frameworks: Multi-Agent-Debate and LLM-as-Meta-Judge.
arXiv Detail & Related papers (2025-05-26T03:56:41Z) - Unmasking Conversational Bias in AI Multiagent Systems [1.0705399532413618]
biases that may arise in multi-agent systems involving generative models remain under-researched.<n>We present a framework designed to quantify biases within multi-agent systems of conversational Large Language Models.<n>The bias observed in the echo-chamber experiment remains undetected by current state-of-the-art bias detection methods.
arXiv Detail & Related papers (2025-01-24T09:10:02Z) - MAD-Sherlock: Multi-Agent Debate for Visual Misinformation Detection [36.12673167913763]
We introduce MAD-Sherlock, a multi-agent debate system for out-of-context misinformation detection.<n> MAD-Sherlock frames detection as a multi-agent debate, reflecting the diverse and conflicting discourse found online.<n>Our framework is domain- and time-agnostic, requiring no finetuning, yet achieves state-of-the-art accuracy with in-depth explanations.
arXiv Detail & Related papers (2024-10-26T10:34:22Z) - DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics [52.242449026151846]
Multi-agent debates have been introduced to improve the accuracy of Large Language Models (LLMs)<n>We propose DebUnc, a debate framework that uses uncertainty metrics to assess agent confidence.
arXiv Detail & Related papers (2024-07-08T22:15:01Z) - Unsupervised Domain Adaptation on Person Re-Identification via
Dual-level Asymmetric Mutual Learning [108.86940401125649]
This paper proposes a Dual-level Asymmetric Mutual Learning method (DAML) to learn discriminative representations from a broader knowledge scope with diverse embedding spaces.
The knowledge transfer between two networks is based on an asymmetric mutual learning manner.
Experiments in Market-1501, CUHK-SYSU, and MSMT17 public datasets verified the superiority of DAML over state-of-the-arts.
arXiv Detail & Related papers (2023-01-29T12:36:17Z) - Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.