Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias
- URL: http://arxiv.org/abs/2512.23518v1
- Date: Mon, 29 Dec 2025 14:52:34 GMT
- Title: Single LLM Debate, MoLaCE: Mixture of Latent Concept Experts Against Confirmation Bias
- Authors: Hazel Kim, Philip Torr
- Abstract summary: Large language models (LLMs) are highly vulnerable to input confirmation bias. MoLaCE is a lightweight inference-time framework that addresses confirmation bias by mixing experts instantiated as different activation strengths over the latent concepts that shape model responses. We empirically show that it consistently reduces confirmation bias, improves robustness, and matches or surpasses multi-agent debate.
- Score: 24.182306712604966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are highly vulnerable to input confirmation bias. When a prompt implies a preferred answer, models often reinforce that bias rather than explore alternatives. This phenomenon remains underexplored, yet it is already harmful in base models and poses an even greater risk in multi-agent debate, where echo chambers reinforce bias instead of correction. We introduce Mixture of Latent Concept Experts (MoLaCE), a lightweight inference-time framework that addresses confirmation bias by mixing experts instantiated as different activation strengths over latent concepts that shape model responses. Our key insight is that, due to the compositional nature of language, differently phrased prompts reweight latent concepts in prompt-specific ways that affect factual correctness, so no single fixed intervention can be applied universally across inputs. This design enables a single LLM to emulate the benefits of debate internally while remaining computationally efficient and scalable. It can also be integrated into multi-agent debate frameworks to diversify perspectives and reduce correlated errors. We empirically show that it consistently reduces confirmation bias, improves robustness, and matches or surpasses multi-agent debate while requiring only a fraction of the computation.
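The abstract's mechanism (one model queried several times, each time with a different activation strength along a latent concept direction, and the answers then mixed) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the toy network, the concept direction, the strength values, and the majority-vote aggregation are placeholders, not the authors' implementation or hyperparameters.

```python
# Minimal sketch of a "mixture of latent concept experts": the same model is run
# at several activation strengths along a latent concept direction and the
# resulting answers are aggregated. Toy model and all values are hypothetical.
from collections import Counter

import torch
import torch.nn as nn

torch.manual_seed(0)
HIDDEN = 16

# Stand-in for an LLM; in practice this would be a transformer and the hook
# would sit on a chosen hidden layer.
toy_model = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, 2))

# Hypothetical latent concept direction, e.g. estimated by contrasting
# activations on biased vs. neutral phrasings of the same question.
concept_direction = torch.randn(HIDDEN)
concept_direction /= concept_direction.norm()


def answer_with_strength(x: torch.Tensor, alpha: float) -> int:
    """One 'expert': run the model while shifting the first layer's output
    by alpha * concept_direction."""
    handle = toy_model[0].register_forward_hook(
        lambda module, inputs, output: output + alpha * concept_direction
    )
    try:
        with torch.no_grad():
            logits = toy_model(x)
        return int(logits.argmax().item())
    finally:
        handle.remove()  # always detach the hook so experts stay independent


def mixture_answer(x: torch.Tensor, alphas=(0.0, 0.5, 1.0, 2.0)) -> int:
    """Aggregate the experts' answers; a flat majority vote here, whereas the
    paper argues the weighting should be prompt-specific."""
    votes = Counter(answer_with_strength(x, a) for a in alphas)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    prompt_state = torch.randn(HIDDEN)  # stand-in for a prompt's hidden state
    print("per-expert answers:", [answer_with_strength(prompt_state, a) for a in (0.0, 0.5, 1.0, 2.0)])
    print("mixture answer:", mixture_answer(prompt_state))
```

Given the abstract's key insight that no single fixed intervention works across inputs, a real implementation would presumably replace the flat vote with a prompt-dependent weighting over the strengths; the sketch only conveys how a single model can yield multiple "debaters" by varying one activation-steering knob.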
Related papers
- Knowledge Divergence and the Value of Debate for Scalable Oversight [0.0]
Debate and reinforcement learning from AI feedback are proposed methods for scalable oversight of advanced AI systems. We analyze this by parameterizing debate's value through the geometry of knowledge divergence between debating models. We offer the first formal connection between debate and RLAIF, a geometric foundation for understanding when adversarial oversight protocols are justified.
arXiv Detail & Related papers (2026-03-05T15:36:08Z) - Multimodal Fact-Level Attribution for Verifiable Reasoning [80.60864342985748]
Multimodal large language models (MLLMs) are increasingly used for real-world tasks involving multi-step reasoning and long-form generation. Existing multimodal grounding benchmarks and evaluation methods fail to assess attribution in complex multimodal reasoning. We introduce MuRGAt, a benchmark for evaluating fact-level multimodal attribution in settings that require reasoning beyond direct observation.
arXiv Detail & Related papers (2026-02-12T03:10:02Z) - Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning [49.99694105650486]
Self-Debate Reinforcement Learning (SDRL) is a training framework that equips a single large language model with strong problem-solving ability. We show that SDRL improves overall Multi-Agent Debate (MAD) performance while simultaneously strengthening single-model reasoning.
arXiv Detail & Related papers (2026-01-29T20:21:44Z) - Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts [74.47786985522762]
We identify a critical failure mode termed textual inertia, where models tend to blindly adhere to erroneous text while neglecting conflicting visual evidence. We propose the LogicGraph Perturbation Protocol, which structurally injects perturbations into the reasoning chains of diverse LMMs. Results reveal that models successfully self-correct in fewer than 10% of cases and predominantly succumb to blind textual error propagation.
arXiv Detail & Related papers (2026-01-07T16:39:34Z) - Directional Reasoning Injection for Fine-Tuning MLLMs [51.53222423215055]
Multimodal large language models (MLLMs) are rapidly advancing, yet their reasoning ability often lags behind that of strong text-only counterparts. Existing methods to bridge this gap rely on supervised fine-tuning over large-scale multimodal reasoning data or reinforcement learning. We propose Directional Reasoning Injection for Fine-Tuning (DRIFT) to solve this problem.
arXiv Detail & Related papers (2025-10-16T18:06:46Z) - Multi-Agent Debate for LLM Judges with Adaptive Stability Detection [46.67172123607961]
We propose a multi-agent debate judge framework where agents collaboratively reason and iteratively refine their responses. We formalize the debate process mathematically, analyzing agent interactions and proving that debate amplifies correctness compared to static ensembles. Experiments across multiple benchmarks and models demonstrate that our framework improves judgment accuracy over majority voting while maintaining computational efficiency.
arXiv Detail & Related papers (2025-10-14T16:30:30Z) - MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction [52.89860691282002]
Implicit Attribute Value Extraction (AVE) is essential for accurately representing products in e-commerce. Despite advances in multimodal large language models (MLLMs), implicit AVE remains challenging due to the complexity of multidimensional data. We introduce MADIAVE, a multi-agent debate framework that employs multiple MLLM agents to iteratively refine inferences.
arXiv Detail & Related papers (2025-10-07T06:27:42Z) - RedDebate: Safer Responses through Multi-Agent Red Teaming Debates [10.243214692251412]
We introduce RedDebate, a novel multi-agent debate framework to identify and mitigate unsafe behaviours in large language models (LLMs). RedDebate employs collaborative argumentation among multiple LLMs across diverse debate scenarios. Empirical evaluation on safety benchmarks across a diverse set of models demonstrates that RedDebate substantially reduces unsafe outputs.
arXiv Detail & Related papers (2025-06-04T09:09:54Z) - A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [58.32070787537946]
Chain-of-thought (CoT) reasoning enhances the performance of large language models. We present the first comprehensive study of CoT faithfulness in large vision-language models.
arXiv Detail & Related papers (2025-05-29T18:55:05Z) - Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models [26.17300490736624]
Multimodal Large Language Models (MLLMs) are predominantly trained and tested on consistent visual-textual inputs. We propose the Multimodal Inconsistency Reasoning (MMIR) benchmark to assess MLLMs' ability to detect and reason about semantic mismatches. We evaluate six state-of-the-art MLLMs, showing that models with dedicated multimodal reasoning capabilities, such as o1, substantially outperform their counterparts.
arXiv Detail & Related papers (2025-02-22T01:52:37Z) - Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate [21.342632695285364]
Leveraging large language models (LLMs) for rumor detection holds significant promise. We propose Stance-Separated Multi-Agent Debate (S2MAD) for rumor detection on breaking events. Our proposed model outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-12-06T08:52:30Z) - Chain of Thought Still Thinks Fast: APriCoT Helps with Thinking Slow [0.0]
We introduce Counterfactual Prompting with Agnostically Primed CoT (APriCoT). APriCoT effectively reduces the influence of base-rate probabilities while improving overall accuracy. Our results suggest that mitigating bias requires a slow-thinking process which CoT alone may not provide.
arXiv Detail & Related papers (2024-08-16T10:34:50Z) - DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics [52.242449026151846]
Multi-agent debates have been introduced to improve the accuracy of Large Language Models (LLMs). We propose DebUnc, a debate framework that uses uncertainty metrics to assess agent confidence.
arXiv Detail & Related papers (2024-07-08T22:15:01Z) - Multimodal Chain-of-Thought Reasoning in Language Models [94.70184390935661]
We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework.
Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-02-02T07:51:19Z)