Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
- URL: http://arxiv.org/abs/2602.22413v1
- Date: Wed, 25 Feb 2026 21:09:14 GMT
- Title: Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
- Authors: Jonas Karge
- Abstract summary: We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical voting results assume fixed participation, real-world aggregation often benefits from allowing agents to say "I don't know."
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the Condorcet Jury Theorem (CJT), assume fixed participation, real-world aggregation often benefits from allowing agents to say "I don't know." We propose a probabilistic framework where agents engage in a calibration phase, updating beliefs about their own fixed competence, before facing a final confidence gate that determines whether to vote or abstain. We derive a non-asymptotic lower bound on the group's success probability and prove that this selective participation generalizes the asymptotic guarantees of the CJT to a sequential, confidence-gated setting. Empirically, we validate these bounds via Monte Carlo simulations. While our results are general, we discuss their potential application to AI safety, outlining how this framework can mitigate hallucinations in collective LLM decision-making.
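To make the setup concrete, here is a minimal Monte Carlo sketch of the kind of experiment the abstract describes, assuming uniform-random competences, a Beta-posterior calibration phase, and a posterior-mean confidence gate; all numeric choices are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_agents=101, n_calib=50, gate=0.55, n_trials=5000):
    """Monte Carlo estimate of group accuracy under confidence-gated voting."""
    correct = 0
    for _ in range(n_trials):
        p = rng.uniform(0.3, 0.9, size=n_agents)      # fixed latent competences
        k = rng.binomial(n_calib, p)                  # calibration-phase successes
        post_mean = (1 + k) / (2 + n_calib)           # Beta(1+k, 1+n-k) posterior mean
        voters = post_mean >= gate                    # confidence gate: vote or abstain
        if not voters.any():
            continue                                  # universal abstention counted as failure
        votes = rng.random(voters.sum()) < p[voters]  # True = a correct vote
        correct += votes.sum() > voters.sum() / 2     # strict-majority success
    return correct / n_trials

print("gated accuracy:  ", simulate())
print("ungated accuracy:", simulate(gate=0.0))
```

Running both lines shows the qualitative effect the theorem formalizes: gating out agents whose estimated competence is near or below chance raises the majority's success probability over full participation.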
Related papers
- Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration [48.19579266939883]
Diffusion large language models (dLLMs) have attracted significant attention for their ability to enhance diversity, controllability, and parallelism.
We propose DiSE, a simple yet effective self-evaluation confidence quantification method for dLLMs.
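The summary names the method but not its mechanics; below is a toy sketch of one plausible reading, scoring confidence as agreement between the original answer and regenerated samples, with a stubbed generator, since DiSE's actual regeneration procedure inside a diffusion LM is not described here.

```python
import random
from collections import Counter

def regeneration_confidence(generate, prompt, answer, n_samples=8):
    """Toy self-evaluation: regenerate the answer several times and use the
    fraction of regenerations agreeing with the original as a confidence score.
    `generate` is any callable prompt -> text; a hypothetical stand-in for a
    dLLM's (more involved) masked-span regeneration."""
    samples = [generate(prompt) for _ in range(n_samples)]
    return Counter(samples)[answer] / n_samples

# Usage with a stub "model" that answers correctly 75% of the time:
random.seed(0)
stub = lambda p: "42" if random.random() < 0.75 else "41"
print(regeneration_confidence(stub, "6*7?", "42"))
```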
arXiv Detail & Related papers (2026-03-03T08:58:20Z)
- Active Epistemic Control for Query-Efficient Verified Planning [1.8055130471307603]
We present Active Epistemic Control (AEC), a planning layer that integrates model-based belief management with categorical feasibility checks.
AEC maintains a strict separation between a grounded fact store used for commitment and a belief store used only for pruning candidate plans.
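A hypothetical sketch of the stated separation; the class and method names are invented for illustration, since the summary specifies only the two stores and their roles.

```python
class EpistemicStores:
    """Illustrative reading of AEC's separation: beliefs may only veto (prune)
    candidate plans, while commitment requires a verified fact."""

    def __init__(self):
        self.facts = set()     # grounded fact store: verified, commitment-grade
        self.beliefs = set()   # belief store: unverified, pruning only

    def prune(self, plans):
        # Beliefs about infeasibility cheaply discard candidates...
        return [p for p in plans if ("infeasible", p) not in self.beliefs]

    def commit(self, plan):
        # ...but committing to a plan demands a categorical feasibility fact.
        if ("feasible", plan) not in self.facts:
            raise ValueError(f"{plan!r} not verified; issue a query first")
        return plan
```

The query-efficiency angle follows from this asymmetry: cheap, fallible beliefs shrink the search space, so expensive verification queries are spent only on plans that survive pruning.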
arXiv Detail & Related papers (2026-02-03T19:51:10Z)
- Chance-Constrained Inference for Hallucination Risk Control in Large Language Models [0.0]
Large language models generate fluent but invalid responses, including factual hallucinations.
We formulate inference as a deployment-time risk control problem.
We show that confidence-based selective prediction does not, in general, imply probabilistic risk guarantees.
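A short simulation of that cautionary claim, using an invented, miscalibrated confidence score: thresholding at 0.9 still leaves roughly a quarter of the kept answers wrong.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
correct = rng.random(n) < 0.80            # answers are right 80% of the time
conf = np.where(correct,                  # overconfident scorer: wrong answers
                rng.uniform(0.5, 1.0, n), # often score as high as right ones
                rng.uniform(0.6, 1.0, n))
keep = conf >= 0.9                        # naive selective prediction
print(f"error among kept answers: {1 - correct[keep].mean():.1%}")
# ~24% error despite the 0.9 cutoff: thresholding an uncalibrated score gives
# no distribution-level risk guarantee; that requires calibration data plus an
# explicit chance constraint, as the paper formalizes.
```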
arXiv Detail & Related papers (2026-02-02T04:51:47Z)
- The Silent Scholar Problem: A Probabilistic Framework for Breaking Epistemic Asymmetry in LLM Agents [0.6117371161379209]
We propose a formal probabilistic framework that provides agents with a non-altruistic motive for bidirectional knowledge exchange.
We show how these accumulated belief states serve as verifiable reward signals for Reinforcement Learning from Human Feedback (RLHF) and as high-quality data filters for Supervised Fine-Tuning (SFT).
Simulation results validate that this uncertainty-driven strategy significantly outperforms random baselines in heterogeneous environments.
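A toy reading of the uncertainty-driven strategy, assuming (our gloss, not the paper's stated model) one Beta belief per claim and a query budget spent on the highest-variance claim rather than a random one.

```python
import random
random.seed(0)

class Agent:
    """Track a Beta(a, b) belief per claim; query peers where posterior
    variance is highest instead of uniformly at random."""

    def __init__(self, n_claims):
        self.ab = [[1, 1] for _ in range(n_claims)]  # uniform priors

    def variance(self, i):
        a, b = self.ab[i]
        return a * b / ((a + b) ** 2 * (a + b + 1))  # Beta posterior variance

    def pick_query(self, strategy="uncertainty"):
        if strategy == "random":
            return random.randrange(len(self.ab))
        return max(range(len(self.ab)), key=self.variance)  # most uncertain claim

    def update(self, i, peer_says_true):
        self.ab[i][0 if peer_says_true else 1] += 1

a = Agent(n_claims=5)
a.update(a.pick_query(), peer_says_true=True)
```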
arXiv Detail & Related papers (2025-12-24T02:02:25Z)
- A Framework for Bounding Deterministic Risk with PAC-Bayes: Applications to Majority Votes [4.664367264604233]
PAC-Bayes is a popular framework for obtaining generalization guarantees in uncountable hypothesis spaces.
We propose a unified framework to extract guarantees holding for a single hypothesis from PAC-Bayesian guarantees.
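For orientation, the classical PAC-Bayes-kl bound (Maurer's form) that such extraction frameworks typically start from: for a prior P and an i.i.d. sample of size n, with probability at least 1 - delta, simultaneously for all posteriors Q,

```latex
\mathrm{kl}\!\left(\widehat{R}_n(Q)\,\middle\|\,R(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
\qquad
\mathrm{kl}(q\,\|\,p) \;=\; q\ln\tfrac{q}{p} + (1-q)\ln\tfrac{1-q}{1-p}.
```

Such bounds control Q-averages (e.g., the Gibbs risk), not any single deterministic predictor; the crudest extraction for majority votes is the classical factor-of-two bound, majority-vote risk at most twice the Gibbs risk, which the proposed framework aims to improve on.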
arXiv Detail & Related papers (2025-10-29T14:38:35Z)
- Core Safety Values for Provably Corrigible Agents [2.6451153531057985]
We introduce the first implementable framework for corrigibility, with provable guarantees in multi-step, partially observed environments.
Our framework replaces a single reward with five *structurally separate* utility heads.
For open-ended settings where adversaries can modify the agent, we prove that deciding whether an arbitrary post-hack agent will ever violate corrigibility is undecidable.
arXiv Detail & Related papers (2025-07-28T16:19:25Z)
- COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question.
COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate.
We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
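The two sentences above already pin down an algorithm shape; here is a sketch assuming the confidence interval is Clopper-Pearson (the summary does not say which CI COIN uses) and ignoring the multiple-testing correction a deployable procedure would add.

```python
import numpy as np
from scipy.stats import beta

def calibrate_threshold(conf, correct, alpha=0.1, delta=0.05):
    """Pick the smallest confidence threshold t such that the Clopper-Pearson
    upper bound (level 1 - delta) on the error rate of answers with conf >= t
    is at most alpha.  Returns None if no threshold is certifiable."""
    for t in np.sort(np.unique(conf)):
        kept = conf >= t
        n, k = kept.sum(), (~correct[kept]).sum()   # answered, and wrong
        # Exact one-sided binomial upper bound: BetaInv(1-delta; k+1, n-k)
        ub = beta.ppf(1 - delta, k + 1, n - k) if k < n else 1.0
        if ub <= alpha:
            return t
    return None
```

Given held-out pairs of confidence scores and correctness labels, the returned threshold certifies, with probability 1 - delta over the calibration draw, that the true selective error rate stays below alpha.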
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
- Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks.
They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct.
We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z)
- Mitigating LLM Hallucinations via Conformal Abstention [70.83870602967625]
We develop a principled procedure for determining when a large language model should abstain from responding in a general domain.
We leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate).
Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets.
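One plausible instantiation of such a procedure, via conformal risk control on the binary loss "answered and hallucinated"; the score and calibration rule below are standard textbook choices, not necessarily the paper's exact construction.

```python
import numpy as np

def calibrate_abstention(conf, hallucinated, alpha=0.05):
    """Conformal risk control sketch: answer iff conf >= lam.  The loss
    1{answered and hallucinated} is non-increasing in lam, so the smallest lam
    with (n * risk(lam) + 1) / (n + 1) <= alpha guarantees an expected
    hallucination rate <= alpha on exchangeable test data."""
    n = len(conf)
    for lam in np.sort(np.unique(conf)):
        risk = np.mean((conf >= lam) & hallucinated)
        if (n * risk + 1) / (n + 1) <= alpha:
            return lam
    return np.inf   # abstain always: no gate is certifiable on this data
```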
arXiv Detail & Related papers (2024-04-04T11:32:03Z)
- When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples.
A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction.
Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
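For concreteness, confidence-based deferral in its simplest two-model form, with stub classifiers; this is the oblivious rule the paper analyzes, not a new proposal.

```python
import numpy as np

def cascade_predict(x, small_model, large_model, tau=0.8):
    """Run the cheap model first; defer to the expensive model only when the
    cheap model's top softmax probability falls below tau.  The rule never
    inspects the downstream model -- the 'oblivious' property studied above."""
    probs = small_model(x)                      # softmax vector, cheap model
    if probs.max() >= tau:
        return int(np.argmax(probs)), "small"
    return int(np.argmax(large_model(x))), "large"

# Stubs standing in for real classifiers:
small = lambda x: np.array([0.6, 0.4])
large = lambda x: np.array([0.1, 0.9])
print(cascade_predict(None, small, large))      # -> (1, 'large'): deferred
```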
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
- Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations [44.71254888821376]
We provide the first type-I-error and expected-rejection-time guarantees under general non-parametric data generating processes.
We show how to apply our results to inference on parameters defined by estimating equations, such as average treatment effects.
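The textbook special case behind such guarantees is a sequential likelihood-ratio test made anytime-valid by Ville's inequality; the paper handles far more general, possibly dependent processes, but the mechanism is easiest to see here.

```python
import numpy as np

rng = np.random.default_rng(2)

def sequential_lr_test(xs, p0=0.5, p1=0.7, alpha=0.05):
    """Reject H0: Bernoulli(p0) in favour of Bernoulli(p1) the first time the
    likelihood ratio exceeds 1/alpha.  Under H0 the running ratio is a
    nonnegative martingale with mean 1, so by Ville's inequality the chance it
    EVER crosses 1/alpha is at most alpha -- a type-I-error guarantee valid at
    every data-dependent stopping time, not just a fixed sample size."""
    lr = 1.0
    for t, x in enumerate(xs, 1):
        lr *= (p1 if x else 1 - p1) / (p0 if x else 1 - p0)
        if lr >= 1 / alpha:
            return t                  # reject H0 at time t
    return None                       # never rejected

xs = rng.random(2000) < 0.7           # data actually drawn with p = 0.7
print("rejected at n =", sequential_lr_test(xs))
```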
arXiv Detail & Related papers (2022-12-29T18:37:08Z)
- Generalised Likelihood Ratio Testing Adversaries through the Differential Privacy Lens [69.10072367807095]
Differential Privacy (DP) provides tight upper bounds on the capabilities of optimal adversaries.
We relax the assumption of a Neyman--Pearson optimal (NPO) adversary to a Generalized Likelihood Ratio Test (GLRT) adversary.
This mild relaxation leads to improved privacy guarantees.
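For reference, the standard hypothesis-testing reading of DP that both adversary models share: an (epsilon, delta)-DP mechanism forces every membership test, NPO or GLRT alike, to satisfy

```latex
\mathrm{TPR} \;\le\; e^{\varepsilon}\,\mathrm{FPR} + \delta,
\qquad
1 - \mathrm{FPR} \;\le\; e^{\varepsilon}\,(1 - \mathrm{TPR}) + \delta,
```

both following directly from the DP definition applied to the rejection region and its complement. The paper's point is that the weaker GLRT adversary admits sharper guarantees than this worst-case NPO envelope.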
arXiv Detail & Related papers (2022-10-24T08:24:10Z)
- Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
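A sketch of the standard weighted split-conformal construction (Tibshirani et al., 2019) that conformal off-policy prediction builds on, with importance weights w_i = pi_target(a_i | x_i) / pi_behavior(a_i | x_i); the paper's specific estimator and finite-sample analysis may differ in detail.

```python
import numpy as np

def weighted_conformal_quantile(scores, w_cal, w_test, alpha=0.1):
    """Weighted split conformal under policy shift: each logged calibration
    score gets an importance weight, and the test point contributes a point
    mass at +infinity.  The returned qhat defines the predictive interval
    {y : score(x_test, y) <= qhat} for the outcome under the target policy."""
    s = np.append(scores, np.inf)            # test point's score is unknown
    w = np.append(w_cal, w_test).astype(float)
    order = np.argsort(s)
    cdf = np.cumsum(w[order]) / w.sum()      # weighted empirical CDF
    return s[order][np.searchsorted(cdf, 1 - alpha)]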
arXiv Detail & Related papers (2022-06-09T10:39:33Z)