Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis
- URL: http://arxiv.org/abs/2511.01425v1
- Date: Mon, 03 Nov 2025 10:21:35 GMT
- Title: Learning to Seek Evidence: A Verifiable Reasoning Agent with Causal Faithfulness Analysis
- Authors: Yuhang Huang, Zekai Lin, Fan Zhong, Lei Liu
- Abstract summary: Explanations for AI models in high-stakes domains like medicine often lack verifiability, which can hinder trust. We propose an interactive agent that produces explanations through an auditable sequence of actions. This policy is optimized using reinforcement learning, resulting in a model that is both efficient and generalizable.
- Score: 10.749786847079163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explanations for AI models in high-stakes domains like medicine often lack verifiability, which can hinder trust. To address this, we propose an interactive agent that produces explanations through an auditable sequence of actions. The agent learns a policy to strategically seek external visual evidence to support its diagnostic reasoning. This policy is optimized using reinforcement learning, resulting in a model that is both efficient and generalizable. Our experiments show that this action-based reasoning process significantly improves calibrated accuracy, reducing the Brier score by 18\% compared to a non-interactive baseline. To validate the faithfulness of the agent's explanations, we introduce a causal intervention method. By masking the visual evidence the agent chooses to use, we observe a measurable degradation in its performance ($\Delta$Brier=+0.029), confirming that the evidence is integral to its decision-making process. Our work provides a practical framework for building AI systems with verifiable and faithful reasoning capabilities.
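The two numbers quoted in the abstract (an 18% Brier reduction and a ΔBrier of +0.029 under masking) can be read against the standard definition of the Brier score. The sketch below is illustrative only: it assumes a binary-outcome Brier score (the abstract does not say whether the paper uses the binary or multi-class form), the function names are hypothetical, and the toy probabilities are made up rather than taken from the paper.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0.0 means perfectly confident and correct predictions.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def delta_brier(
    probs_with_evidence: Sequence[float],
    probs_masked: Sequence[float],
    labels: Sequence[int],
) -> float:
    """Change in Brier score when the chosen evidence is masked.

    A positive value means predictions got worse without the evidence,
    which is the direction the abstract reports.
    """
    return brier_score(probs_masked, labels) - brier_score(probs_with_evidence, labels)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of a score relative to a non-zero baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Made-up toy data, NOT results from the paper.
    labels = [1, 0, 1, 0]
    with_evidence = [0.9, 0.2, 0.8, 0.1]  # predictions using the selected evidence
    masked = [0.7, 0.4, 0.6, 0.3]         # same cases with that evidence masked

    print("Brier (with evidence):", brier_score(with_evidence, labels))
    print("Brier (masked):       ", brier_score(masked, labels))
    print("Delta Brier:          ", delta_brier(with_evidence, masked, labels))
```

Under this reading, a faithfulness check of the kind described compares the same cases with and without the evidence the agent selected, so any degradation can be attributed to the masked evidence rather than to a change in the test set.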
Related papers
- Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis [29.630872344186873]
Interactive medical consultation requires an agent to proactively elicit missing clinical evidence under uncertainty. Existing evaluations largely remain static or outcome-centric, neglecting the evidence-gathering process. We propose an interactive evaluation framework that explicitly models the consultation process using a simulated patient and a simulated reporter grounded in atomic evidences.
arXiv Detail & Related papers (2026-01-27T16:36:35Z)
- From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior. We demonstrate how uncertainty is leveraged as an active control signal across three frontiers. This survey argues that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z)
- The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. We propose a novel framework for general agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
arXiv Detail & Related papers (2026-01-21T15:22:21Z)
- Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation [76.5533899503582]
Large language models (LLMs) are increasingly used as judges to evaluate agent performance. We show this paradigm implicitly assumes that the agent's chain-of-thought (CoT) reasoning faithfully reflects both its internal reasoning and the underlying environment state. We demonstrate that manipulated reasoning alone can inflate false positive rates of state-of-the-art VLM judges by up to 90% across 800 trajectories spanning diverse web tasks.
arXiv Detail & Related papers (2026-01-21T06:07:43Z)
- Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation [42.38513187601995]
Large Language Models (LLMs) trained with reinforcement learning and verifiable rewards have achieved strong results on complex reasoning tasks. Recent work extends this paradigm to a multi-agent setting, where a meta-thinking agent proposes plans and monitors progress while a reasoning agent executes subtasks through sequential conversational turns. Despite promising performance, we identify a critical limitation: lazy agent behavior, in which one agent dominates while the other contributes little, undermining collaboration and collapsing the setup to an ineffective single agent. We propose a verifiable reward mechanism that encourages deliberation by allowing the reasoning agent to discard noisy outputs, consolidate instructions, and restart its reasoning process.
arXiv Detail & Related papers (2025-11-04T06:37:31Z)
- Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent [58.90049897180927]
We introduce an automated framework for detecting unintended reliance on visual features in vision models. A self-reflective agent generates and tests hypotheses about visual attributes that a model may rely on. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies.
arXiv Detail & Related papers (2025-10-24T17:59:02Z)
- Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox. Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures. We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z)
- VerifiAgent: a Unified Verification Agent in Language Model Reasoning [10.227089771963943]
We propose a unified verification agent that integrates two levels of verification: meta-verification and tool-based adaptive verification. VerifiAgent autonomously selects appropriate verification tools based on the reasoning type. It can be effectively applied to inference scaling, achieving better results with fewer generated samples and costs.
arXiv Detail & Related papers (2025-04-01T04:05:03Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations [58.96953392466609]
We take an in-depth look at the causal awareness of modern representations of agent interactions. We show that recent representations are already partially resilient to perturbations of non-causal agents. We introduce a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-06-15T10:52:29Z)
- Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings [12.145748796751619]
We propose a framework that utilizes interpretable disentangled representations for downstream-task prediction.
We demonstrate the effectiveness of our approach on a synthetic benchmark suite and two medical datasets.
arXiv Detail & Related papers (2023-06-15T10:52:29Z)
- Differential Assessment of Black-Box AI Agents [29.98710357871698]
We propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models.
We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy.
Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch.
arXiv Detail & Related papers (2022-03-24T17:48:58Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)