FaithAct: Faithfulness Planning and Acting in MLLMs
- URL: http://arxiv.org/abs/2511.08409v1
- Date: Wed, 12 Nov 2025 01:57:45 GMT
- Title: FaithAct: Faithfulness Planning and Acting in MLLMs
- Authors: Junxian Li, Xinyue Xu, Sai Ma, Sichao Li
- Abstract summary: Unfaithfulness remains a persistent challenge for large language models. We introduce FaithEval for quantifying step-level and chain-level faithfulness by evaluating whether each claimed object is visually supported by the image. We propose FaithAct, a faithfulness-first planning and acting framework that enforces evidential grounding at every reasoning step.
- Score: 12.08093899815684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unfaithfulness remains a persistent challenge for large language models (LLMs), which often produce plausible yet ungrounded reasoning chains that diverge from perceptual evidence or final conclusions. We distinguish between behavioral faithfulness (alignment between reasoning and output) and perceptual faithfulness (alignment between reasoning and input), and introduce FaithEval for quantifying step-level and chain-level faithfulness by evaluating whether each claimed object is visually supported by the image. Building on these insights, we propose FaithAct, a faithfulness-first planning and acting framework that enforces evidential grounding at every reasoning step. Experiments across multiple reasoning benchmarks demonstrate that FaithAct improves perceptual faithfulness by up to 26% without degrading task accuracy compared to prompt-based and tool-augmented baselines. Our analysis shows that treating faithfulness as a guiding principle not only mitigates hallucination but also leads to more stable reasoning trajectories. This work thereby establishes a unified framework for both evaluating and enforcing faithfulness in multimodal reasoning.
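As a rough illustration of the scoring idea the abstract describes (not the authors' released code), a step's perceptual faithfulness can be read as the fraction of its claimed objects that a grounding check confirms in the image, with the chain-level score an aggregate over steps. The `is_grounded` checker and all names below are hypothetical placeholders for whatever open-vocabulary detector or VLM verifier one plugs in, and mean aggregation is an assumption:

```python
from typing import Callable, List

def step_faithfulness(claimed_objects: List[str],
                      is_grounded: Callable[[str], bool]) -> float:
    """Fraction of objects claimed in one reasoning step that the
    grounding checker confirms are visible in the image.
    Returns 1.0 for steps that claim no objects (vacuously faithful)."""
    if not claimed_objects:
        return 1.0
    supported = sum(1 for obj in claimed_objects if is_grounded(obj))
    return supported / len(claimed_objects)

def chain_faithfulness(steps: List[List[str]],
                       is_grounded: Callable[[str], bool]) -> float:
    """Mean step-level faithfulness over a whole reasoning chain.
    (The paper may aggregate differently; the mean is an assumption.)"""
    scores = [step_faithfulness(s, is_grounded) for s in steps]
    return sum(scores) / len(scores) if scores else 1.0

# Toy usage with a stubbed grounding checker standing in for a real
# detector/VLM verifier:
if __name__ == "__main__":
    visible = {"dog", "ball"}
    checker = lambda obj: obj in visible
    chain = [["dog"], ["dog", "ball"], ["frisbee"]]  # last step hallucinates
    print(chain_faithfulness(chain, checker))  # 0.666...
```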
Related papers
- Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach. REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful. Speakers are rewarded for producing reasoning that is clear to listeners; a toy sketch of such a reward follows below.
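A minimal sketch of a speaker-listener reward under the stated hypothesis (the listener model and the binary reward shape below are assumptions, not REMUL's actual formulation): the speaker earns reward when a separate listener, shown only the reasoning trace, reproduces the speaker's answer.

```python
from typing import Callable

def listener_clarity_reward(trace: str,
                            speaker_answer: str,
                            listener: Callable[[str], str]) -> float:
    """Reward 1.0 if a listener, given only the reasoning trace,
    arrives at the same answer as the speaker; else 0.0.
    A real system would likely use a probability and average over
    several listeners; binary agreement keeps the sketch simple."""
    return 1.0 if listener(trace) == speaker_answer else 0.0

# Toy usage with a keyword-matching stand-in for a listener model:
if __name__ == "__main__":
    listener = lambda t: "42" if "add 40 and 2" in t else "unknown"
    print(listener_clarity_reward("add 40 and 2 to get 42", "42", listener))  # 1.0
    print(listener_clarity_reward("trust me", "42", listener))               # 0.0
```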
arXiv Detail & Related papers (2026-02-18T02:55:55Z) - Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency [78.91846841708586]
We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. We propose Neighbor-Consistency Belief (NCB), a structural measure of belief that evaluates response coherence across a conceptual neighborhood. We also present Structure-Aware Training (SAT), which optimizes for a context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%.
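A minimal sketch of the neighborhood-consistency idea, assuming NCB reduces to agreement between a model's answer to a query and its answers to semantically neighboring queries; the neighbor set, `answer` function, and `agree` test below are hypothetical stand-ins, not the paper's implementation:

```python
from typing import Callable, List

def neighbor_consistency(query: str,
                         neighbors: List[str],
                         answer: Callable[[str], str],
                         agree: Callable[[str, str], bool]) -> float:
    """Score in [0, 1]: how often answers to neighboring queries
    agree with the answer to the original query."""
    base = answer(query)
    if not neighbors:
        return 1.0
    hits = sum(1 for q in neighbors if agree(answer(q), base))
    return hits / len(neighbors)

# Toy usage: a brittle 'model' that flips under rephrasing.
if __name__ == "__main__":
    answers = {
        "capital of France?": "Paris",
        "France's capital city?": "Paris",
        "Which city is France's capital?": "Lyon",  # inconsistent neighbor
    }
    score = neighbor_consistency(
        "capital of France?",
        ["France's capital city?", "Which city is France's capital?"],
        answer=lambda q: answers[q],
        agree=lambda a, b: a == b,
    )
    print(score)  # 0.5
```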
arXiv Detail & Related papers (2026-01-09T16:23:21Z) - Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking [11.763473690046721]
Reasoning-augmented vision language models generate explicit chains of thought that promise greater capability and transparency. Models may reach correct answers via visually unfaithful intermediate steps, or reason faithfully yet fail on the final prediction. We introduce the visual faithfulness of reasoning chains as a distinct evaluation dimension, focusing on whether the perception steps of a reasoning chain are grounded in the image.
arXiv Detail & Related papers (2025-12-13T07:04:42Z) - Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks [54.31998314008198]
Large Language Models (LLMs) excel in reasoning tasks requiring a single correct answer, but they perform poorly in multi-solution tasks. We attribute this limitation to reasoning overconfidence: a tendency to express undue certainty in an incomplete solution set. We propose the cognitive-rigidity hypothesis, which posits that overconfidence arises when the reasoning process prematurely converges on a narrow set of thought paths.
arXiv Detail & Related papers (2025-12-01T14:35:06Z) - FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning [62.452350134196934]
FaithCoT-Bench is a unified benchmark for instance-level CoT unfaithfulness detection. Our framework formulates unfaithfulness detection as a discriminative decision problem. FaithCoT-Bench sets a solid basis for future research toward more interpretable and trustworthy reasoning in LLMs.
arXiv Detail & Related papers (2025-10-05T05:16:54Z) - Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning [49.17801010041155]
Multimodal large language models (MLLMs) promise enhanced reasoning by integrating diverse inputs such as text, vision, and audio. Yet cross-modal reasoning remains underexplored, with conflicting reports on whether added modalities help or harm performance. We categorize multimodal reasoning into six interaction patterns, varying how facts are distributed across modalities and logically combined.
arXiv Detail & Related papers (2025-09-28T08:46:11Z) - Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation [29.44609627447293]
We propose an approach to summary faithfulness evaluation in which multiple agents are assigned initial stances. We introduce a new dimension, ambiguity, and a detailed taxonomy to identify such special cases. Experiments demonstrate that our approach helps identify ambiguities and achieves even stronger performance on non-ambiguous summaries.
arXiv Detail & Related papers (2025-02-12T15:46:50Z) - Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models [29.67884478799914]
Large Language Models (LLMs) are capable of generating persuasive Natural Language Explanations (NLEs) to justify their answers.
Recent studies have proposed various methods to measure the faithfulness of NLEs, typically by inserting perturbations at the explanation or feature level.
We argue that these approaches are neither comprehensive nor correctly designed according to the established definition of faithfulness.
arXiv Detail & Related papers (2024-10-18T03:45:42Z) - FaithLM: Towards Faithful Explanations for Large Language Models [60.45183469474916]
We introduce FaithLM, a model-agnostic framework that evaluates and improves the faithfulness of large language models. We show that FaithLM consistently increases faithfulness and produces explanations more aligned with human rationales than strong self-explanation baselines.
arXiv Detail & Related papers (2024-02-07T09:09:14Z) - Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs [55.66353783572259]
Causal-Consistency Chain-of-Thought harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations.
arXiv Detail & Related papers (2023-08-23T04:59:21Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic.
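As a hedged illustration of this counterfactual test (the predicate-flipping and the binary consistency check below are simplified stand-ins, not the paper's exact procedure): given an explanation such as "the label holds because the car is red", one can flip the stated predicate, re-query the model on the edited input, and flag the explanation as unfaithful if the prediction does not change the way its logic dictates.

```python
from typing import Callable

def counterfactual_consistency(premise: str,
                               hypothesis: str,
                               edited_hypothesis: str,
                               expected_flip: bool,
                               predict: Callable[[str, str], str]) -> bool:
    """Check whether the model's behavior on a counterfactual input
    matches the logic expressed in its explanation.

    expected_flip: True if the explanation's logic implies the label
    should change on the edited (counterfactual) hypothesis."""
    original = predict(premise, hypothesis)
    counterfactual = predict(premise, edited_hypothesis)
    changed = original != counterfactual
    return changed == expected_flip

# Toy usage with a stubbed NLI 'model':
if __name__ == "__main__":
    def toy_nli(premise: str, hypothesis: str) -> str:
        return "entailment" if "red" in hypothesis else "neutral"

    ok = counterfactual_consistency(
        premise="The car is red.",
        hypothesis="The car is red.",
        edited_hypothesis="The car is blue.",  # negate the stated predicate
        expected_flip=True,  # explanation claims 'red' drives the label
        predict=toy_nli,
    )
    print(ok)  # True -> explanation behaves consistently on this probe
```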
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? [58.13152510843004]
With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems.
What is interpretability, and what constitutes a high-quality interpretation?
We call for more clearly differentiating between the different criteria an interpretation should satisfy, and focus on the faithfulness criterion.
arXiv Detail & Related papers (2020-04-07T20:15:28Z)