FaithLM: Towards Faithful Explanations for Large Language Models
- URL: http://arxiv.org/abs/2402.04678v4
- Date: Mon, 27 Oct 2025 06:19:56 GMT
- Title: FaithLM: Towards Faithful Explanations for Large Language Models
- Authors: Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Ruixiang Tang, Shaochen Zhong, Fan Yang, Mengnan Du, Xuanting Cai, Vladimir Braverman, Xia Hu
- Abstract summary: We introduce FaithLM, a model-agnostic framework that evaluates and improves the faithfulness of large language models. We show that FaithLM consistently increases faithfulness and produces explanations more aligned with human rationales than strong self-explanation baselines.
- Score: 60.45183469474916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) increasingly produce natural language explanations, yet these explanations often lack faithfulness: they do not reliably reflect the evidence the model uses to decide. We introduce FaithLM, a model-agnostic framework that evaluates and improves the faithfulness of LLM explanations without token masking or task-specific heuristics. FaithLM formalizes explanation faithfulness as an intervention property: a faithful explanation should yield a prediction shift when its content is contradicted. Theoretical analysis shows that the resulting contrary-hint score is a sound and discriminative estimator of faithfulness. Building on this principle, FaithLM iteratively refines both the elicitation prompt and the explanation to maximize the measured score. Experiments on three multi-domain datasets and multiple LLM backbones demonstrate that FaithLM consistently increases faithfulness and produces explanations more aligned with human rationales than strong self-explanation baselines. These findings highlight that intervention-based evaluation, coupled with iterative optimization, provides a principled route toward faithful and reliable LLM explanations.
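The abstract's core idea, that a faithful explanation should yield a prediction shift when its content is contradicted, can be sketched as a small check. This is a minimal illustration of the intervention principle, not the paper's actual method: `query_model` is a hypothetical stand-in for an LLM backend (here a toy rule-based stub), and the prompt wording and binary scoring are assumptions for illustration.

```python
def query_model(prompt: str) -> str:
    """Toy stand-in for an LLM call. A real implementation would query an
    LLM backend; this stub flips its answer when a contrary hint appears,
    mimicking a model whose explanation genuinely drives its prediction."""
    return "B" if "Hint:" in prompt else "A"


def contrary_hint_score(question: str, prediction: str, explanation: str) -> float:
    """Return 1.0 if contradicting the explanation shifts the prediction.

    Intuition: if the explanation truly reflects the evidence behind the
    answer, injecting its negation as a hint should change the output;
    if the explanation is decorative, the prediction should not move.
    """
    contrary_hint = f"Hint: the following reasoning is wrong: {explanation}"
    new_prediction = query_model(f"{contrary_hint}\n{question}")
    return 1.0 if new_prediction.strip() != prediction.strip() else 0.0


# With the toy model above, the contrary hint flips "A" to "B",
# so the explanation scores as faithful.
score = contrary_hint_score("Which option is correct?", "A", "option A is supported")
```

In FaithLM this kind of score is then used as the objective of an iterative loop that rewrites both the elicitation prompt and the explanation to maximize it.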
Related papers
- Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach. REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful. Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv Detail & Related papers (2026-02-18T02:55:55Z)
- Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian [0.0]
Large language models (LLMs) are increasingly used to generate self-explanations alongside their predictions. This study evaluates the faithfulness of LLM-generated explanations in the context of emotion classification in Persian.
arXiv Detail & Related papers (2025-11-24T21:29:15Z)
- FaithAct: Faithfulness Planning and Acting in MLLMs [12.08093899815684]
Unfaithfulness remains a persistent challenge for large language models. We introduce FaithEval for quantifying step-level and chain-level faithfulness by evaluating whether each claimed object is visually supported by the image. We propose FaithAct, a faithfulness-first planning and acting framework that enforces evidential grounding at every reasoning step.
arXiv Detail & Related papers (2025-11-11T16:22:49Z)
- Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology [17.119158367942088]
Uncertainty in large language model (LLM) explanations is important for evaluating their faithfulness and reasoning consistency.
We propose a novel framework that quantifies uncertainty in LLM explanations through a reasoning topology perspective.
arXiv Detail & Related papers (2025-02-24T10:28:21Z)
- SEER: Self-Explainability Enhancement of Large Language Models' Representations [18.840860385644316]
We propose SEER, a self-explaining method that enhances LLMs' explainability by aggregating instances of the same concept and disentangling different concepts in the representation space.
We showcase the applications of SEER on trustworthiness-related tasks, where self-explained LLMs achieve consistent improvement in explainability and performance.
arXiv Detail & Related papers (2025-02-07T13:25:33Z)
- Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks. They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct. We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z)
- Understanding the Relationship between Prompts and Response Uncertainty in Large Language Models [55.332004960574004]
Large language models (LLMs) are widely used in decision-making, but their reliability, especially in critical tasks like healthcare, is not well-established.
This paper investigates how the uncertainty of responses generated by LLMs relates to the information provided in the input prompt.
We propose a prompt-response concept model that explains how LLMs generate responses and helps understand the relationship between prompts and response uncertainty.
arXiv Detail & Related papers (2024-07-20T11:19:58Z)
- Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving [13.485604499678262]
This paper investigates the verification and refinement of natural language explanations through the integration of Large Language Models (LLMs) and Theorem Provers (TPs).
We present a neuro-symbolic framework, named Explanation-Refiner, that integrates TPs with LLMs to generate and formalise explanatory sentences.
In turn, the TP is employed to provide formal guarantees on the logical validity of the explanations and to generate feedback for subsequent improvements.
arXiv Detail & Related papers (2024-05-02T15:20:01Z)
- Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate [75.10515686215177]
Large Language Models (LLMs) excel in text generation, but their capability for producing faithful explanations in fact-checking remains underexamined.
We propose the Multi-Agent Debate Refinement (MADR) framework, leveraging multiple LLMs as agents with diverse roles.
MADR ensures that the final explanation undergoes rigorous validation, significantly reducing the likelihood of unfaithful elements and aligning closely with the provided evidence.
arXiv Detail & Related papers (2024-02-12T04:32:33Z)
- Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models [26.11408084129897]
Large Language Models (LLMs) are deployed as powerful tools for several natural language processing (NLP) applications.
Recent works show that modern LLMs can generate self-explanations (SEs), which elicit their intermediate reasoning steps for explaining their behavior.
We discuss the dichotomy between faithfulness and plausibility in SEs generated by LLMs.
arXiv Detail & Related papers (2024-02-07T06:32:50Z)
- Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
- Are self-explanations from Large Language Models faithful? [35.40666730867487]
Large Language Models (LLMs) excel at many tasks and will even explain their reasoning, so-called self-explanations.
It is important to measure whether self-explanations truly reflect the model's behavior.
We propose employing self-consistency checks to measure faithfulness.
arXiv Detail & Related papers (2024-01-15T19:39:15Z)
- XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making [13.928951741632815]
Large Language Models (LLMs) have recently made impressive strides in natural language understanding tasks.
In this paper, we look into bringing some transparency to this process by introducing a new explanation dataset.
Our dataset includes 12,102 question-answer-explanation (QAE) triples.
arXiv Detail & Related papers (2023-11-15T00:34:28Z)
- Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning [50.00090601424348]
Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks.
We propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
arXiv Detail & Related papers (2023-11-13T06:13:38Z)
- Language Models with Rationality [57.37201135072838]
Large language models (LLMs) are proficient at question-answering (QA).
It is not always clear how (or even if) an answer follows from their latent "beliefs".
arXiv Detail & Related papers (2023-05-23T17:04:25Z)
- LMExplainer: Grounding Knowledge and Explaining Language Models [37.578973458651944]
Language models (LMs) like GPT-4 are important in AI applications, but their opaque decision-making process reduces user trust, especially in safety-critical areas.
We introduce LMExplainer, a novel knowledge-grounded explainer that clarifies the reasoning process of LMs through intuitive, human-understandable explanations.
arXiv Detail & Related papers (2023-03-29T08:59:44Z)
- Explanations from Large Language Models Make Small Reasoners Better [61.991772773700006]
We show that our method can consistently and significantly outperform finetuning baselines across different settings.
As a side benefit, human evaluation shows that our method can generate high-quality explanations to justify its predictions.
arXiv Detail & Related papers (2022-10-13T04:50:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.