Fugu-MT Paper Translation (Abstract): LLM Explainability with Counterfactual Chains and Causal Graphs

Paper overview: LLM Explainability with Counterfactual Chains and Causal Graphs

arxiv url: http://arxiv.org/abs/2606.05972v1
Date: Thu, 04 Jun 2026 10:15:12 GMT
Status: Translation complete
Last updated in system: 2026-06-05 22:39:44.718085
Title: LLM Explainability with Counterfactual Chains and Causal Graphs
Title (reference translation): LLM Explainability with Counterfactual Chains and Causal Graphs
Authors: Nirit Nussbaum-Hoffer, Nitay Calderon, Liat Ein-Dor, Roi Reichart
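Abstract summary: Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language Models (LLMs) to recover causal graphs of external-world processes. This paper uses causal graphs to model LLM inference itself, giving stakeholders a transparent view of how the model perceives and organizes high-level concepts to produce a prediction.
Reference score (attention level, computed independently): 19.857433252352482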
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language Models (LLMs) to recover causal graphs of external-world processes. Instead, in this paper, we use causal graphs to model LLM inference itself, providing stakeholders with a transparent view of how the model perceives and organizes high-level concepts to produce a prediction. We propose a four-phase method for constructing such graphs. Given a target LLM and a set of textual examples, our method discovers class-discriminative, human-interpretable concepts and maps each input to LLM-perceived concept states. We then introduce an MCMC-inspired counterfactual augmentation procedure that expands the sparse observational data through chains of counterfactuals. This enables stable causal discovery with $σ$-CG, yielding informative, interpretable graphs. We apply our method to three LLMs across disease diagnosis, sentiment analysis, and LLM-as-a-judge classification tasks. We evaluate the learned graphs for predictive fidelity and structural stability, and the MCMC-inspired augmentation for convergence and downstream utility. Our results show that the discovered causal graphs capture meaningful dependencies consistent with LLMs' reasoning. Together, this paper provides a foundation for concept-level explainability of LLMs.
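Abstract (reference translation): Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language Models (LLMs) to recover causal graphs of external-world processes. In this paper, we instead use causal graphs to model LLM inference itself, showing stakeholders transparently how the model perceives and organizes high-level concepts to produce a prediction. We propose a four-phase method for constructing such graphs. Given a target LLM and a set of textual examples, the method discovers class-discriminative, human-interpretable concepts and maps each input to LLM-perceived concept states. We then introduce an MCMC-inspired counterfactual augmentation procedure that expands the sparse observational data through chains of counterfactuals. This enables stable causal discovery with $σ$-CG, yielding informative, interpretable graphs. We apply the method to three LLMs across disease diagnosis, sentiment analysis, and LLM-as-a-judge classification tasks. We evaluate the learned graphs for predictive fidelity and structural stability, and the MCMC-inspired augmentation for convergence and downstream utility. The results show that the discovered causal graphs capture meaningful dependencies consistent with the LLMs' reasoning. Together, this paper provides a foundation for concept-level explainability of LLMs.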
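
The abstract mentions expanding sparse observational data through chains of counterfactuals but gives no algorithmic detail. As a rough illustration of the general idea only, the toy sketch below runs a random walk over binary concept states, flipping one concept per step and recording a prediction for each visited state. Everything here is an assumption made for illustration and is not the authors' procedure: the binary concept encoding, the function names `toy_model`, `counterfactual_chain`, and `augment`, and the use of a simple rule in place of a real LLM call.

```python
import random


def toy_model(state):
    # Hypothetical stand-in for the target LLM (assumption):
    # predicts class 1 when at least two concepts are active.
    return int(sum(state) >= 2)


def counterfactual_chain(start, predict, steps, rng):
    """Random walk over binary concept states.

    Each step flips exactly one concept (a single-concept counterfactual)
    and records the prediction for the new state.
    """
    state = tuple(start)
    rows = [(state, predict(state))]
    for _ in range(steps):
        i = rng.randrange(len(state))
        state = state[:i] + (1 - state[i],) + state[i + 1:]
        rows.append((state, predict(state)))
    return rows


def augment(observations, predict, steps=5, seed=0):
    # Build one chain per observed concept state and pool all visited states.
    rng = random.Random(seed)
    rows = []
    for obs in observations:
        rows.extend(counterfactual_chain(obs, predict, steps, rng))
    return rows


observed = [(1, 0, 0), (0, 1, 1)]  # made-up concept states, not real data
augmented = augment(observed, toy_model, steps=5, seed=0)
print(len(augmented))  # 2 chains x (1 start + 5 steps) = 12 rows
```

In this sketch, the pooled `(state, prediction)` rows play the role of the augmented dataset that a causal discovery step would then consume. How the paper proposes and accepts counterfactuals, and how it checks convergence, is not specified in the abstract and is not modeled here.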


Related papers