Fugu-MT paper translation (abstract): Actionable Interpretability Must Be Defined in Terms of Symmetries

Paper abstract: Actionable Interpretability Must Be Defined in Terms of Symmetries

arxiv url: http://arxiv.org/abs/2601.12913v2
Date: Wed, 28 Jan 2026 16:57:03 GMT
Status: translation complete
Last updated in system: 2026-01-29 13:43:08.971973
Title: Actionable Interpretability Must Be Defined in Terms of Symmetries
Title (reference translation): Interpretability Must Be Defined in Terms of Symmetries
Authors: Pietro Barbiero, Mateo Espinosa Zarlenga, Francesco Giannini, Alberto Termine, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra
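Abstract summary: This paper argues that interpretability research in artificial intelligence (AI) is fundamentally ill-posed, because existing definitions fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions.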
Reference score (independently computed attention score): 37.964025348175504
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions. Under a probabilistic view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) yield a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework to verify compliance with safety standards and regulations.
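Abstract (reference translation): This paper argues that interpretability research in artificial intelligence (AI) is fundamentally ill-posed, because existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions. Under a probabilistic view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) give a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework for verifying compliance with safety standards and regulations.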

Related papers