Actionable Interpretability Must Be Defined in Terms of Symmetries
- URL: http://arxiv.org/abs/2601.12913v2
- Date: Wed, 28 Jan 2026 16:57:03 GMT
- Title: Actionable Interpretability Must Be Defined in Terms of Symmetries
- Authors: Pietro Barbiero, Mateo Espinosa Zarlenga, Francesco Giannini, Alberto Termine, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra
- Abstract summary: This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions.
- Score: 37.964025348175504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions. Under a probabilistic view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) yield a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework to verify compliance with safety standards and regulations.
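The abstract names the four symmetries without stating them formally, so the following is a hedged sketch of the shape such conditions might take under the stated probabilistic view; the symbols f, g, G, c, and y are illustrative assumptions, not the paper's notation.

```latex
% Illustrative sketch only -- paraphrasing the abstract's terminology,
% not reproducing the paper's actual definitions.

% Inference equivariance (assumed form): applying a symmetry g to the
% input and then running inference f agrees with running inference
% first and applying the corresponding output action g':
\[
  f(g \cdot x) = g' \cdot f(x) \qquad \text{for all } g \in G .
\]

% Interpretable inference as Bayesian inversion (assumed form):
% alignment, interventions, and counterfactuals all amount to
% recovering latent concepts c from observations y:
\[
  p(c \mid y) \;=\; \frac{p(y \mid c)\, p(c)}{p(y)} .
\]
```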
Related papers
- Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference [5.621958475334369]
We develop a principled formulation of possibilistic variational inference.
Applying it to a special class of exponential-family functions, we highlight parallels with their probabilistic counterparts.
arXiv Detail & Related papers (2025-11-26T09:53:28Z)
- Towards the Formalization of a Trustworthy AI for Mining Interpretable Models explOiting Sophisticated Algorithms [4.587316936127635]
Interpretable-by-design models are crucial for fostering trust, accountability, and safe adoption of automated decision-making models in real-world applications.
We formalize a comprehensive methodology for generating predictive models that balance interpretability with performance.
By evaluating ethical measures during model generation, this framework establishes the theoretical foundations for developing AI systems.
arXiv Detail & Related papers (2025-10-23T14:54:33Z)
- The Foundations of Tokenization: Statistical and Computational Concerns [51.370165245628975]
Tokenization is a critical step in the NLP pipeline.
Despite its recognized importance as a standard representation method in NLP, the theoretical underpinnings of tokenization are not yet fully understood.
The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models.
arXiv Detail & Related papers (2024-07-16T11:12:28Z)
- Model Interpretation and Explainability: Towards Creating Transparency in Prediction Models [0.0]
Explainable AI (XAI) has a counterpart in analytical modeling, which we refer to as model explainability.
We analyze a dataset of loans from a credit card company in three stages: we execute and compare four different prediction methods, apply the best-known explainability techniques from the current literature to the model training sets to identify feature importance (FI) in the static case, and repeat the FI analysis in the dynamic case.
We found inconsistencies in FI identification between the static and dynamic cases.
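As a rough illustration of what the static FI stage could look like, here is a minimal sketch that assumes scikit-learn's permutation importance as the explainability technique and a synthetic stand-in for the loan data; neither choice is taken from the paper:

```python
# Hedged sketch of the first two stages under assumed details: the loan
# dataset, models, and FI method here are stand-ins, not the authors' setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit-card loan data.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1 (assumed): compare several prediction methods.
models = {"logreg": LogisticRegression(max_iter=1000),
          "forest": RandomForestClassifier(random_state=0)}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "accuracy:", model.score(X_te, y_te))

# Stage 2 (assumed): static feature importance via permutation on the
# training set, then compare FI rankings across models.
for name, model in models.items():
    fi = permutation_importance(model, X_tr, y_tr, n_repeats=10,
                                random_state=0)
    ranking = np.argsort(fi.importances_mean)[::-1]
    print(name, "FI ranking:", ranking)
```

Comparing the two printed rankings illustrates the kind of cross-model FI inconsistency the summary reports.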
arXiv Detail & Related papers (2024-05-31T13:54:25Z)
- Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [92.61557711360652]
Language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks.
We conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement.
We reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
arXiv Detail & Related papers (2023-10-12T17:51:10Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
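To make the invariance/equivariance distinction concrete, here is a minimal numpy sketch assuming a toy shift-invariant model and a gradient-saliency explanation; the actual metrics and models in the paper may differ:

```python
# Minimal sketch of an invariance/equivariance robustness check on a toy
# model; the gap metric and the model are assumed for illustration only.
import numpy as np

def model(x):
    # Toy prediction invariant under cyclic shifts of the input.
    return np.sum(x ** 2)

def explanation(x):
    # Gradient "saliency" of the toy model: d/dx sum(x^2) = 2x.
    return 2 * x

def cyclic_shift(x, k):
    # Group action: the cyclic group acting by index shifts.
    return np.roll(x, k)

rng = np.random.default_rng(0)
x = rng.normal(size=8)

for k in range(1, 4):
    e_orig = explanation(x)
    e_shift = explanation(cyclic_shift(x, k))
    # Invariance: does the explanation stay the same under the action?
    inv = np.linalg.norm(e_shift - e_orig)
    # Equivariance: does the explanation transform along with the input?
    equiv = np.linalg.norm(e_shift - cyclic_shift(e_orig, k))
    print(f"shift {k}: invariance gap {inv:.3f}, equivariance gap {equiv:.3f}")
```

Here the saliency map is exactly equivariant (zero gap) but not invariant, which is the kind of distinction such robustness metrics are meant to quantify.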
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic.
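A hedged sketch of this consistency test, using a dummy rule-based model and a hand-written negation in place of the paper's predicate extraction and counterfactual generation:

```python
# Illustrative only: the real method operates on NLI models and on
# logical predicates extracted from free-text explanations.
def model_predict(premise: str, hypothesis: str) -> str:
    # Dummy NLI model: entailment iff the hypothesis is a substring.
    return "entailment" if hypothesis in premise else "not-entailment"

def negate(hypothesis: str) -> str:
    # Toy counterfactual generator: negate the explanation's predicate.
    return "not " + hypothesis

premise = "a man is playing a guitar on stage"
hypothesis = "a man is playing a guitar"

original = model_predict(premise, hypothesis)
counterfactual = model_predict(premise, negate(hypothesis))

# Faithfulness check: if the explanation's predicate drove the original
# prediction, negating that predicate should change the prediction.
consistent = (original == "entailment") and (counterfactual != "entailment")
print("original:", original, "| counterfactual:", counterfactual,
      "| consistent:", consistent)
```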
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP, which allow selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Distributional Formal Semantics [0.18352113484137625]
We propose a Distributional Formal Semantics that integrates distributionality into a formal semantic system on the level of formal models.
This approach offers probabilistic, distributed meaning representations that are also inherently compositional.
We show how these representations allow for probabilistic inference, and how the information-theoretic notion of "information" naturally follows from it.
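A rough numpy sketch of such distributed, probabilistic meaning representations, with propositions encoded as truth values over sampled worlds; the sampling scheme and the probabilities are assumptions for illustration, not the paper's construction:

```python
# Propositions as boolean vectors over sampled "worlds": probabilistic
# inference and information both fall out of simple vector operations.
import numpy as np

rng = np.random.default_rng(0)
n_worlds = 10_000

# Meaning vectors: the truth value of each proposition in each world.
rain = rng.random(n_worlds) < 0.3
# "wet" is likely when it rains, unlikely otherwise.
wet = np.where(rain, rng.random(n_worlds) < 0.9,
                     rng.random(n_worlds) < 0.1)

def prob(p):
    # Marginal probability: fraction of worlds where p holds.
    return p.mean()

def cond(q, p):
    # Probabilistic inference: P(q | p) from the meaning vectors.
    return (q & p).sum() / p.sum()

def surprisal(q, p):
    # Information-theoretic "information" of q given p, in bits.
    return -np.log2(cond(q, p))

print("P(wet)              =", round(prob(wet), 3))
print("P(wet | rain)       =", round(cond(wet, rain), 3))
print("surprisal(wet|rain) =", round(surprisal(wet, rain), 3), "bits")
```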
arXiv Detail & Related papers (2021-03-02T13:38:00Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)