Counterfactual explainability and analysis of variance
- URL: http://arxiv.org/abs/2411.01625v2
- Date: Sat, 04 Oct 2025 02:13:21 GMT
- Title: Counterfactual explainability and analysis of variance
- Authors: Zijun Gao, Qingyuan Zhao
- Abstract summary: Existing tools for explaining complex models and systems are associational rather than causal. We propose a new notion called counterfactual explainability for causal attribution.
- Score: 3.4895986723227383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing tools for explaining complex models and systems are associational rather than causal and do not provide mechanistic understanding. We propose a new notion called counterfactual explainability for causal attribution that is motivated by the concept of genetic heritability in twin studies. Counterfactual explainability extends methods for global sensitivity analysis (including the functional analysis of variance and Sobol's indices), which assume independent explanatory variables, to dependent explanatory variables by using a directed acyclic graph to describe their causal relationships. Therefore, this explainability measure directly incorporates causal mechanisms by construction. Under a comonotonicity assumption, we discuss methods for estimating counterfactual explainability and apply them to a real dataset to explain income inequality by gender, race, and educational attainment.
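The classical setting that the paper generalizes is variance-based global sensitivity analysis with independent inputs. A minimal sketch of that baseline, using Saltelli's pick-freeze Monte Carlo estimator for first-order Sobol' indices (the function name, toy model, and sample sizes here are illustrative choices, not from the paper):

```python
import numpy as np

def first_order_sobol(f, d, n=100_000, seed=None):
    """Estimate S_i = Var(E[Y | X_i]) / Var(Y) for each input i,
    assuming independent standard-normal inputs (the classical case)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))  # first independent sample matrix
    B = rng.standard_normal((n, d))  # second independent sample matrix
    yA, yB = f(A), f(B)
    var_y = np.var(np.concatenate([yA, yB]))
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # "freeze" column i, replacing it with B's
        # Saltelli/Jansen estimator of Var(E[Y | X_i])
        S[i] = np.mean(yB * (f(ABi) - yA)) / var_y
    return S

# Toy additive model Y = 3*X1 + X2: analytically S = (9/10, 1/10),
# since Var(Y) = 9 + 1 = 10 for i.i.d. standard-normal inputs.
S = first_order_sobol(lambda X: 3 * X[:, 0] + X[:, 1], d=2, seed=0)
```

The estimates should fall near (0.9, 0.1). The pick-freeze trick only yields a valid variance decomposition when inputs are independent; the paper's contribution is to replace this assumption with a causal DAG over dependent variables.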
Related papers
- Explanation Beyond Intuition: A Testable Criterion for Inherent Explainability [0.2580765958706854]
Inherent explainability is the gold standard in Explainable Artificial Intelligence (XAI). There is no consistent definition or test to demonstrate inherent explainability. We propose a globally applicable criterion for inherent explainability.
arXiv Detail & Related papers (2025-12-19T07:59:36Z) - Loss-Complexity Landscape and Model Structure Functions [53.92822954974537]
We develop a framework for dualizing the Kolmogorov structure function $h_x(\alpha)$. We establish a mathematical analogy between information-theoretic constructs and statistical mechanics. We explicitly prove the Legendre-Fenchel duality between the structure function and free energy.
arXiv Detail & Related papers (2025-07-17T21:31:45Z) - How do Transformers Learn Implicit Reasoning? [67.02072851088637]
We study how implicit multi-hop reasoning emerges by training transformers from scratch in a controlled symbolic environment. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures.
arXiv Detail & Related papers (2025-05-29T17:02:49Z) - Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior. We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z) - Hard to Explain: On the Computational Hardness of In-Distribution Model Interpretation [0.9558392439655016]
The ability to interpret Machine Learning (ML) models is becoming increasingly essential.
Recent work has demonstrated that it is possible to formally assess interpretability by studying the computational complexity of explaining the decisions of various models.
arXiv Detail & Related papers (2024-08-07T17:20:52Z) - On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios [46.24262986854885]
We propose a novel framework for generating probabilistic monolithic explanations and model reconciling explanations. For monolithic explanations, our approach integrates uncertainty by utilizing probabilistic logic to increase the probability of the explanandum. For model reconciling explanations, we propose a framework that extends the logic-based variant of the model reconciliation problem to account for probabilistic human models.
arXiv Detail & Related papers (2024-05-29T16:07:31Z) - Identifiable Latent Neural Causal Models [82.14087963690561]
Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data.
We determine the types of distribution shifts that do contribute to the identifiability of causal representations.
We translate our findings into a practical algorithm, allowing for the acquisition of reliable latent causal representations.
arXiv Detail & Related papers (2024-03-23T04:13:55Z) - Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z) - Explanatory causal effects for model agnostic explanations [27.129579550130423]
We study the problem of estimating the contributions of features to the prediction of a specific instance by a machine learning model.
A challenge is that most existing causal effects cannot be estimated from data without a known causal graph.
We define an explanatory causal effect based on a hypothetical ideal experiment.
arXiv Detail & Related papers (2022-06-23T08:25:31Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Explaining Causal Models with Argumentation: the Case of Bi-variate Reinforcement [15.947501347927687]
We introduce a conceptualisation for generating argumentation frameworks (AFs) from causal models.
The conceptualisation is based on reinterpreting desirable properties of semantics of AFs as explanation moulds.
We perform a theoretical evaluation of these argumentative explanations, examining whether they satisfy a range of desirable explanatory and argumentative properties.
arXiv Detail & Related papers (2022-05-23T19:39:51Z) - Model Explanations via the Axiomatic Causal Lens [9.915489218010952]
We propose three explanation measures which aggregate the set of all but-for causes into feature importance weights.
Our first measure is a natural adaptation of Chockler and Halpern's notion of causal responsibility.
We extend our approach to derive a new method to compute the Shapley-Shubik and Banzhaf indices for black-box model explanations.
arXiv Detail & Related papers (2021-09-08T19:33:52Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z) - Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning [9.279259759707996]
Causal approaches to post-hoc explainability for black-box prediction models have become increasingly popular.
We learn causal graphical representations that allow for arbitrary unmeasured confounding among features.
Our approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors that are "difference-makers" in an interventionist sense.
arXiv Detail & Related papers (2020-06-03T19:02:34Z) - CausalVAE: Structured Causal Disentanglement in Variational Autoencoder [52.139696854386976]
The framework of variational autoencoder (VAE) is commonly used to disentangle independent factors from observations.
We propose a new VAE based framework named CausalVAE, which includes a Causal Layer to transform independent factors into causal endogenous ones.
Results show that the causal representations learned by CausalVAE are semantically interpretable, and their causal relationship as a Directed Acyclic Graph (DAG) is identified with good accuracy.
arXiv Detail & Related papers (2020-04-18T20:09:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.