Related papers: The Dead Salmons of AI Interpretability


URL: http://arxiv.org/abs/2512.18792v1
Date: Sun, 21 Dec 2025 16:07:44 GMT
Title: The Dead Salmons of AI Interpretability
Authors: Maxime Méloux, Giada Dirupo, François Portet, Maxime Peyrard,
Abstract summary: In AI interpretability, reports of similar "dead salmon" artifacts abound. We argue for a pragmatic statistical-causal reframing.
Score: 9.722180905657268
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In a striking neuroscience study, the authors placed a dead salmon in an MRI scanner and showed it images of humans in social situations. Astonishingly, standard analyses of the time reported brain regions predictive of social emotions. The explanation, of course, was not supernatural cognition but a cautionary tale about misapplied statistical inference. In AI interpretability, reports of similar "dead salmon" artifacts abound: feature attribution, probing, sparse auto-encoding, and even causal analyses can produce plausible-looking explanations for randomly initialized neural networks. In this work, we examine this phenomenon and argue for a pragmatic statistical-causal reframing: explanations of computational systems should be treated as parameters of a (statistical) model, inferred from computational traces. This perspective goes beyond simply measuring statistical variability of explanations due to finite sampling of input data; interpretability methods become statistical estimators, and findings should be tested against explicit and meaningful alternative computational hypotheses, with uncertainty quantified with respect to the postulated statistical model. It also highlights important theoretical issues, such as the identifiability of common interpretability queries, which we argue is critical to understand the field's susceptibility to false discoveries, poor generalizability, and high variance. More broadly, situating interpretability within the standard toolkit of statistical inference opens promising avenues for future work aimed at turning AI interpretability into a pragmatic and rigorous science.

Related papers

Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data [0.8415089854734883]
We show that high predictive accuracy does not guarantee valid downstream inference, and that all such failures reduce to statistical notions of (i) bias, when predictions systematically shift the estimand or distort relationships among variables, and (ii) variance, when uncertainty from the prediction model and the intrinsic variability of the true data are ignored.
arXiv Detail & Related papers (2025-12-05T06:24:23Z)
Trust Your Gut: Comparing Human and Machine Inference from Noisy Visualizations [7.305342793164905]
We investigate scenarios where human intuition might surpass idealized statistical rationality. Our findings suggest that analyst gut reactions to visualizations may provide an advantage, even when departing from rationality.
arXiv Detail & Related papers (2024-07-23T22:39:57Z)
Bayesian Networks for Causal Analysis in Socioecological Systems [0.3495246564946556]
Causal and counterfactual reasoning are emerging directions in data science. The main contribution of this paper is to analyze the relations of necessity and sufficiency between the variables of a socioecological system. In particular, we consider a case study involving socioeconomic factors and land uses in southern Spain.
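Necessity and sufficiency between a binary factor and a binary outcome are commonly quantified by the probability of necessity and sufficiency (PNS). The paper's own procedure is not described here, so the sketch below only shows the standard Tian and Pearl bounds on PNS computed from interventional probabilities; when the factor is exogenous (unconfounded), these equal the observational conditionals P(y|x) and P(y|x'). The function name and inputs are hypothetical.

```python
def pns_bounds(p_y_do_x, p_y_do_xprime):
    """Tian and Pearl bounds on the probability of necessity and sufficiency (PNS).

    p_y_do_x:      P(y | do(x)), outcome probability when the factor is set to x
    p_y_do_xprime: P(y | do(x')), outcome probability when the factor is set to x'
    Under exogeneity these equal the observational P(y | x) and P(y | x').
    """
    for p in (p_y_do_x, p_y_do_xprime):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    lower = max(0.0, p_y_do_x - p_y_do_xprime)
    # P(y'_{x'}) = 1 - P(y_{x'}): the outcome is absent when the factor is set to x'.
    upper = min(p_y_do_x, 1.0 - p_y_do_xprime)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical numbers: outcome rate 0.75 with the factor present, 0.25 without.
    print(pns_bounds(0.75, 0.25))  # (0.5, 0.75)
```

If one additionally assumes monotonicity (switching the factor from x' to x never removes the outcome), PNS is identified and equals the lower bound.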
arXiv Detail & Related papers (2024-01-18T16:10:07Z)
Reliability and Interpretability in Science and Deep Learning [0.0]
This article compares traditional scientific models with Deep Neural Network (DNN) models. It argues that the high complexity of DNN models makes their reliability hard to estimate and hinders their prospects for long-term progress. It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone.
arXiv Detail & Related papers (2024-01-14T20:14:07Z)
Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data. Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models. We prove the first results that allow a non-parametric decomposition of spurious effects. The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
arXiv Detail & Related papers (2023-06-08T09:40:28Z)
Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
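For the simplest case, a mean, the prediction-powered estimator subtracts a "rectifier" (the average prediction error measured on the labeled data) from the average prediction on the unlabeled data, and its normal-approximation interval combines the variance of both terms. The sketch below follows that formulation; `ppi_mean_ci` and its argument names are hypothetical, and it is not the authors' reference implementation. It also illustrates the bias point made in "Do We Really Even Need Data?": the naive average of predictions inherits the predictor's bias.

```python
import statistics
from statistics import NormalDist


def ppi_mean_ci(y_labeled, f_labeled, f_unlabeled, alpha=0.1):
    """Prediction-powered point estimate and (1 - alpha) CI for a population mean.

    y_labeled:   gold outcomes Y_i on the small labeled set (size n)
    f_labeled:   model predictions f(X_i) on the same labeled set
    f_unlabeled: model predictions on the large unlabeled set (size N)
    """
    n, N = len(y_labeled), len(f_unlabeled)
    # Rectifier: average prediction error, measured where gold labels exist.
    errors = [f - y for f, y in zip(f_labeled, y_labeled)]
    rectifier = statistics.fmean(errors)
    theta = statistics.fmean(f_unlabeled) - rectifier
    # Normal-approximation standard error combining both sources of variance.
    se = (statistics.variance(f_unlabeled) / N + statistics.variance(errors) / n) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)


if __name__ == "__main__":
    # Toy check: a predictor that overestimates every outcome by 0.5.
    y_lab = [0.0, 1.0, 0.0, 1.0]
    f_lab = [y + 0.5 for y in y_lab]
    f_unlab = [0.5, 1.5] * 50
    theta, ci = ppi_mean_ci(y_lab, f_lab, f_unlab)
    print(statistics.fmean(f_unlab), theta, ci)  # naive mean 1.0 vs. rectified estimate 0.5
```

The interval width shrinks with the large unlabeled set through the first variance term, while the labeled set only has to pin down the prediction error, which is what makes the approach data-efficient when predictions are accurate.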
arXiv Detail & Related papers (2023-01-23T18:59:28Z)
Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data [65.28160163774274]
We apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression and cognitive game data collected at thymia.
arXiv Detail & Related papers (2022-11-09T14:48:13Z)
Neural Causal Models for Counterfactual Identification and Estimation [62.30444687707919]
We study the evaluation of counterfactual statements through neural models. First, we show that neural causal models (NCMs) are expressive enough. Second, we develop an algorithm for simultaneously identifying and estimating counterfactual distributions.
arXiv Detail & Related papers (2022-09-30T18:29:09Z)
Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals. It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation. It then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
The worst of both worlds: A comparative analysis of errors in learning from data in psychology and machine learning [17.336655978572583]
Recent concerns that machine learning (ML) may be facing a misdiagnosis and replication crisis suggest that some published claims in ML research cannot be taken at face value. A deeper understanding of what concerns about supervised ML research have in common with the replication crisis in experimental science can put the new concerns in perspective.
arXiv Detail & Related papers (2022-03-12T18:26:24Z)
ACRE: Abstract Causal REasoning Beyond Covariation [90.99059920286484]
We introduce the Abstract Causal REasoning dataset for systematic evaluation of current vision systems in causal induction. Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with four types of questions in either an independent scenario or an interventional scenario. We find that pure neural models tend toward an associative strategy and perform at chance level, whereas neuro-symbolic combinations struggle with backward-blocking reasoning.
arXiv Detail & Related papers (2021-03-26T02:42:38Z)
Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability [30.501012698482423]
There has been no formal study of the statistical cost of interpretability in machine learning. We model the act of enforcing interpretability as that of performing empirical risk minimization over the set of interpretable hypotheses. We perform a case analysis, explaining why one may or may not observe a trade-off between accuracy and interpretability when the restriction to interpretable classifiers does or does not come at the cost of some excess statistical risk.
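The setup described above, enforcing interpretability as empirical risk minimization (ERM) over a restricted set of interpretable hypotheses, can be sketched on a toy problem. The choices below are illustrative assumptions, not the paper's: one-feature decision stumps play the hypothetical "interpretable" class, and stumps plus pairwise conjunctions of stumps play a richer class. Because the restricted class is a subset of the richer one, its minimal empirical risk can never be lower.

```python
from itertools import product


def empirical_risk(h, data):
    """Average 0-1 loss of hypothesis h on labeled data [(x, y), ...]."""
    return sum(h(x) != y for x, y in data) / len(data)


def erm(hypotheses, data):
    """Empirical risk minimization: return the hypothesis with the lowest empirical risk."""
    best = min(hypotheses, key=lambda h: empirical_risk(h, data))
    return best, empirical_risk(best, data)


def stumps(data):
    """Hypothetical 'interpretable' class: one-feature threshold rules in both directions."""
    n_features = len(data[0][0])
    rules = []
    for j in range(n_features):
        for t in sorted({x[j] for x, _ in data}):
            for sign in (1, -1):
                rules.append(lambda x, j=j, t=t, s=sign: int(s * (x[j] - t) > 0))
    return rules


def conjunctions(data):
    """Hypothetical richer class: all stumps plus ANDs of two stumps."""
    base = stumps(data)
    pairs = [lambda x, a=a, b=b: a(x) & b(x) for a, b in product(base, repeat=2)]
    return base + pairs


if __name__ == "__main__":
    # Target concept: both features above 0.5, on a 5 x 5 grid.
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    data = [((a, b), int(a > 0.5 and b > 0.5)) for a, b in product(grid, grid)]
    _, r_interp = erm(stumps(data), data)
    _, r_full = erm(conjunctions(data), data)
    print(r_interp, r_full)  # 0.16 0.0
```

Here the gap is purely approximation error on the training data. Whether a trade-off appears in the paper's sense depends on excess statistical risk, which a single in-sample comparison like this cannot settle.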
arXiv Detail & Related papers (2020-10-26T17:52:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.