A Framework for Causal Concept-based Model Explanations
- URL: http://arxiv.org/abs/2512.02735v1
- Date: Tue, 02 Dec 2025 13:19:53 GMT
- Title: A Framework for Causal Concept-based Model Explanations
- Authors: Anna Rodum Bjøru, Jacob Lysnæs-Larsen, Oskar Jørgensen, Inga Strümke, Helge Langseth
- Abstract summary: This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI). It is based on the requirement that explanations for non-interpretable models should be both understandable and faithful to the model being explained.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI), based on the requirement that explanations for non-interpretable models should be both understandable and faithful to the model being explained. Local and global explanations are generated by calculating the probability of sufficiency of concept interventions. Example explanations are presented, generated with a proof-of-concept model built to explain classifiers trained on the CelebA dataset. Understandability is demonstrated through a clear concept-based vocabulary with an implicit causal interpretation. Fidelity is addressed by highlighting important framework assumptions, stressing that the context in which an explanation is interpreted must align with the context in which it was generated.
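For concreteness, the probability of sufficiency of a concept intervention can be read as PS = P(Y_{do(C=c)} = y | C != c, Y != y): among inputs that currently exhibit neither concept value c nor prediction y, how often does forcing the concept to c flip the prediction to y? Below is a minimal Monte Carlo sketch of such an estimator; `model`, `concept_of`, and `intervene` are assumed interfaces for illustration, not the paper's API.

```python
import numpy as np

def probability_of_sufficiency(model, concept_of, intervene, X, c, y):
    """Monte Carlo estimate of PS = P(Y_{do(C=c)} = y | C != c, Y != y).

    model(X) -> predicted labels; concept_of(X) -> concept values;
    intervene(X, c) -> inputs with the concept forced to c.
    All three are hypothetical interfaces, assumed for illustration.
    """
    preds = model(X)                              # factual predictions
    concepts = concept_of(X)                      # factual concept values
    base = X[(concepts != c) & (preds != y)]      # conditioning event: C != c, Y != y
    if len(base) == 0:
        return float("nan")                       # PS undefined on an empty event
    counterfactual = model(intervene(base, c))    # predictions after do(C = c)
    return float(np.mean(counterfactual == y))    # fraction flipped to the target
```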
Related papers
- From Black-box to Causal-box: Towards Building More Interpretable Models [57.23201263629627]
We introduce the notion of causal interpretability, which formalizes when counterfactual queries can be evaluated from a specific class of models.
We derive a complete graphical criterion that determines whether a given model architecture supports a given counterfactual query.
arXiv Detail & Related papers (2025-10-24T20:03:18Z)
- Logic Explanation of AI Classifiers by Categorical Explaining Functors [5.311276815905217]
We propose a theoretically grounded approach to ensure coherence and fidelity of extracted explanations.
As a proof of concept, we validate the proposed theoretical constructions on a synthetic benchmark.
arXiv Detail & Related papers (2025-03-20T14:50:06Z)
- On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios [46.24262986854885]
We propose a novel framework for generating probabilistic monolithic explanations and model reconciling explanations.
For monolithic explanations, our approach integrates uncertainty by utilizing probabilistic logic to increase the probability of the explanandum.
For model reconciling explanations, we propose a framework that extends the logic-based variant of the model reconciliation problem to account for probabilistic human models.
arXiv Detail & Related papers (2024-05-29T16:07:31Z)
- DiConStruct: Causal Concept-based Explanations through Black-Box Distillation [9.735426765564474]
We present DiConStruct, an explanation method that is both concept-based and causal.
Our explainer works as a distillation model for any black-box machine learning model, approximating its predictions while producing the corresponding explanations.
arXiv Detail & Related papers (2024-01-16T17:54:02Z)
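As a rough illustration of the distillation idea summarized above, the sketch below trains a small surrogate that first predicts concept scores and then maps them to the black box's output, so the concept layer doubles as the explanation. It is a generic concept-based distiller under assumed interfaces, not the DiConStruct architecture itself.

```python
import torch
import torch.nn as nn

class ConceptDistiller(nn.Module):
    """Illustrative concept-based surrogate (not the DiConStruct model):
    input -> concept scores -> distilled prediction."""

    def __init__(self, in_dim: int, n_concepts: int):
        super().__init__()
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, n_concepts), nn.Sigmoid(),  # concept scores in [0, 1]
        )
        self.head = nn.Linear(n_concepts, 1)          # concepts -> prediction

    def forward(self, x):
        concepts = self.concept_net(x)                # the explanation
        return torch.sigmoid(self.head(concepts)), concepts

def distill_step(surrogate, black_box, x, optimizer):
    """One training step: imitate the black box's score (fidelity) while the
    concept layer provides the explanation. `black_box` is assumed to return
    probabilities in [0, 1]."""
    with torch.no_grad():
        teacher = black_box(x)
    student, _ = surrogate(x)
    loss = nn.functional.binary_cross_entropy(student, teacher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```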
- Do Concept Bottleneck Models Respect Localities? [14.77558378567965]
Concept-based explainability methods use human-understandable intermediaries to produce explanations for machine learning models.
We assess whether concept predictors leverage "relevant" features to make predictions, a term we call locality.
We find that many concept-based models used in practice fail to respect localities because concept predictors cannot always clearly distinguish distinct concepts.
arXiv Detail & Related papers (2024-01-02T16:05:23Z)
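One simple way to probe the locality property described above is to perturb only the features outside a concept's supposedly relevant region and check whether the concept prediction moves. A hedged sketch, where `concept_predictor` and `relevant_mask` are assumed inputs rather than anything specified in the paper:

```python
import numpy as np

def locality_gap(concept_predictor, x, relevant_mask,
                 noise_scale=0.5, n_trials=100, seed=0):
    """Average shift in a concept score when only IRRELEVANT features are
    perturbed. A concept predictor that respects locality should barely move.
    relevant_mask: boolean array (same shape as x) marking the features the
    concept is supposed to depend on -- an assumed input for illustration."""
    rng = np.random.default_rng(seed)
    base = concept_predictor(x)
    shifts = []
    for _ in range(n_trials):
        noise = rng.normal(0.0, noise_scale, size=x.shape)
        perturbed = np.where(relevant_mask, x, x + noise)  # relevant part intact
        shifts.append(abs(concept_predictor(perturbed) - base))
    return float(np.mean(shifts))  # a large gap suggests locality is violated
```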
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV in both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
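To make the contrast concrete, a CAV-style sensitivity score can be sketched as a linear probe on activations followed by a directional derivative along the probe's weight vector; Concept Gradient replaces that fixed linear direction with the gradient of a nonlinear concept model. The snippet below shows only the linear baseline, with assumed array shapes rather than the authors' code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cav_sensitivity(activations, concept_labels, output_grads):
    """CAV-style concept sensitivity (a sketch, not the authors' code).

    activations:    (N, D) latent representations of probe examples
    concept_labels: (N,)   1 if the example shows the concept, else 0
    output_grads:   (M, D) gradients of the model output w.r.t. activations
    """
    probe = LogisticRegression(max_iter=1000).fit(activations, concept_labels)
    cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])  # unit concept direction
    return output_grads @ cav  # positive score => concept pushes the output up
```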
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with the expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
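A minimal sketch of the consistency check described above, assuming a hypothetical `nli_model` interface and a counterfactual hypothesis already constructed by negating one of the explanation's predicates (the paper derives it from the explanation's logic):

```python
def counterfactual_consistency(nli_model, premise, hypothesis,
                               counterfactual_hypothesis):
    """If the model entails the original hypothesis, a counterfactual built by
    negating one of the explanation's predicates should no longer be entailed.
    nli_model(premise, hypothesis) -> one of {"entailment", "neutral",
    "contradiction"} is an assumed interface, not the paper's API."""
    original = nli_model(premise, hypothesis)
    counterfactual = nli_model(premise, counterfactual_hypothesis)
    consistent = not (original == "entailment" and counterfactual == "entailment")
    return {"original": original, "counterfactual": counterfactual,
            "consistent": consistent}
```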
- Explaining Causal Models with Argumentation: the Case of Bi-variate Reinforcement [15.947501347927687]
We introduce a conceptualisation for generating argumentation frameworks (AFs) from causal models.
The conceptualisation is based on reinterpreting desirable properties of semantics of AFs as explanation moulds.
We perform a theoretical evaluation of these argumentative explanations, examining whether they satisfy a range of desirable explanatory and argumentative properties.
arXiv Detail & Related papers (2022-05-23T19:39:51Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines, named X-MOP, that helps in selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.