Intrinsic User-Centric Interpretability through Global Mixture of Experts
- URL: http://arxiv.org/abs/2402.02933v4
- Date: Wed, 28 May 2025 15:34:43 GMT
- Title: Intrinsic User-Centric Interpretability through Global Mixture of Experts
- Authors: Vinitra Swamy, Syrielle Montariol, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser
- Abstract summary: InterpretCC is a family of intrinsically interpretable neural networks that optimize for ease of human understanding and explanation faithfulness. We show that InterpretCC explanations have higher actionability and usefulness than those of other intrinsically interpretable approaches.
- Score: 31.738009841932374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In human-centric settings like education or healthcare, model accuracy and model explainability are key factors for user adoption. Towards these two goals, intrinsically interpretable deep learning models have gained popularity, focusing on accurate predictions alongside faithful explanations. However, there exists a gap in the human-centeredness of these approaches, which often produce nuanced and complex explanations that are not easily actionable for downstream users. We present InterpretCC (interpretable conditional computation), a family of intrinsically interpretable neural networks at a unique point in the design space that optimizes for ease of human understanding and explanation faithfulness, while maintaining comparable performance to state-of-the-art models. InterpretCC achieves this through adaptive sparse activation of features before prediction, allowing the model to use a different, minimal set of features for each instance. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows users to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply InterpretCC to text, time series, and tabular data across several real-world datasets, demonstrating performance comparable to non-interpretable baselines and outperforming intrinsically interpretable baselines. Through a user study involving 56 teachers, InterpretCC explanations are found to have higher actionability and usefulness than other intrinsically interpretable approaches.
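The routing idea described in the abstract can be pictured with a short sketch (this is not the authors' released code; the module sizes, sigmoid gate, and straight-through thresholding are illustrative assumptions): features are split into user-specified topical groups, a gating network sparsely activates a subset of group subnetworks per instance, and only the active groups contribute to the prediction and the explanation.

```python
import torch
import torch.nn as nn

class InterpretCCLikeMoE(nn.Module):
    """Illustrative sketch of an interpretable mixture-of-experts: features are
    split into human-specified topical groups, a gating network sparsely
    activates group subnetworks per instance, and only the active groups
    contribute to the prediction (and to the explanation)."""

    def __init__(self, feature_groups, hidden_dim=32, n_classes=2, gate_threshold=0.5):
        super().__init__()
        self.feature_groups = feature_groups          # list of index lists, one per topic
        self.gate_threshold = gate_threshold
        total_features = sum(len(g) for g in feature_groups)
        # One small expert subnetwork per topical feature group.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(len(g), hidden_dim), nn.ReLU(),
                          nn.Linear(hidden_dim, n_classes))
            for g in feature_groups
        )
        # Gating network scores each topic from the full feature vector.
        self.gate = nn.Linear(total_features, len(feature_groups))

    def forward(self, x):
        gate_probs = torch.sigmoid(self.gate(x))                      # (batch, n_groups)
        # Hard threshold with a straight-through estimator, so the activation
        # pattern is discrete at inference time but still trainable.
        hard = (gate_probs > self.gate_threshold).float()
        gates = hard + gate_probs - gate_probs.detach()
        outputs = torch.stack(
            [expert(x[:, idx]) for expert, idx in zip(self.experts, self.feature_groups)],
            dim=1)                                                    # (batch, n_groups, n_classes)
        logits = (gates.unsqueeze(-1) * outputs).sum(dim=1)
        return logits, hard   # `hard` is the per-instance explanation: which topics were used

# Toy usage: 6 features split into 3 user-defined topics.
model = InterpretCCLikeMoE(feature_groups=[[0, 1], [2, 3], [4, 5]])
logits, used_topics = model(torch.randn(4, 6))
```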
Related papers
- Explaining the Unexplained: Revealing Hidden Correlations for Better Interpretability [1.8274323268621635]
Real Explainer (RealExp) is an interpretability method that decouples the Shapley Value into individual feature importance and feature correlation importance. RealExp enhances interpretability by precisely quantifying both individual feature contributions and their interactions.
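The exact RealExp formulation is not reproduced here; as a rough illustration of "decoupling" a Shapley value, the sketch below computes exact Shapley values for a tiny toy model and splits each into the feature's standalone contribution plus a residual interaction/correlation term. The baseline-imputation value function is an assumption.

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(value_fn, d):
    """Exact Shapley values by enumerating all feature coalitions (small d only)."""
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += w * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

def decouple(value_fn, d):
    """Split each Shapley value into an 'individual' part (the feature's marginal
    effect on its own) and a residual 'correlation/interaction' part."""
    phi = shapley_values(value_fn, d)
    individual = np.array([value_fn({i}) - value_fn(set()) for i in range(d)])
    return phi, individual, phi - individual

# Toy model: prediction = x0 + x1 + x0*x1 evaluated at x = (1, 1, 1),
# with absent features imputed by a baseline of 0.
x, baseline = np.array([1.0, 1.0, 1.0]), np.zeros(3)
def value_fn(S):
    z = baseline.copy()
    for i in S:
        z[i] = x[i]
    return z[0] + z[1] + z[0] * z[1]

phi, individual, correlation = decouple(value_fn, d=3)
print(phi, individual, correlation)   # the interaction mass shows up in `correlation`
```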
arXiv Detail & Related papers (2024-12-02T10:50:50Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes model learning by paying closer attention to training samples with a high difference in explanations.
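A minimal sketch of this reweighting idea, with an assumed stand-in for the paper's explanation-consistency metric (gradient saliency compared between an input and a mildly perturbed copy); samples whose explanations disagree receive a larger training weight.

```python
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    """Simple gradient-based attribution with respect to the input."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.abs()

def consistency_weights(model, x, y, noise=0.05):
    """Assumed stand-in for an explanation-consistency metric: compare attributions
    on the input and a mildly perturbed copy, and give larger training weight to
    samples whose explanations disagree."""
    a = saliency(model, x, y).flatten(1)
    b = saliency(model, x + noise * torch.randn_like(x), y).flatten(1)
    disagreement = 1 - F.cosine_similarity(a, b, dim=1)       # in [0, 2]
    return 1 + disagreement                                    # weight >= 1

# Reweighted training step on a toy model and toy data.
model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(), torch.nn.Linear(16, 3))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
w = consistency_weights(model, x, y).detach()
loss = (w * F.cross_entropy(model(x), y, reduction="none")).mean()
opt.zero_grad(); loss.backward(); opt.step()
```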
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z)
- Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models [53.337728969143086]
Recommendation systems harness user-item interactions like clicks and reviews to learn their representations.
Previous studies improve recommendation accuracy and interpretability by modeling user preferences across various aspects and intents.
We introduce a chain-based prompting approach to uncover semantic aspect-aware interactions.
arXiv Detail & Related papers (2023-12-26T15:44:09Z)
- AS-XAI: Self-supervised Automatic Semantic Interpretation for CNN [5.42467030980398]
We propose a self-supervised automatic semantic interpretable artificial intelligence (AS-XAI) framework.
It utilizes transparent embedding semantic extraction spaces and row-centered principal component analysis (PCA) for global semantic interpretation of model decisions.
The proposed approach offers broad fine-grained practical applications, including shared semantic interpretation under out-of-distribution categories.
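One possible reading of "row-centered PCA" is sketched below (an assumption, not the paper's code): each sample's activation vector is centered by its own mean before extracting principal directions, which then serve as global semantic axes.

```python
import numpy as np

def row_centered_pca(features, n_components=3):
    """Assumed reading of 'row-centered PCA': each sample's feature vector is
    centered by its own mean before the principal directions are extracted,
    so the components capture relative activation patterns across channels."""
    X = features - features.mean(axis=1, keepdims=True)   # row-wise centering
    # SVD of the centered matrix gives the principal semantic directions.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return vt[:n_components], explained

# Toy usage: 100 samples x 64 channel activations from some penultimate layer.
acts = np.random.rand(100, 64)
directions, explained = row_centered_pca(acts)
print(directions.shape, explained)     # (3, 64) global 'semantic' axes
```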
arXiv Detail & Related papers (2023-12-02T10:06:54Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
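A hedged sketch of the instance-to-dataset shift: per-sample attributions are aggregated into per-class, dataset-level relevance, and concepts with negligible relevance for every class are flagged as removal candidates. The thresholding heuristic and array shapes are illustrative, not the paper's procedure.

```python
import numpy as np

def dataset_level_relevance(attributions, labels, n_classes):
    """Aggregate instance-level attributions into a dataset-level view: the mean
    attribution of each concept per class."""
    return np.stack([attributions[labels == c].mean(axis=0) for c in range(n_classes)])

def irrelevant_concepts(per_class_relevance, threshold=0.01):
    # Concepts contributing almost nothing to any class are candidates for
    # removal from the training set / concept bank (illustrative heuristic).
    return np.where(np.abs(per_class_relevance).max(axis=0) < threshold)[0]

# Toy usage: 200 samples, 12 concept activations, 4 classes.
attr = np.random.randn(200, 12) * np.array([1] * 10 + [0.001, 0.001])
labels = np.random.randint(0, 4, size=200)
rel = dataset_level_relevance(attr, labels, n_classes=4)
print(irrelevant_concepts(rel))      # likely flags the last two concepts
```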
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
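A toy sketch of how the two styles might be combined (the architecture and sizes are assumptions, not the paper's model): an instance-wise selector emits per-class feature masks, and the class score stays additive in the masked features, so the per-feature contributions double as the local explanation.

```python
import torch
import torch.nn as nn

class AdditiveInstanceWiseExplainer(nn.Module):
    """Illustrative combination of instance-wise selection and additive scoring:
    a selector produces per-class feature weights for each instance, and the
    class score is a sum of per-feature contributions."""

    def __init__(self, n_features, n_classes, hidden=32):
        super().__init__()
        self.selector = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_features * n_classes))
        self.n_features, self.n_classes = n_features, n_classes
        self.class_weights = nn.Parameter(torch.randn(n_classes, n_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(n_classes))

    def forward(self, x):
        # Per-instance, per-class feature selection mask in [0, 1].
        masks = torch.sigmoid(self.selector(x)).view(-1, self.n_classes, self.n_features)
        # Additive class scores: sum over features of mask * weight * feature.
        contrib = masks * self.class_weights.unsqueeze(0) * x.unsqueeze(1)
        logits = contrib.sum(dim=-1) + self.bias
        return logits, contrib    # `contrib` is the per-class local explanation

model = AdditiveInstanceWiseExplainer(n_features=8, n_classes=3)
logits, explanation = model(torch.randn(5, 8))   # explanation: (5, 3, 8)
```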
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- A Fine-grained Interpretability Evaluation Benchmark for Neural NLP [44.08113828762984]
This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension.
We provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive.
We conduct experiments on three typical models with three saliency methods, and unveil their strengths and weaknesses in terms of interpretability.
arXiv Detail & Related papers (2022-05-23T07:37:04Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should take into account several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
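A hedged sketch of the multi-objective view (random search rather than the paper's optimizer): candidate counterfactuals are scored on adversarial power, change intensity, and a plausibility proxy, and only the Pareto-optimal candidates are kept. The toy black-box model and the nearest-neighbor plausibility proxy are assumptions.

```python
import numpy as np

def pareto_front(scores):
    """Indices of non-dominated rows (all objectives to be minimized)."""
    keep = []
    for i, s in enumerate(scores):
        dominated = any(np.all(t <= s) and np.any(t < s)
                        for j, t in enumerate(scores) if j != i)
        if not dominated:
            keep.append(i)
    return keep

def counterfactual_candidates(x, predict_proba, X_train, target_class,
                              n_samples=500, scale=0.5, seed=0):
    """Random-search sketch: score random perturbations on three objectives --
    adversarial power, change intensity, plausibility -- and keep the Pareto front."""
    rng = np.random.default_rng(seed)
    candidates = x + rng.normal(0, scale, size=(n_samples, x.size))
    adversarial = 1 - predict_proba(candidates)[:, target_class]     # want target prob high
    intensity = np.abs(candidates - x).sum(axis=1)                   # L1 change
    # Plausibility proxy: distance to the closest real training point.
    plausibility = np.min(np.linalg.norm(
        candidates[:, None, :] - X_train[None, :, :], axis=-1), axis=1)
    scores = np.column_stack([adversarial, intensity, plausibility])
    return candidates[pareto_front(scores)]

# Toy black box: logistic scores from a fixed linear model, 2 classes.
w = np.array([1.5, -2.0, 0.5])
def predict_proba(Z):
    p1 = 1 / (1 + np.exp(-Z @ w))
    return np.column_stack([1 - p1, p1])

X_train = np.random.default_rng(1).normal(size=(50, 3))
cfs = counterfactual_candidates(np.zeros(3), predict_proba, X_train, target_class=1)
print(len(cfs), "non-dominated counterfactual candidates")
```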
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
- Interpreting and improving deep-learning models with reality checks [13.287382944078562]
This chapter covers recent work aiming to interpret models by attributing importance to features and feature groups for a single prediction.
We show how these attributions can be used to directly improve the generalization of a neural network or to distill it into a simple model.
arXiv Detail & Related papers (2021-08-16T00:58:15Z)
- Interpretable Social Anchors for Human Trajectory Forecasting in Crowds [84.20437268671733]
We propose a neural network-based system to predict human trajectories in crowds.
We learn interpretable rule-based intents, and then utilise the expressibility of neural networks to model the scene-specific residual.
Our architecture is tested on the interaction-centric benchmark TrajNet++.
arXiv Detail & Related papers (2021-05-07T09:22:34Z)
- Model Learning with Personalized Interpretability Estimation (ML-PIE) [2.862606936691229]
High-stakes applications require AI-generated models to be interpretable.
Current algorithms for the synthesis of potentially interpretable models rely on objectives or regularization terms.
We propose an approach for the synthesis of models that are tailored to the user.
arXiv Detail & Related papers (2021-04-13T09:47:48Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
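A minimal sketch of the attribute-informed idea, assuming a pretrained conditional generator G(z, a) and classifier f (random toy networks below, not the paper's models): the latent code is held fixed while the interpretable attribute vector is optimized until the generated instance carries the desired label, with a penalty keeping attribute edits small.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a conditional generator G(z, a) and a classifier f.
# In the paper these would be trained models; here they are random networks
# used only to illustrate the attribute-optimization loop.
G = nn.Sequential(nn.Linear(8 + 4, 32), nn.ReLU(), nn.Linear(32, 16))   # latent+attrs -> data
f = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))       # data -> class logits

def attribute_informed_counterfactual(z, a_init, target_class, steps=200, lr=0.05):
    """Keep the latent code fixed and perturb only the interpretable attribute
    vector until the generated instance is classified as the target label,
    while staying close to the original attributes."""
    a = a_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([a], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        x_cf = G(torch.cat([z, a], dim=-1))
        loss = (nn.functional.cross_entropy(f(x_cf).unsqueeze(0), target)
                + 0.1 * (a - a_init).abs().sum())        # small, sparse attribute edits
        opt.zero_grad(); loss.backward(); opt.step()
    return G(torch.cat([z, a], dim=-1)).detach(), a.detach()

z, a0 = torch.randn(8), torch.zeros(4)
x_cf, a_cf = attribute_informed_counterfactual(z, a0, target_class=1)
```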
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- DoLFIn: Distributions over Latent Features for Interpretability [8.807587076209568]
We propose a novel strategy for achieving interpretability in neural network models.
Our approach builds on the success of using probability as the central quantity.
We show that DoLFIn not only provides interpretable solutions, but also slightly outperforms classical CNN and BiLSTM text classifiers.
arXiv Detail & Related papers (2020-11-10T18:32:53Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
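A sketch of this joint setup under stated assumptions (the layer choice, dictionary size, and fidelity loss weight are illustrative): a small dictionary of attribute functions reads a selected hidden layer, and a linear interpreter must reproduce the predictor's output from those few attributes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictorWithInterpreter(nn.Module):
    """Jointly learn a predictor and its interpretation model: a small dictionary
    of attribute functions reads a chosen hidden layer, and an interpretable
    (linear) head reproduces the predictor's output from those few attributes."""

    def __init__(self, in_dim=20, hidden=64, n_attributes=5, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)                 # black-box predictor head
        self.attributes = nn.Linear(hidden, n_attributes)        # dictionary of attribute functions
        self.interpreter = nn.Linear(n_attributes, n_classes)    # interpretable surrogate head

    def forward(self, x):
        h = self.encoder(x)
        logits = self.head(h)
        attrs = torch.sigmoid(self.attributes(h))                # high-level attribute activations
        surrogate = self.interpreter(attrs)
        return logits, surrogate, attrs

model = PredictorWithInterpreter()
x, y = torch.randn(16, 20), torch.randint(0, 3, (16,))
logits, surrogate, attrs = model(x)
# Joint loss: prediction quality plus fidelity of the interpretable surrogate.
loss = F.cross_entropy(logits, y) + 0.5 * F.mse_loss(surrogate, logits.detach())
loss.backward()   # an optimizer step would follow in a real training loop
```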
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
- GAMI-Net: An Explainable Neural Network based on Generalized Additive Models with Structured Interactions [5.8010446129208155]
An explainable neural network based on generalized additive models with structured interactions (GAMI-Net) is proposed to pursue a good balance between prediction accuracy and model interpretability.
GAMI-Net is a disentangled feedforward network with multiple additive subnetworks.
Numerical experiments on both synthetic functions and real-world datasets show that the proposed model enjoys superior interpretability.
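A compact sketch of the GAM-with-interactions structure (the chosen interaction pairs and subnetwork sizes are assumptions): one subnetwork per feature for main effects, one per selected feature pair for structured interactions, and the prediction is their sum, so each effect remains individually inspectable.

```python
import torch
import torch.nn as nn

class GAMINetLike(nn.Module):
    """GAM-style network sketch: one small subnetwork per feature (main effects)
    plus one per selected feature pair (structured interactions); the output is
    their sum, so each effect can be plotted and inspected on its own."""

    def __init__(self, n_features, interaction_pairs, hidden=16):
        super().__init__()
        self.main_effects = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features))
        self.pairs = interaction_pairs
        self.interactions = nn.ModuleList(
            nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in interaction_pairs)
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        out = self.bias + sum(net(x[:, [j]]) for j, net in enumerate(self.main_effects))
        out = out + sum(net(x[:, list(p)]) for p, net in zip(self.pairs, self.interactions))
        return out.squeeze(-1)

# Toy usage: 4 features, with an interaction subnetwork for features (0, 1).
model = GAMINetLike(n_features=4, interaction_pairs=[(0, 1)])
y_hat = model(torch.randn(10, 4))     # each subnetwork is an inspectable effect
```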
arXiv Detail & Related papers (2020-03-16T11:51:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.