Enhancing Performance of Explainable AI Models with Constrained Concept Refinement
- URL: http://arxiv.org/abs/2502.06775v1
- Date: Mon, 10 Feb 2025 18:53:15 GMT
- Title: Enhancing Performance of Explainable AI Models with Constrained Concept Refinement
- Authors: Geyu Liang, Senne Michielssen, Salar Fattahi
- Abstract summary: The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML).
In this paper, we investigate the impact of deviations in concept representations and propose a novel framework to mitigate these effects.
Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving model interpretability across various large-scale benchmarks but also achieves this with significantly lower computational cost.
- Score: 10.241134756773228
- Abstract: The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML). This tension is particularly significant for emerging interpretable-by-design methods, which aim to redesign ML algorithms for trustworthy interpretability but often sacrifice accuracy in the process. In this paper, we address this gap by investigating the impact of deviations in concept representations (an essential component of interpretable models) on prediction performance, and we propose a novel framework to mitigate these effects. The framework builds on the principle of optimizing concept embeddings under constraints that preserve interpretability. Using a generative model as a test-bed, we rigorously prove that our algorithm achieves zero loss while progressively enhancing the interpretability of the resulting model. Additionally, we evaluate the practical performance of our proposed framework in generating explainable predictions for image classification tasks across various benchmarks. Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving model interpretability across various large-scale benchmarks but also achieves this with significantly lower computational cost.
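As a rough, illustrative sketch of the idea stated in the abstract (refining concept embeddings under a constraint that keeps them close to their original, human-aligned versions), the snippet below runs projected gradient descent on a simple linear concept model. The function name, the linear setup, and the choice of an L2-ball constraint are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def refine_concepts(X, y, C0, W, radius=0.1, lr=0.01, steps=200):
    """Illustrative constrained concept refinement (not the paper's exact method).

    X      : (n, d) input features
    y      : (n,)   regression targets
    C0     : (k, d) initial, human-interpretable concept embeddings
    W      : (k,)   fixed linear head mapping concept scores to predictions
    radius : maximum allowed L2 deviation of each refined concept from its initial embedding
    """
    C = C0.copy()
    for _ in range(steps):
        scores = X @ C.T                        # (n, k) concept activations
        err = scores @ W - y                    # (n,)  prediction residuals
        grad = np.outer(W, err @ X) / len(y)    # gradient of 0.5 * MSE w.r.t. C
        C = C - lr * grad
        # Project each concept back into a ball of `radius` around its initial
        # embedding; this constraint is what preserves interpretability in the sketch.
        delta = C - C0
        norms = np.linalg.norm(delta, axis=1, keepdims=True)
        C = C0 + delta * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return C

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))
C0 = rng.normal(size=(5, 32))
W = rng.normal(size=5)
y = (X @ C0.T) @ W + 0.1 * rng.normal(size=500)
C_refined = refine_concepts(X, y, C0, W)
```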
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
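To make the notion of counterfactual robustness under model shifts concrete, here is a naive Monte Carlo baseline: sample plausibly shifted models and measure how often the counterfactual keeps its desired label. The `sample_shifted_model` callback and the binary target label are illustrative assumptions; the paper's contribution is tight probabilistic guarantees rather than this plain empirical frequency.

```python
import numpy as np

def empirical_cf_validity(counterfactual, sample_shifted_model, n_samples=1000, seed=0):
    """Naive Monte Carlo estimate of counterfactual validity under model shifts.

    counterfactual       : (d,) candidate counterfactual input
    sample_shifted_model : callable taking an RNG and returning a plausibly shifted
                           model with a scikit-learn style .predict (assumed interface)
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_samples):
        shifted = sample_shifted_model(rng)
        # The counterfactual is "valid" if the shifted model still assigns the desired class (here: 1).
        hits += int(shifted.predict(counterfactual.reshape(1, -1))[0] == 1)
    return hits / n_samples
```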
arXiv Detail & Related papers (2024-07-10T09:13:11Z)
- Benchmarking and Enhancing Disentanglement in Concept-Residual Models [4.177318966048984]
Concept bottleneck models (CBMs) are interpretable models that first predict a set of semantically meaningful features.
CBMs' performance depends on the engineered features and can severely suffer from incomplete sets of concepts.
This work proposes three novel approaches to mitigate information leakage by disentangling concepts and residuals.
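For context, the sketch below shows a minimal concept-residual architecture in PyTorch: a concept head supervised with concept labels, a residual head that carries extra information past the bottleneck, and a label head that sees both. The dimensions and names are hypothetical, and the paper's disentanglement objectives for controlling leakage are not included.

```python
import torch
import torch.nn as nn

class ConceptResidualModel(nn.Module):
    """Minimal concept-residual model (illustrative sizes; no disentanglement loss)."""

    def __init__(self, in_dim=512, n_concepts=10, residual_dim=8, n_classes=100):
        super().__init__()
        self.concept_head = nn.Linear(in_dim, n_concepts)     # supervised with concept labels
        self.residual_head = nn.Linear(in_dim, residual_dim)  # unsupervised side channel
        self.label_head = nn.Linear(n_concepts + residual_dim, n_classes)

    def forward(self, features):
        concepts = torch.sigmoid(self.concept_head(features))            # interpretable bottleneck
        residual = self.residual_head(features)                          # leakage-prone residual
        logits = self.label_head(torch.cat([concepts, residual], dim=-1))
        return logits, concepts, residual
```

Because the label head can read the residual, information can bypass the concepts entirely; the disentanglement approaches summarized above aim to limit exactly this leakage.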
arXiv Detail & Related papers (2023-11-30T21:07:26Z)
- Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
We propose a complete framework for extending concept-based interpretability methods to NLP.
We optimize for features whose existence causes the output predictions to change substantially.
Our method achieves superior results on predictive impact, usability, and faithfulness compared to the baselines.
arXiv Detail & Related papers (2023-05-03T14:48:27Z)
- Explain, Adapt and Retrain: How to improve the accuracy of a PPM classifier through different explanation styles [4.6281736192809575]
Recent papers have introduced a novel approach to explain why a Predictive Process Monitoring model for outcome-oriented predictions provides wrong predictions.
We show how to exploit the explanations to identify the most common features that induce a predictor to make mistakes in a semi-automated way.
arXiv Detail & Related papers (2023-03-27T06:37:55Z)
- Robust Semantic Interpretability: Revisiting Concept Activation Vectors [0.0]
Interpretability methods for image classification attempt to expose whether the model is systematically biased or attending to the same cues as a human would.
Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole.
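For readers unfamiliar with concept activation vectors, the generic TCAV-style construction is sketched below: fit a linear probe that separates activations of concept examples from random examples, take its normal vector as the concept direction, and score sensitivity as the directional derivative of a class logit along it. This is not RCAV's specific robustness procedure, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_concept, acts_random):
    """Fit a linear probe (concept vs. random activations) and return the
    unit normal of its decision boundary as the concept direction."""
    X = np.vstack([acts_concept, acts_random])
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = probe.coef_[0]
    return cav / np.linalg.norm(cav)

def concept_sensitivity(grad_logit_wrt_acts, cav):
    """Directional derivative of a class logit along the concept direction:
    positive values indicate the concept pushes the prediction toward that class."""
    return grad_logit_wrt_acts @ cav
```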
arXiv Detail & Related papers (2021-04-06T20:14:59Z)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP).
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
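As a hedged sketch of the general recipe behind attribute-conditioned generative counterfactuals (not the exact AIP procedure), one can hold a conditional generator and a classifier fixed and optimize the latent code toward the target label while penalizing distance from the original latent. The `generator(z, attr)` and `classifier(x)` interfaces below are assumptions.

```python
import torch
import torch.nn.functional as F

def generative_counterfactual(generator, classifier, z0, target_attr, target_label,
                              steps=300, lr=0.05, prox=0.1):
    """Latent-space counterfactual search against a fixed attribute-conditioned
    generator and a fixed classifier (illustrative interfaces)."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = generator(z, target_attr)               # generate with the desired attribute
        loss = F.cross_entropy(classifier(x), target_label) \
               + prox * (z - z0).pow(2).sum()       # stay close to the original latent
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(z.detach(), target_attr)
```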
arXiv Detail & Related papers (2021-01-18T08:37:13Z)
- Learning Prediction Intervals for Model Performance [1.433758865948252]
We propose a method to compute prediction intervals for model performance.
We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines.
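For contrast, the simplest notion of a performance interval is a bootstrap over a labeled evaluation set, sketched below; unlike the paper's learned intervals, it cannot speak to unlabeled or drifted data. Function and parameter names are illustrative.

```python
import numpy as np

def bootstrap_accuracy_interval(y_true, y_pred, n_boot=2000, alpha=0.1, seed=0):
    """Bootstrap (1 - alpha) interval for accuracy on a labeled evaluation set."""
    rng = np.random.default_rng(seed)
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    n = len(correct)
    accs = [correct[rng.integers(0, n, n)].mean() for _ in range(n_boot)]
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```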
arXiv Detail & Related papers (2020-12-15T21:32:03Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Towards a Theoretical Understanding of the Robustness of Variational Autoencoders [82.68133908421792]
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations.
We develop a novel criterion for robustness in probabilistic models: $r$-robustness.
We show that VAEs trained using disentangling methods score well under our robustness metrics.
arXiv Detail & Related papers (2020-07-14T21:22:29Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be generated efficiently via layer selection.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
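As a generic illustration of ensemble-based uncertainty in segmentation (not the paper's layer-selection ensemble generation or its specific pixel-wise uncertainty loss), the snippet below averages ensemble probability maps and reports per-pixel predictive entropy as the uncertainty map.

```python
import numpy as np

def pixelwise_uncertainty(prob_maps):
    """prob_maps: (n_models, H, W, n_classes) softmax outputs from an ensemble.
    Returns the consensus segmentation and a per-pixel entropy map."""
    mean_p = prob_maps.mean(axis=0)                        # (H, W, n_classes) ensemble mean
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(-1)   # (H, W) predictive entropy
    return mean_p.argmax(-1), entropy
```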
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.