Unifying Model Explainability and Robustness via Machine-Checkable
Concepts
- URL: http://arxiv.org/abs/2007.00251v2
- Date: Thu, 2 Jul 2020 07:33:15 GMT
- Title: Unifying Model Explainability and Robustness via Machine-Checkable
Concepts
- Authors: Vedant Nanda, Till Speicher, John P. Dickerson, Krishna P. Gummadi,
Muhammad Bilal Zafar
- Abstract summary: We propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts.
Our framework defines a large number of concepts that the explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness.
Experiments on real-world datasets and human surveys show that our framework is able to enhance prediction robustness significantly.
- Score: 33.88198813484126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep neural networks (DNNs) get adopted in an ever-increasing number of
applications, explainability has emerged as a crucial desideratum for these
models. In many real-world tasks, one of the principal reasons for requiring
explainability is, in turn, to assess prediction robustness, where predictions
(i.e., class labels) that do not conform to their respective explanations
(e.g., presence or absence of a concept in the input) are deemed to be
unreliable. However, most, if not all, prior methods for checking
explanation-conformity (e.g., LIME, TCAV, saliency maps) require significant
manual intervention, which hinders their large-scale deployability. In this
paper, we propose a robustness-assessment framework, at the core of which is
the idea of using machine-checkable concepts. Our framework defines a large
number of concepts that the DNN explanations could be based on and performs the
explanation-conformity check at test time to assess prediction robustness. Both
steps are executed in an automated manner without requiring any human
intervention and are easily scaled to datasets with a very large number of
classes. Experiments on real-world datasets and human surveys show that our
framework is able to enhance prediction robustness significantly: the
predictions marked to be robust by our framework have significantly higher
accuracy and are more robust to adversarial perturbations.
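As a rough illustration of the idea in the abstract above, the sketch below shows how a test-time explanation-conformity check could be wired up: a set of machine-checkable concept detectors is scored on the input, and a prediction is flagged as robust only if the concepts its class is expected to rely on are actually present. This is a minimal sketch under assumptions of my own; the model API, the concept_probes objects, and the class_to_concepts mapping are illustrative names, not the authors' implementation.

    # Hypothetical sketch of a test-time explanation-conformity check in the
    # spirit of the framework above. The model API, probe objects, and the
    # class-to-concepts mapping are assumptions made for illustration only.

    def concept_scores(model, x, concept_probes):
        """Score how strongly each machine-checkable concept is present in x.

        concept_probes: dict mapping a concept name to a simple probe
        (e.g., a linear classifier trained on the DNN's hidden activations).
        """
        h = model.hidden_representation(x)          # assumed model API
        return {name: probe.decision_function(h.reshape(1, -1))[0]
                for name, probe in concept_probes.items()}

    def conforms(label, scores, class_to_concepts, threshold=0.0):
        """Return True if the concepts expected for `label` are present."""
        expected = class_to_concepts[label]         # concepts the class's explanation relies on
        return all(scores[c] > threshold for c in expected)

    def robust_predict(model, x, concept_probes, class_to_concepts):
        """Return (label, is_robust): the prediction plus its conformity flag.

        Predictions whose explanations do not conform are flagged as
        unreliable rather than rejected outright.
        """
        label = model.predict(x)
        scores = concept_scores(model, x, concept_probes)
        return label, conforms(label, scores, class_to_concepts)

In this reading, "defining a large number of concepts" amounts to fitting many such probes automatically, so the whole check can run at scale without human intervention.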
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Explaining Deep Neural Networks for Bearing Fault Detection with
Vibration Concepts [23.027545485830032]
We investigate how to leverage concept-based explanation techniques in the context of bearing fault detection with deep neural networks trained on vibration signals.
Our evaluations demonstrate that explaining opaque models in terms of vibration concepts enables human-comprehensible and intuitive insights about their inner workings.
arXiv Detail & Related papers (2023-10-17T17:58:19Z) - Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.
We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z) - Generating robust counterfactual explanations [60.32214822437734]
The quality of a counterfactual depends on several criteria: realism, actionability, validity, robustness, etc.
In this paper, we are interested in the notion of robustness of a counterfactual. More precisely, we focus on robustness to counterfactual input changes.
We propose CROCO, a new framework that generates robust counterfactuals while effectively managing this trade-off and guarantees the user a minimal level of robustness.
arXiv Detail & Related papers (2023-04-24T09:00:31Z) - Interpretable Self-Aware Neural Networks for Robust Trajectory
Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts.
We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z) - Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
arXiv Detail & Related papers (2022-10-13T23:57:40Z) - VisFIS: Visual Feature Importance Supervision with
Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z) - Reachable Sets of Classifiers and Regression Models: (Non-)Robustness
Analysis and Robust Training [1.0878040851638]
We analyze and enhance robustness properties of both classifiers and regression models.
Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks.
We also provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and to compute a feature ranking.
arXiv Detail & Related papers (2020-07-28T10:58:06Z) - How Much Can I Trust You? -- Quantifying Uncertainties in Explaining
Neural Networks [19.648814035399013]
Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks.
We propose a new framework that converts any explanation method for neural networks into an explanation method for Bayesian neural networks.
We demonstrate the effectiveness and usefulness of our approach extensively in various experiments.
arXiv Detail & Related papers (2020-06-16T08:54:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.