Unifying Model Explainability and Robustness via Machine-Checkable
Concepts
- URL: http://arxiv.org/abs/2007.00251v2
- Date: Thu, 2 Jul 2020 07:33:15 GMT
- Title: Unifying Model Explainability and Robustness via Machine-Checkable
Concepts
- Authors: Vedant Nanda, Till Speicher, John P. Dickerson, Krishna P. Gummadi,
Muhammad Bilal Zafar
- Abstract summary: We propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts.
Our framework defines a large number of concepts that the explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness.
Experiments on real-world datasets and human surveys show that our framework is able to enhance prediction robustness significantly.
- Score: 33.88198813484126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep neural networks (DNNs) get adopted in an ever-increasing number of
applications, explainability has emerged as a crucial desideratum for these
models. In many real-world tasks, one of the principal reasons for requiring
explainability is, in turn, to assess prediction robustness, where predictions
(i.e., class labels) that do not conform to their respective explanations
(e.g., presence or absence of a concept in the input) are deemed to be
unreliable. However, most, if not all, prior methods for checking
explanation-conformity (e.g., LIME, TCAV, saliency maps) require significant
manual intervention, which hinders their large-scale deployability. In this
paper, we propose a robustness-assessment framework, at the core of which is
the idea of using machine-checkable concepts. Our framework defines a large
number of concepts that the DNN explanations could be based on and performs the
explanation-conformity check at test time to assess prediction robustness. Both
steps are executed in an automated manner without requiring any human
intervention and are easily scaled to datasets with a very large number of
classes. Experiments on real-world datasets and human surveys show that our
framework is able to enhance prediction robustness significantly: the
predictions marked to be robust by our framework have significantly higher
accuracy and are more robust to adversarial perturbations.
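As a rough illustration of the idea in the abstract above, the sketch below shows how a test-time explanation-conformity check could be wired up: a set of machine-checkable concept detectors is scored on the input, and a prediction is flagged as robust only if the concepts its class is expected to rely on are actually present. This is a minimal sketch under assumptions of my own; the model API, the concept_probes objects, and the class_to_concepts mapping are illustrative names, not the authors' implementation.

    # Hypothetical sketch of a test-time explanation-conformity check in the
    # spirit of the framework above. The model API, probe objects, and the
    # class-to-concepts mapping are assumptions made for illustration only.

    def concept_scores(model, x, concept_probes):
        """Score how strongly each machine-checkable concept is present in x.

        concept_probes: dict mapping a concept name to a simple probe
        (e.g., a linear classifier trained on the DNN's hidden activations).
        """
        h = model.hidden_representation(x)          # assumed model API
        return {name: probe.decision_function(h.reshape(1, -1))[0]
                for name, probe in concept_probes.items()}

    def conforms(label, scores, class_to_concepts, threshold=0.0):
        """Return True if the concepts expected for `label` are present."""
        expected = class_to_concepts[label]         # concepts the class's explanation relies on
        return all(scores[c] > threshold for c in expected)

    def robust_predict(model, x, concept_probes, class_to_concepts):
        """Return (label, is_robust): the prediction plus its conformity flag.

        Predictions whose explanations do not conform are flagged as
        unreliable rather than rejected outright.
        """
        label = model.predict(x)
        scores = concept_scores(model, x, concept_probes)
        return label, conforms(label, scores, class_to_concepts)

In this reading, "defining a large number of concepts" amounts to fitting many such probes automatically, so the whole check can run at scale without human intervention.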
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Explaining Deep Neural Networks for Bearing Fault Detection with
Vibration Concepts [23.027545485830032]
We investigate how to leverage concept-based explanation techniques in the context of bearing fault detection with deep neural networks trained on vibration signals.
Our evaluations demonstrate that explaining opaque models in terms of vibration concepts enables human-comprehensible and intuitive insights about their inner workings.
arXiv Detail & Related papers (2023-10-17T17:58:19Z) - Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.
We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z) - Generating robust counterfactual explanations [60.32214822437734]
The quality of a counterfactual depends on several criteria: realism, actionability, validity, robustness, etc.
In this paper, we are interested in the notion of robustness of a counterfactual. More precisely, we focus on robustness to counterfactual input changes.
We propose CROCO, a new framework that generates robust counterfactuals while effectively managing this trade-off and guarantees the user a minimal level of robustness.
arXiv Detail & Related papers (2023-04-24T09:00:31Z) - Interpretable Self-Aware Neural Networks for Robust Trajectory
Prediction [50.79827516897913]
We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among semantic concepts.
We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-16T06:28:20Z) - Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
arXiv Detail & Related papers (2022-10-13T23:57:40Z) - VisFIS: Visual Feature Importance Supervision with
Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z) - Reachable Sets of Classifiers and Regression Models: (Non-)Robustness
Analysis and Robust Training [1.0878040851638]
We analyze and enhance robustness properties of both classifiers and regression models.
Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks.
We also provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and to compute a feature ranking.
arXiv Detail & Related papers (2020-07-28T10:58:06Z) - How Much Can I Trust You? -- Quantifying Uncertainties in Explaining
Neural Networks [19.648814035399013]
Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks.
We propose a new framework that converts any explanation method for neural networks into an explanation method for Bayesian neural networks.
We demonstrate the effectiveness and usefulness of our approach extensively in various experiments.
arXiv Detail & Related papers (2020-06-16T08:54:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.