Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example
- URL: http://arxiv.org/abs/2002.10248v4
- Date: Wed, 16 Dec 2020 16:44:55 GMT
- Title: Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example
- Authors: Serena Booth, Yilun Zhou, Ankit Shah, Julie Shah
- Abstract summary: We introduce a flexible model inspection framework: Bayes-TrEx.
Given a data distribution, Bayes-TrEx finds in-distribution examples with a specified prediction confidence.
We show that this framework enables more flexible holistic model analysis than just inspecting the test set.
- Score: 9.978961706999833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Post-hoc explanation methods are gaining popularity for interpreting,
understanding, and debugging neural networks. Most analyses using such methods
explain decisions in response to inputs drawn from the test set. However, the
test set may have few examples that trigger some model behaviors, such as
high-confidence failures or ambiguous classifications. To address these
challenges, we introduce a flexible model inspection framework: Bayes-TrEx.
Given a data distribution, Bayes-TrEx finds in-distribution examples with a
specified prediction confidence. We demonstrate several use cases of
Bayes-TrEx, including revealing highly confident (mis)classifications,
visualizing class boundaries via ambiguous examples, understanding novel-class
extrapolation behavior, and exposing neural network overconfidence. We use
Bayes-TrEx to study classifiers trained on CLEVR, MNIST, and Fashion-MNIST, and
we show that this framework enables more flexible holistic model analysis than
just inspecting the test set. Code is available at
https://github.com/serenabooth/Bayes-TrEx.
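The core idea is to treat a target prediction confidence as an observation and to sample in-distribution examples from the induced posterior over inputs. Below is a minimal illustrative sketch of that idea, assuming a latent-variable generative model (`decoder`) to stay on the data distribution and a softmax `classifier`; the sampler, likelihood width, and all names are placeholder choices for illustration, not the authors' exact implementation (see the repository linked above for that).

```python
# Illustrative Bayes-TrEx-style sketch: condition on "the classifier reports a
# target confidence" and sample in-distribution examples from the posterior.
# `decoder` and `classifier` are hypothetical callables, not the paper's code.
import numpy as np

def log_prior(z):
    # Standard-normal prior over the latent code keeps samples in-distribution.
    return -0.5 * np.sum(z ** 2)

def log_likelihood(z, decoder, classifier, target_class, target_conf, sigma=0.05):
    # Gaussian likelihood that peaks when the classifier's confidence for
    # `target_class` matches `target_conf` (e.g. 0.5 for ambiguous examples).
    x = decoder(z)                        # latent code -> input (e.g. an image)
    conf = classifier(x)[target_class]    # softmax probability of target class
    return -0.5 * ((conf - target_conf) / sigma) ** 2

def sample_examples(decoder, classifier, target_class, target_conf,
                    latent_dim=64, n_steps=5000, step_size=0.05, seed=0):
    """Random-walk Metropolis over the latent space; returns accepted codes."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(latent_dim)
    logp = log_prior(z) + log_likelihood(z, decoder, classifier,
                                         target_class, target_conf)
    accepted = []
    for _ in range(n_steps):
        z_new = z + step_size * rng.standard_normal(latent_dim)
        logp_new = log_prior(z_new) + log_likelihood(z_new, decoder, classifier,
                                                     target_class, target_conf)
        if np.log(rng.uniform()) < logp_new - logp:   # Metropolis accept step
            z, logp = z_new, logp_new
            accepted.append(z.copy())
    return accepted
```

In this framing, setting the target confidence near 0.5 between two classes targets ambiguous, boundary-adjacent examples, while demanding high confidence for an incorrect class targets high-confidence failures, matching the use cases listed in the abstract.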
Related papers
- CAuSE: Decoding Multimodal Classifiers using Faithful Natural Language Explanation [46.9286703847151]
We propose CAuSE (Causal Abstraction under Simulated Explanations), a novel framework to generate faithful NLEs for any pretrained multimodal classifier.
We demonstrate that CAuSE generalizes across datasets and models through extensive empirical evaluations.
We further validate this through a redesigned metric for measuring causal faithfulness in multimodal settings.
arXiv Detail & Related papers (2025-12-07T12:15:21Z)
- DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models [6.369258625916601]
Post-hoc interpretability methods fail to capture the models' decision-making process fully.
Our paper introduces DISCO, a novel method for discovering global, rule-based explanations.
DISCO supports interactive explanations, enabling human inspectors to distinguish spurious causes in the rule-based output.
arXiv Detail & Related papers (2024-11-07T12:12:44Z)
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models [52.98590762456236]
A training-free test-time dynamic adapter (TDA) is a promising approach for adapting vision-language models to shifted test distributions.
We propose a simple yet effective method for DistributiOnal Test-time Adaptation (Dota).
Dota continually estimates the distributions of test samples, allowing the model to continually adapt to the deployment environment.
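The summary above only states that Dota continually estimates the distributions of test samples; one plausible reading, sketched below purely for illustration, keeps running per-class feature statistics over the test stream and blends a distribution-based score with the zero-shot logits. The update rule, fusion weight, and all names are assumptions, not the paper's algorithm.

```python
# Illustrative-only sketch of distribution-based test-time adaptation: keep
# running per-class feature statistics over the test stream and blend a
# similarity-to-class-mean score with the model's zero-shot logits.
import numpy as np

class RunningClassStats:
    def __init__(self, num_classes, feat_dim):
        self.counts = np.zeros(num_classes)
        self.means = np.zeros((num_classes, feat_dim))

    def update(self, feat, pseudo_label):
        # Incremental mean update for the class the model currently predicts.
        self.counts[pseudo_label] += 1
        lr = 1.0 / self.counts[pseudo_label]
        self.means[pseudo_label] += lr * (feat - self.means[pseudo_label])

def adapt_predict(feat, zero_shot_logits, stats, alpha=0.5):
    """Blend zero-shot logits with similarity to the running class means."""
    sims = stats.means @ feat          # zero early on, so zero-shot dominates
    logits = zero_shot_logits + alpha * sims
    pseudo = int(np.argmax(logits))
    stats.update(feat, pseudo)         # adapt statistics to the test stream
    return pseudo
```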
arXiv Detail & Related papers (2024-09-28T15:03:28Z)
- Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection [10.12283550685127]
We propose an Adapted-MoE to handle multiple distributions of same-category samples by divide and conquer.
Specifically, we propose a routing network based on representation learning to route same-category samples into subclass feature spaces.
We also propose test-time adaption to eliminate the bias between unseen test sample representations and the feature distribution learned by the expert model.
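As a loose illustration of the two sentences above, the sketch below routes a test feature to the expert whose subclass center is nearest and removes an estimated mean offset between test-time and training-time features; this is an assumed reading, not the paper's routing network or adaptation procedure.

```python
# Loose sketch: nearest-center routing plus a simple bias-removal step.
# Centers, means, and the alignment rule are illustrative assumptions.
import numpy as np

def route(feat, subclass_centers):
    """Pick the expert whose subclass center is nearest to the test feature."""
    dists = np.linalg.norm(subclass_centers - feat, axis=1)
    return int(np.argmin(dists))

def test_time_align(feat, test_running_mean, expert_train_mean):
    """Remove the estimated offset between test-time and training features."""
    return feat - (test_running_mean - expert_train_mean)
```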
arXiv Detail & Related papers (2024-09-09T13:49:09Z)
- Boost Test-Time Performance with Closed-Loop Inference [85.43516360332646]
We propose to predict hard-classified test samples in a looped manner to boost the model performance.
We first devise a filtering criterion to identify those hard-classified test samples that need additional inference loops.
For each hard sample, we construct an additional auxiliary learning task based on its original top-K predictions to calibrate the model.
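A rough sketch of that loop, under assumptions: an entropy-based filter flags hard test samples, and each flagged sample triggers a few auxiliary optimization steps restricted to its original top-K classes (here, simple entropy minimization stands in for the paper's auxiliary learning task). Thresholds, losses, and optimizer settings are hypothetical.

```python
# Sketch of closed-loop inference: filter hard samples, then refine the model
# on an auxiliary objective restricted to the sample's original top-K classes.
import torch
import torch.nn.functional as F

def is_hard(logits, entropy_threshold=1.0):
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    return entropy > entropy_threshold

def closed_loop_refine(model, x, k=5, steps=3, lr=1e-4):
    model.eval()
    with torch.no_grad():
        logits = model(x)
    if not is_hard(logits[0]):
        return logits.argmax(-1)                 # easy sample: keep first pass
    topk = logits.topk(k, dim=-1).indices[0]     # original top-K classes
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):                       # auxiliary calibration loop
        out = model(x)[:, topk]
        loss = -(F.softmax(out, -1) * F.log_softmax(out, -1)).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(x).argmax(-1)               # re-predict after refinement
```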
arXiv Detail & Related papers (2022-03-21T10:20:21Z)
- Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification [86.32752788233913]
In classification problems, the Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance.
We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show uncertainty of the classes.
Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data.
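For binary classification the Bayes error is the expected uncertainty E[min(p, 1-p)] with p = P(y=1|x), so an estimator of the kind described above reduces to averaging per-instance uncertainties computed from soft labels. A minimal sketch, with illustrative variable names:

```python
# Direct Bayes error estimate for binary classification from soft labels:
# the mean of per-instance uncertainties min(p, 1-p).
import numpy as np

def bayes_error_estimate(soft_labels):
    """soft_labels: array of P(y=1 | x_i), e.g. aggregated annotator labels."""
    p = np.asarray(soft_labels, dtype=float)
    return np.mean(np.minimum(p, 1.0 - p))

# Nearly separable data gives a small estimate; ambiguous data approaches 0.5.
print(bayes_error_estimate([0.05, 0.95, 0.9, 0.1]))   # 0.075
print(bayes_error_estimate([0.5, 0.5]))               # 0.5
```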
arXiv Detail & Related papers (2022-02-01T13:22:26Z)
- An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different potential instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
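A sketch of the kind of agreement check this involves: given two attribution methods that each rank training instances for the same test prediction, measure how much their top-k sets overlap. The rankings themselves (retrieval-based vs. gradient-based) are assumed to come from elsewhere.

```python
# Overlap of the top-k training instances selected by two attribution methods.
def topk_agreement(rank_a, rank_b, k=10):
    """rank_a, rank_b: lists of training-instance ids, most important first."""
    a, b = set(rank_a[:k]), set(rank_b[:k])
    return len(a & b) / k

# Agreement near 0 means the two methods point to different "influential"
# training examples for the same prediction.
```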
arXiv Detail & Related papers (2021-04-09T01:03:17Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
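A small sketch of the kNN-over-representations idea: embed training and test inputs with the same model layer, then retrieve the nearest training examples as candidate evidence for a prediction. The feature arrays and names below are placeholders.

```python
# Retrieve the k training examples whose representations are closest to a
# test representation; inspecting them can expose spurious associations.
import numpy as np

def knn_evidence(test_feat, train_feats, train_labels, k=5):
    """train_feats: (N, D) array; train_labels: (N,) array; test_feat: (D,)."""
    # Cosine similarity over L2-normalized features.
    tf = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    q = test_feat / np.linalg.norm(test_feat)
    sims = tf @ q
    idx = np.argsort(-sims)[:k]
    return idx, train_labels[idx], sims[idx]

# If the neighbors share a surface feature rather than the predicted label's
# semantics, that is a hint of a learned spurious association.
```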
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability [44.60486560836836]
Any prediction from a model is made by a combination of learning history and test stimuli.
Existing methods to interpret a model's predictions are only able to capture a single aspect of either test stimuli or learning history.
We propose an efficient and differentiable approach to make it feasible to interpret a model's prediction by jointly examining training history and test stimuli.
arXiv Detail & Related papers (2020-10-14T10:45:01Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Learning to Faithfully Rationalize by Construction [36.572594249534866]
In many settings it is important to be able to understand why a model made a particular prediction.
We propose a simpler variant of this approach that provides faithful explanations by construction.
In both automatic and manual evaluations we find that variants of this simple framework are superior to 'end-to-end' approaches.
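The "faithful by construction" recipe can be sketched as a two-stage pipeline: one component extracts a rationale (a subset of input tokens), and a separate classifier predicts from the rationale alone, so the rationale necessarily supports the prediction. The scorer, classifier, and keep ratio below are illustrative assumptions, not the paper's exact models.

```python
# Two-stage extract-then-predict pipeline: the predictor only ever sees the
# extracted rationale, so the rationale is faithful by construction.
import numpy as np

def extract_rationale(tokens, token_scores, keep_ratio=0.3):
    """Keep the highest-scoring tokens as the rationale."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(-np.asarray(token_scores))[:k]
    return [tokens[i] for i in sorted(keep)]

def predict_from_rationale(classifier, tokens, token_scores):
    rationale = extract_rationale(tokens, token_scores)
    label = classifier(" ".join(rationale))   # predictor never sees the rest
    return label, rationale                   # the rationale is the explanation
```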
arXiv Detail & Related papers (2020-04-30T21:45:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.