Revealing Model Biases: Assessing Deep Neural Networks via Recovered
Sample Analysis
- URL: http://arxiv.org/abs/2306.06414v1
- Date: Sat, 10 Jun 2023 11:20:04 GMT
- Authors: Mohammad Mahdi Mehmanchi, Mahbod Nouri, Mohammad Sabokrou
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a straightforward and cost-effective approach to assess
whether a deep neural network (DNN) relies on the primary concepts of training
samples or simply learns discriminative, yet simple and irrelevant features
that can differentiate between classes. The paper highlights that DNNs, as
discriminative classifiers, often find the simplest features to discriminate
between classes, potentially biasing them toward irrelevant features and
causing them to fail to generalize. While a generalization test is one way to
evaluate a trained model's performance, it can be costly and may not cover all
scenarios to ensure that the model has learned the primary concepts.
Furthermore, even after conducting a generalization test, identifying bias in
the model may not be possible. Here, the paper proposes a method that involves
recovering samples from the parameters of the trained model and analyzing the
reconstruction quality. We believe that if the model's weights are optimized to
discriminate based on some features, these features will be reflected in the
reconstructed samples. If the recovered samples contain the primary concepts of
the training data, it can be concluded that the model has learned the essential
and determining features. On the other hand, if the recovered samples contain
irrelevant features, it can be concluded that the model is biased towards these
features. The proposed method does not require any test or generalization
samples, only the parameters of the trained model and the training data that
lie on the margin. Our experiments demonstrate that the proposed method can
determine whether the model has learned the desired features of the training
data. The paper highlights that our understanding of how these models work is
limited, and the proposed approach addresses this issue.
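The abstract does not spell out the recovery algorithm, but the core idea, that features encoded in the trained weights resurface when inputs are reconstructed from the model alone, can be illustrated with a minimal sketch. The example below is an assumption-laden simplification, not the authors' method: it trains a linear classifier on toy 2-D data, then "recovers" a class-1 input from the weights by gradient ascent on the class logit with an L2 prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: class 0 clustered near -2, class 1 near +2 (2-D inputs).
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Train a logistic-regression classifier by gradient descent.
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))        # predicted P(class 1)
    g = p - y                                 # cross-entropy gradient wrt logits
    w -= 0.1 * (X.T @ g / len(y) + 1e-3 * w)  # small L2 penalty keeps w bounded
    b -= 0.1 * g.mean()

# "Recover" a class-1 sample from the weights alone: start from noise and
# ascend the class-1 logit w.x + b, minus an L2 prior so the input stays finite.
x_rec = rng.normal(0, 0.1, 2)
for _ in range(200):
    grad = w - 0.5 * x_rec                    # d/dx [w.x + b - 0.25*||x||^2]
    x_rec += 0.1 * grad

print(x_rec)  # points in the direction the classifier uses to discriminate
```

If the recovered input resembles real class-1 samples, the weights encode the class's primary features; if it instead highlights some incidental cue, the model is biased toward that cue. For a deep network the same idea would require backpropagating through all layers (as in model-inversion or activation-maximization methods), which this linear sketch only gestures at.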
Related papers
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs).
We derive novel metrics with high-probability guarantees concerning the output distribution of a model.
Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z) - On the Foundations of Shortcut Learning [20.53986437152018]
We study how predictivity and availability interact to shape models' feature use.
We find that linear models are relatively unbiased, but introducing a single hidden layer with ReLU or Tanh units yields a bias.
arXiv Detail & Related papers (2023-10-24T22:54:05Z) - Personalized Interpretable Classification [8.806213269230057]
We make a first step towards formally introducing personalized interpretable classification as a new data mining problem.
We conduct a series of empirical studies on real data sets.
Our algorithm achieves the same level of predictive accuracy as state-of-the-art (SOTA) interpretable classifiers.
arXiv Detail & Related papers (2023-02-06T01:59:16Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Provable Benefits of Overparameterization in Model Compression: From
Double Descent to Pruning Neural Networks [38.153825455980645]
Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists - perhaps counterintuitively - building lightweight models.
This paper sheds light on these empirical findings by theoretically characterizing the high-dimensional asymptotics of model pruning.
We analytically identify regimes in which, even if the location of the most informative features is known, we are better off fitting a large model and then pruning.
arXiv Detail & Related papers (2020-12-16T05:13:30Z) - One for More: Selecting Generalizable Samples for Generalizable ReID
Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z) - Pair the Dots: Jointly Examining Training History and Test Stimuli for
Model Interpretability [44.60486560836836]
Any prediction from a model is made by a combination of learning history and test stimuli.
Existing methods to interpret a model's predictions are only able to capture a single aspect of either test stimuli or learning history.
We propose an efficient and differentiable approach to make it feasible to interpret a model's prediction by jointly examining training history and test stimuli.
arXiv Detail & Related papers (2020-10-14T10:45:01Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the
Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.