Guide the Learner: Controlling Product of Experts Debiasing Method Based
on Token Attribution Similarities
- URL: http://arxiv.org/abs/2302.02852v1
- Date: Mon, 6 Feb 2023 15:21:41 GMT
- Title: Guide the Learner: Controlling Product of Experts Debiasing Method Based
on Token Attribution Similarities
- Authors: Ali Modarressi, Hossein Amirkhani, Mohammad Taher Pilehvar
- Abstract summary: A popular workaround is to train a robust model by re-weighting training examples based on a secondary biased model.
Here, the underlying assumption is that the biased model resorts to shortcut features.
We introduce a fine-tuning strategy that incorporates the similarity between the main and biased model attribution scores in a Product of Experts loss function.
- Score: 17.082695183953486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several proposals have been put forward in recent years for improving
out-of-distribution (OOD) performance through mitigating dataset biases. A
popular workaround is to train a robust model by re-weighting training examples
based on a secondary biased model. Here, the underlying assumption is that the
biased model resorts to shortcut features. Hence, those training examples that
are correctly predicted by the biased model are flagged as being biased and are
down-weighted during the training of the main model. However, assessing the
importance of an instance merely based on the predictions of the biased model
may be too naive. It is possible that the prediction of the main model can be
derived from another decision-making process that is distinct from the behavior
of the biased model. To circumvent this, we introduce a fine-tuning strategy
that incorporates the similarity between the main and biased model attribution
scores in a Product of Experts (PoE) loss function to further improve OOD
performance. With experiments conducted on natural language inference and fact
verification benchmarks, we show that our method improves OOD results while
maintaining in-distribution (ID) performance.
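The Product of Experts idea described in the abstract can be sketched in a few lines. The following is a minimal, standard-library-only illustration of the commonly used PoE loss (cross-entropy over the sum of the main and biased models' log-probabilities), not the authors' implementation. The names `poe_loss`, `bias_scale`, and `cosine_similarity` are hypothetical. The abstract does not say how the attribution similarity enters the loss, so `bias_scale` is only a placeholder knob showing where such a control signal could act, and `cosine_similarity` is one assumed way to compare two token-attribution vectors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def poe_loss(main_logits, bias_logits, label, bias_scale=1.0):
    """Cross-entropy of the combined (main x biased) distribution.

    bias_scale is a hypothetical knob, not taken from the paper:
    0.0 recovers plain cross-entropy on the main model, 1.0 gives
    the usual Product of Experts combination.
    """
    main_lp = log_softmax(main_logits)
    bias_lp = log_softmax(bias_logits)
    # Multiplying probabilities corresponds to adding log-probabilities.
    combined = [m + bias_scale * b for m, b in zip(main_lp, bias_lp)]
    # Renormalise the product and take the negative log-likelihood.
    return -log_softmax(combined)[label]


def cosine_similarity(a, b):
    """Cosine similarity of two attribution vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For an example whose gold label the biased model already predicts confidently, `poe_loss` returns a smaller value than plain cross-entropy on the main model, which is how such examples end up down-weighted. When the biased model is confidently wrong, the loss is larger instead.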
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z) - Improving Bias Mitigation through Bias Experts in Natural Language
Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z) - Think Twice: Measuring the Efficiency of Eliminating Prediction
Shortcuts of Question Answering Models [3.9052860539161918]
We propose a simple method for measuring a scale of models' reliance on any identified spurious feature.
We assess the robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA).
We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by mitigated reliance on biased features.
arXiv Detail & Related papers (2023-05-11T14:35:00Z) - Predicting Out-of-Distribution Error with Confidence Optimal Transport [17.564313038169434]
We present a simple yet effective method to predict a model's performance on an unknown distribution without any additional annotation.
We show that our method, Confidence Optimal Transport (COT), provides robust estimates of a model's performance on a target domain.
Despite its simplicity, our method achieves state-of-the-art results on three benchmark datasets and outperforms existing methods by a large margin.
arXiv Detail & Related papers (2023-02-10T02:27:13Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Model-agnostic bias mitigation methods with regressor distribution
control for Wasserstein-based fairness metrics [0.6509758931804478]
We propose a bias mitigation methodology based upon the construction of post-processed models with fairer regressor distributions.
Our novel methodology performs optimization in low-dimensional spaces and avoids expensive model retraining.
arXiv Detail & Related papers (2021-11-19T17:31:22Z) - Learning from others' mistakes: Avoiding dataset biases without modeling
them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z) - Mind the Trade-off: Debiasing NLU Models without Degrading the
In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)