Adaptive loose optimization for robust question answering
- URL: http://arxiv.org/abs/2305.03971v3
- Date: Tue, 31 Oct 2023 01:45:39 GMT
- Title: Adaptive loose optimization for robust question answering
- Authors: Jie Ma, Pinghui Wang, Zewei Wang, Dechen Kong, Min Hu, Ting Han, Jun Liu
- Abstract summary: We propose a simple yet effective loss function with adaptive loose optimization.
Our main technical contribution is to reduce the loss adaptively according to the ratio between the previous and current optimization state.
Our approach enables QA methods to obtain state-of-the-art in- and out-of-distribution performance in most cases.
- Score: 21.166930242285446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question answering methods are well-known for leveraging data bias, such as
the language prior in visual question answering and the position bias in
machine reading comprehension (extractive question answering). Current
debiasing methods often come at the cost of significant in-distribution
performance to achieve favorable out-of-distribution generalizability, while
non-debiasing methods sacrifice a considerable amount of out-of-distribution
performance in order to obtain high in-distribution performance. Therefore, it
is challenging for them to handle complicated, changing real-world
situations. In this paper, we propose a simple yet effective loss
function with adaptive loose optimization, which seeks to make the best of both
worlds for question answering. Our main technical contribution is to reduce the
loss adaptively according to the ratio between the previous and current
optimization state on mini-batch training data. This loose optimization can be
used to prevent non-debiasing methods from overlearning data bias while
enabling debiasing methods to maintain slight bias learning. Experiments on the
visual question answering datasets, including VQA v2, VQA-CP v1, VQA-CP v2,
GQA-OOD, and the extractive question answering dataset SQuAD demonstrate that
our approach enables QA methods to obtain state-of-the-art in- and
out-of-distribution performance in most cases. The source code has been
released publicly at https://github.com/reml-group/ALO.
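Since the abstract describes the mechanism only at a high level, the following is a minimal sketch of one way such an adaptively loosened loss could be written, assuming the "optimization state" is tracked as the previous mini-batch loss value and the loosening is a simple scaling factor capped at one. The class name AdaptiveLooseLoss, the direction of the ratio, and the clamping are illustrative assumptions rather than the paper's exact formulation; see the released code at https://github.com/reml-group/ALO for the authors' implementation.
```python
# Illustrative sketch only: the "optimization state" is assumed to be the
# previous mini-batch loss, and the loosening is modeled as a scaling factor
# in (0, 1]. The paper's exact formulation may differ from this.
import torch
import torch.nn.functional as F


class AdaptiveLooseLoss(torch.nn.Module):
    """Cross-entropy scaled by the (clamped) ratio of current to previous loss."""

    def __init__(self):
        super().__init__()
        self.prev_loss = None  # optimization state from the previous mini-batch

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        current = F.cross_entropy(logits, targets)
        if self.prev_loss is None:
            factor = 1.0  # first mini-batch: no history yet, keep the full loss
        else:
            # When the loss drops sharply between mini-batches (a possible sign of
            # bias exploitation), current/previous falls below one and optimization
            # is loosened; the clamp means the loss is only reduced, never amplified.
            factor = torch.clamp(current.detach() / self.prev_loss, max=1.0)
        self.prev_loss = current.detach()
        return factor * current
```
In training, such a module would simply stand in for the plain cross-entropy criterion, e.g. loss = criterion(logits, labels) followed by loss.backward().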
Related papers
- Forecasting Outside the Box: Application-Driven Optimal Pointwise Forecasts for Stochastic Optimization [0.0]
We present an integrated learning and optimization procedure that yields the best approximation of an unknown situation.
Numerical results conducted with inventory problems from the literature as well as a bike-sharing problem with real data demonstrate that the proposed approach performs well.
arXiv Detail & Related papers (2024-11-05T21:54:50Z)
- Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning [0.6963971634605796]
We present a novel method aimed at enhancing the sample efficiency of ensemble Q learning.
Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble.
arXiv Detail & Related papers (2024-05-14T00:57:02Z)
- Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond [93.96982273042296]
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions.
We have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding.
We propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data.
We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
arXiv Detail & Related papers (2023-10-23T08:09:42Z)
- Kernel-Whitening: Overcome Dataset Bias with Isotropic Sentence Embedding [51.48582649050054]
We propose a representation normalization method that aims to disentangle the correlations between features of encoded sentences (a generic whitening sketch appears after this list).
We also propose Kernel-Whitening, a Nystrom kernel approximation method to achieve more thorough debiasing on nonlinear spurious correlations.
Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
arXiv Detail & Related papers (2022-10-14T05:56:38Z)
- Introspective Distillation for Robust Question Answering [70.18644911309468]
Question answering (QA) models are well-known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension.
Recent debiasing methods achieve good out-of-distribution (OOD) generalizability with a considerable sacrifice of the in-distribution (ID) performance.
We present a novel debiasing method called Introspective Distillation (IntroD) to make the best of both worlds for QA.
arXiv Detail & Related papers (2021-11-01T15:30:15Z)
- Greedy Gradient Ensemble for Robust Visual Question Answering [163.65789778416172]
We stress the language bias in Visual Question Answering (VQA) that comes from two aspects, i.e., distribution bias and shortcut bias.
We propose a new de-bias framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning.
GGE forces the biased models to over-fit the biased data distribution first, thus making the base model pay more attention to examples that are hard for the biased models to solve.
arXiv Detail & Related papers (2021-07-27T08:02:49Z)
- Optimal Resource Allocation for Serverless Queries [8.59568779761598]
Prior work focused on predicting peak allocation while ignoring aggressive trade-offs between resource allocation and run-time.
We introduce a system for optimal resource allocation that can predict performance with aggressive trade-offs, for both new and previously observed queries.
arXiv Detail & Related papers (2021-07-19T02:55:48Z)
- Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
arXiv Detail & Related papers (2020-10-13T00:23:59Z)
- Unshuffling Data for Improved Generalization [65.57124325257409]
Generalization beyond the training distribution is a core challenge in machine learning.
We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization.
arXiv Detail & Related papers (2020-02-27T03:07:41Z)
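Referring back to the Kernel-Whitening entry above: as a rough illustration of the "decorrelate the sentence-embedding features" idea in its simplest linear form, here is a generic ZCA-whitening sketch. It does not implement that paper's Nystrom kernel approximation for nonlinear spurious correlations, and the function name and epsilon value are illustrative assumptions.
```python
# Generic linear (ZCA) whitening of a batch of sentence embeddings: after the
# transform, features are approximately decorrelated with unit variance.
# This is only the linear special case of the debiasing idea; Kernel-Whitening
# itself additionally relies on a Nystrom kernel approximation.
import torch


def zca_whiten(embeddings: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Whiten a (batch, dim) matrix of sentence embeddings."""
    centered = embeddings - embeddings.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)  # covariance is symmetric PSD
    inv_sqrt = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.T
    return centered @ inv_sqrt
```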
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.