Using calibrator to improve robustness in Machine Reading Comprehension
- URL: http://arxiv.org/abs/2202.11865v1
- Date: Thu, 24 Feb 2022 02:16:42 GMT
- Title: Using calibrator to improve robustness in Machine Reading Comprehension
- Authors: Jing Jin and Houfeng Wang
- Abstract summary: We propose a method to improve the robustness by using a calibrator as the post-hoc reranker.
Experimental results on adversarial datasets show that our model can achieve performance improvement by more than 10%.
- Score: 18.844528744164876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Reading Comprehension (MRC) has achieved remarkable results since
powerful models such as BERT were proposed. However, these models are not
robust enough and are vulnerable to adversarial input perturbations and
generalization examples. Some works tried to improve performance on
specific types of data by adding related examples to the training data, but
this degrades performance on the original dataset, because the shift in data
distribution makes answer ranking based on the model's softmax probability
unreliable. In this paper, we propose a method to improve robustness by
using a calibrator as a post-hoc reranker, implemented with an
XGBoost model. The calibrator combines manual features and representation
learning features to rerank candidate results. Experimental results on
adversarial datasets show that our model improves performance by
more than 10% and also improves results on the original and generalization
datasets.
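The reranking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` type, the single extra feature, and the hand-set weights are all hypothetical, and a small logistic scorer stands in for the trained XGBoost calibrator so that the example runs with the standard library only.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    softmax_prob: float    # answer probability from the base MRC model
    features: List[float]  # hypothetical extra features (manual or learned)


def calibrator_score(cand: Candidate, weights: List[float], bias: float) -> float:
    """Stand-in calibrator: estimate the probability that the candidate is correct."""
    inputs = [cand.softmax_prob] + cand.features
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def rerank(candidates: List[Candidate], weights: List[float], bias: float) -> List[Candidate]:
    """Order candidates by calibrator score instead of raw softmax probability."""
    return sorted(
        candidates,
        key=lambda c: calibrator_score(c, weights, bias),
        reverse=True,
    )


# Assumed toy data: the second feature could be, e.g., question/context overlap.
candidates = [
    Candidate("Paris", softmax_prob=0.55, features=[0.1]),
    Candidate("Lyon", softmax_prob=0.40, features=[0.9]),
]
WEIGHTS = [1.0, 2.0]  # assumed weights; a trained calibrator would learn these
BIAS = -1.0

softmax_top = max(candidates, key=lambda c: c.softmax_prob)
reranked = rerank(candidates, WEIGHTS, BIAS)
print(softmax_top.text, "->", reranked[0].text)
```

In this toy case the softmax ranking and the calibrator ranking disagree, which is the situation a post-hoc reranker is meant to handle. A real calibrator would be trained on held-out predictions labeled as correct or incorrect.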
Related papers
- Self-calibration for Language Model Quantization and Pruning [38.00221764773372]
Quantization and pruning are fundamental approaches for model compression.
In a post-training setting, state-of-the-art quantization and pruning methods require calibration data.
We propose self-calibration as a solution.
arXiv Detail & Related papers (2024-10-22T16:50:00Z)
- PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., the margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness and that, unlike existing data pruning strategies, it significantly improves model performance.
arXiv Detail & Related papers (2024-05-10T08:02:20Z)
- Estimating Model Performance Under Covariate Shift Without Labels [9.804680621164168]
We introduce Probabilistic Adaptive Performance Estimation (PAPE) for evaluating classification models on unlabeled data.
PAPE provides more accurate performance estimates than other evaluated methodologies.
arXiv Detail & Related papers (2024-01-16T13:29:30Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Towards Continually Learning Application Performance Models [1.2278517240988065]
Machine learning-based performance models are increasingly being used to build critical job scheduling and application optimization decisions.
Traditionally, these models assume that data distribution does not change as more samples are collected over time.
We develop continually learning performance models that account for the distribution drift, alleviate catastrophic forgetting, and improve generalizability.
arXiv Detail & Related papers (2023-10-25T20:48:46Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Feature Weaken: Vicinal Data Augmentation for Classification [1.7013938542585925]
We use Feature Weaken to construct the vicinal data distribution with the same cosine similarity for model training.
This work not only improves the classification performance and generalization of the model, but also stabilizes training and accelerates convergence.
arXiv Detail & Related papers (2022-11-20T11:00:23Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We describe a method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
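The Average Thresholded Confidence (ATC) entry in the list above describes learning a threshold on model confidence. A minimal sketch of that idea follows, assuming that confidence is the maximum softmax probability, that the threshold is chosen so the fraction of labeled source examples above it matches source accuracy, and that target accuracy is estimated as the fraction of unlabeled target examples above it. The function names and toy numbers are hypothetical.

```python
from typing import List


def learn_atc_threshold(source_conf: List[float], source_correct: List[int]) -> float:
    """Pick the threshold whose 'fraction above' best matches source accuracy."""
    n = len(source_conf)
    accuracy = sum(source_correct) / n
    # Candidate thresholds: every observed confidence, plus one below all of them.
    candidates = [float("-inf")] + sorted(set(source_conf))
    best_t, best_gap = candidates[0], float("inf")
    for t in candidates:
        frac_above = sum(c > t for c in source_conf) / n
        gap = abs(frac_above - accuracy)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


def predict_target_accuracy(target_conf: List[float], threshold: float) -> float:
    """Estimate accuracy as the share of target confidences above the threshold."""
    return sum(c > threshold for c in target_conf) / len(target_conf)


# Toy labeled source data: three of five predictions are correct.
source_conf = [0.9, 0.8, 0.7, 0.6, 0.5]
source_correct = [1, 1, 1, 0, 0]
# Toy unlabeled target data.
target_conf = [0.95, 0.65, 0.55, 0.4]

threshold = learn_atc_threshold(source_conf, source_correct)
estimate = predict_target_accuracy(target_conf, threshold)
print(threshold, estimate)
```

No target labels are used at any point, which is what makes this kind of estimate usable under distribution shift.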