Test-time Augmentation for Factual Probing
- URL: http://arxiv.org/abs/2310.17121v1
- Date: Thu, 26 Oct 2023 03:41:32 GMT
- Title: Test-time Augmentation for Factual Probing
- Authors: Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui
- Abstract summary: A problem in factual probing is that small changes to the prompt can lead to large changes in model output.
We propose to use test-time augmentation (TTA) as a relation-agnostic method for reducing sensitivity to prompt variations.
- Score: 33.12189913850943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Factual probing is a method that uses prompts to test if a language model
"knows" certain world knowledge facts. A problem in factual probing is that
small changes to the prompt can lead to large changes in model output. Previous
work aimed to alleviate this problem by optimizing prompts via text mining or
fine-tuning. However, such approaches are relation-specific and do not
generalize to unseen relation types. Here, we propose to use test-time
augmentation (TTA) as a relation-agnostic method for reducing sensitivity to
prompt variations by automatically augmenting and ensembling prompts at test
time. Experiments show improved model calibration, i.e., with TTA, model
confidence better reflects prediction accuracy. Improvements in prediction
accuracy are observed for some models, but for other models, TTA leads to
degradation. Error analysis identifies the difficulty of producing high-quality
prompt variations as the main challenge for TTA.
Related papers
- Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching [7.837009376353597]
Test-time Adaptation(TTA) has been well studied because of its practicality.
We incorporate a new perspective on enhancing the input image into TTA methods to reduce the prediction's uncertainty.
We show that Test-time Enhancer and Adaptation(TECA) reduces prediction's uncertainty and increases accuracy of TTA methods.
arXiv Detail & Related papers (2024-03-26T06:40:03Z) - Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face their challenges.
Previous instruction tuning methods force the model to complete a sentence no matter whether the model knows the knowledge or not.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning)
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z) - Robust Question Answering against Distribution Shifts with Test-Time
Adaptation: An Empirical Study [24.34217596145152]
A deployed question answering (QA) model can easily fail when the test data has a distribution shift compared to the training data.
We evaluate test-time adaptation (TTA) to improve a model after deployment.
We also propose a novel TTA method called online imitation learning (OIL)
arXiv Detail & Related papers (2023-02-09T13:10:53Z) - Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language
Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z) - Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z) - Calibrate Before Use: Improving Few-Shot Performance of Language Models [68.17016463756474]
GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples.
We show that this type of few-shot learning can be unstable.
The choice of prompt format, training examples, and even the order of the training examples can cause accuracy to vary from near chance to near state-of-the-art.
arXiv Detail & Related papers (2021-02-19T00:23:59Z) - How Can We Know When Language Models Know? On the Calibration of
Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.