Test-Time Self-Adaptive Small Language Models for Question Answering
- URL: http://arxiv.org/abs/2310.13307v1
- Date: Fri, 20 Oct 2023 06:49:32 GMT
- Title: Test-Time Self-Adaptive Small Language Models for Question Answering
- Authors: Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park
- Abstract summary: We investigate the capabilities of smaller self-adaptive LMs given only unlabeled test data.
Our proposed self-adaptation strategy demonstrates significant performance improvements on benchmark QA datasets.
- Score: 63.91013329169796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent instruction-finetuned large language models (LMs) have achieved
notable performance in various tasks, such as question answering (QA).
However, despite their ability to memorize a vast amount of general knowledge
across diverse tasks, they might be suboptimal on specific tasks due to their
limited capacity to transfer and adapt knowledge to target tasks. Moreover,
further finetuning LMs with labeled datasets is often infeasible since such
datasets are rarely available, and it is questionable whether smaller LMs with
limited knowledge can be adapted using only unlabeled test data. In this work,
we investigate the capabilities of smaller self-adaptive LMs given only
unlabeled test data. In particular, we first stochastically generate multiple
answers, and then ensemble them while filtering out low-quality samples to
mitigate noise from inaccurate labels. Our proposed self-adaptation strategy
demonstrates significant performance improvements on benchmark QA datasets,
with higher robustness across diverse prompts that enables the LMs to remain
stable. Code is available at: https://github.com/starsuzi/T-SAS.
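For illustration, a minimal sketch of this self-adaptation loop: sample several answers, keep a pseudo-label only when enough samples agree, and take a gradient step on it. This is not the authors' exact T-SAS implementation; the model choice, sampling temperature, agreement threshold, and learning rate are all assumptions.

```python
# Minimal sketch of test-time self-adaptation for QA. Not the authors' exact
# T-SAS implementation: model, temperature, threshold, and lr are assumptions.
from collections import Counter

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # any small instruction-tuned LM (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def pseudo_label(question, k=8, min_agreement=0.5):
    """Stochastically sample k answers and keep the majority answer only if
    enough samples agree, filtering out low-quality pseudo-labels."""
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=True, temperature=0.7,
                             num_return_sequences=k, max_new_tokens=32)
    answers = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / k >= min_agreement else None

def adapt_step(question):
    """One self-training step on a single unlabeled test question."""
    answer = pseudo_label(question)
    if answer is None:
        return  # the ensemble is too uncertain; skip this question
    batch = tokenizer(question, return_tensors="pt")
    labels = tokenizer(answer, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```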
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
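A rough sketch of the self-synthetic finetuning idea from the SELF-GUIDE entry above. The `generate` callable, the prompt wording, and the crude length-based quality filter are placeholders, not the paper's actual multi-stage pipeline.

```python
# Sketch of self-synthetic finetuning in the spirit of SELF-GUIDE; prompts and
# the quality filter are assumptions, not the paper's exact mechanism.
def synthesize_pairs(generate, task_instruction, num_pairs=100):
    """Have the student LLM invent task inputs, answer them itself, and keep
    plausible pairs; the student is then finetuned on the surviving pairs."""
    pairs = []
    for _ in range(num_pairs):
        x = generate(f"Write one new input for this task: {task_instruction}")
        y = generate(f"{task_instruction}\nInput: {x}\nOutput:")
        if x and y and len(y.split()) <= 128:  # crude filter (assumption)
            pairs.append((x, y))
    return pairs
```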
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
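A small sketch of the adaptive label smoothing idea from the UAL entry above, where `uncertainty` is a per-sample score in [0, 1] (e.g., normalized predictive entropy); scaling the smoothing value linearly with uncertainty is an assumed schedule, not the paper's exact formula.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, max_smooth=0.2):
    """Cross-entropy with per-sample label smoothing: uncertain samples get
    softer targets, confident samples stay close to one-hot."""
    num_classes = logits.size(-1)
    smooth = (uncertainty * max_smooth).unsqueeze(-1)           # (B, 1)
    one_hot = F.one_hot(targets, num_classes).float()           # (B, C)
    target_dist = one_hot * (1 - smooth) + smooth / num_classes
    return -(target_dist * F.log_softmax(logits, dim=-1)).sum(-1).mean()
```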
- Elephants Never Forget: Testing Language Models for Memorization of Tabular Data [21.912611415307644]
Large Language Models (LLMs) can be applied to a diverse set of tasks, but the critical issues of data contamination and memorization are often glossed over.
We introduce a variety of different techniques to assess the degrees of contamination, including statistical tests for conditional distribution modeling and four tests that identify memorization.
arXiv Detail & Related papers (2024-03-11T12:07:13Z)
- Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware Parameter-Efficient self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
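A sketch of uncertainty-aware pseudo-label selection in the spirit of the UPET entry above, assuming `model(inputs)` returns class logits and using Monte Carlo dropout as the uncertainty proxy; both the proxy and the threshold are assumptions.

```python
import torch

@torch.no_grad()
def confident_pseudo_labels(model, inputs, n_passes=8, max_std=0.1):
    """Run stochastic forward passes with dropout active and keep a sample's
    pseudo-label only when the passes agree (low predictive std)."""
    model.train()  # keep dropout layers stochastic
    probs = torch.stack([model(inputs).softmax(-1) for _ in range(n_passes)])
    mean, std = probs.mean(0), probs.std(0)
    labels = mean.argmax(-1)
    keep = std.gather(-1, labels.unsqueeze(-1)).squeeze(-1) < max_std
    return labels, keep  # train (e.g., only PEFT parameters) on kept samples
```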
- Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! [103.09776737512077]
SelTDA (Self-Taught Data Augmentation) is a strategy for finetuning large vision language models on small-scale VQA datasets.
It generates question-answer pseudolabels directly conditioned on an image, allowing us to pseudolabel unlabeled images.
We describe a series of experiments showing that our self-taught data augmentation increases robustness to adversarially searched questions.
arXiv Detail & Related papers (2023-06-06T18:00:47Z)
- QActor: On-line Active Learning for Noisy Labeled Stream Data [10.814099534254922]
We propose QActor, which combines the selection of supposedly clean samples via quality models with active queries to an oracle for the most informative true labels.
QActor swiftly combines the merits of quality models for data filtering and oracle queries for cleaning the most informative data.
A central feature of QActor is to dynamically adjust the query limit according to the learning loss for each data batch.
arXiv Detail & Related papers (2020-01-28T15:13:21Z)
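Finally, a sketch of QActor's per-batch policy from the entry above: trust samples the quality model scores as clean, and spend an oracle budget, grown with the previous batch's loss, on the most informative of the rest. The linear budget schedule and the thresholds are assumptions, not the paper's exact rule.

```python
import numpy as np

def select_batch(quality_scores, informativeness, prev_loss,
                 clean_thresh=0.8, base_budget=16, scale=8.0):
    """Split a noisy batch into trusted samples and oracle queries, with a
    query limit that adapts to the last batch's learning loss."""
    quality_scores = np.asarray(quality_scores)
    informativeness = np.asarray(informativeness)
    clean = np.where(quality_scores >= clean_thresh)[0]
    noisy = np.where(quality_scores < clean_thresh)[0]
    budget = int(base_budget + scale * prev_loss)  # dynamic query limit
    ranked = noisy[np.argsort(-informativeness[noisy])]
    return clean, ranked[:budget]  # indices to keep / to send to the oracle
```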
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.