DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase
- URL: http://arxiv.org/abs/2311.03319v1
- Date: Mon, 6 Nov 2023 18:12:55 GMT
- Title: DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase
- Authors: Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin Wang, Xueqi Wang, William Hogan, Jingbo Shang
- Abstract summary: In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks.
We propose Data Augmentation for In-Context Learning (DAIL).
- Score: 37.68804898063595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-Context Learning (ICL) combined with pre-trained large language models has
achieved promising results on various NLP tasks. However, ICL requires
high-quality annotated demonstrations which might not be available in
real-world scenarios. To overcome this limitation, we propose \textbf{D}ata
\textbf{A}ugmentation for \textbf{I}n-Context \textbf{L}earning
(\textbf{DAIL}). DAIL leverages the intuition that large language models are
more familiar with the content generated by themselves. It first utilizes the
language model to generate paraphrases of the test sample and employs majority
voting to determine the final result based on individual predictions. Our
extensive empirical evaluation shows that DAIL outperforms the standard ICL
method and other ensemble-based methods in the low-resource scenario.
Additionally, we explore the use of voting consistency as a confidence score of
the model when the logits of predictions are inaccessible. We believe our work
will stimulate further research on ICL in low-resource settings.
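
Below is a minimal sketch of the pipeline the abstract describes: the model paraphrases the test sample, each variant is classified, the final label comes from majority voting, and the agreement rate of the vote serves as a confidence score when logits are inaccessible. The `paraphrase` and `classify` callables are hypothetical stand-ins for an actual LLM interface, not the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Tuple


def dail_predict(
    test_input: str,
    paraphrase: Callable[[str], List[str]],  # hypothetical LLM self-paraphraser
    classify: Callable[[str], str],          # hypothetical ICL classifier
    num_paraphrases: int = 4,
) -> Tuple[str, float]:
    """Return (majority-vote label, voting-consistency confidence)."""
    # Classify the original input together with model-generated paraphrases.
    variants = [test_input] + paraphrase(test_input)[:num_paraphrases]
    votes = Counter(classify(variant) for variant in variants)
    label, count = votes.most_common(1)[0]
    # Voting consistency: the fraction of variants agreeing with the majority,
    # usable as a confidence score when prediction logits are inaccessible.
    confidence = count / len(variants)
    return label, confidence


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without an LLM; real callables would
    # wrap prompted model calls.
    toy_paraphrase = lambda s: [s.lower(), s + "!", s.replace("movie", "film")]
    toy_classify = lambda s: "positive" if "great" in s.lower() else "negative"
    print(dail_predict("This movie is great", toy_paraphrase, toy_classify))
```

The toy stand-ins in the `__main__` block exist only so the sketch runs end to end; in practice both callables would prompt the same pre-trained model, per the paper's intuition that LLMs are more familiar with content they generate themselves.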
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs [1.515687944002438]
We propose Contrastive Semantic Similarity, a module that obtains similarity features for measuring the uncertainty of text pairs.
We conduct extensive experiments with three large language models (LLMs) on several benchmark question-answering datasets.
Results show that our proposed method performs better in estimating reliable responses of LLMs than comparable baselines.
arXiv Detail & Related papers (2024-06-05T11:35:44Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss [17.213602354715956]
We propose a BERT-based model with feature projection and length-balanced loss for readability assessment.
Our model achieves state-of-the-art performance on two English benchmark datasets and one dataset of Chinese textbooks.
arXiv Detail & Related papers (2022-10-19T05:33:27Z)
- Coreferential Reasoning Learning for Language Representation [88.14248323659267]
We present CorefBERT, a novel language representation model that can capture the coreferential relations in context.
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks.
arXiv Detail & Related papers (2020-04-15T03:57:45Z)