Improving Input-label Mapping with Demonstration Replay for In-context
Learning
- URL: http://arxiv.org/abs/2310.19572v1
- Date: Mon, 30 Oct 2023 14:29:41 GMT
- Title: Improving Input-label Mapping with Demonstration Replay for In-context
Learning
- Authors: Zhuocheng Gong, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai,
Dongyan Zhao, Rui Yan
- Abstract summary: In-context learning (ICL) is an emerging capability of large autoregressive language models.
- We propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
- Score: 67.57288926736923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) is an emerging capability of large autoregressive
language models where a few input-label demonstrations are appended to the
input to enhance the model's understanding of downstream NLP tasks, without
directly adjusting the model parameters. The effectiveness of ICL can be
attributed to the strong language modeling capabilities of large language
models (LLMs), which enable them to learn the mapping between input and labels
based on in-context demonstrations. Despite achieving promising results, the
causal nature of language modeling in ICL restricts the attention to be
backward only, i.e., a token only attends to its previous tokens, failing to
capture the full input-label information and limiting the model's performance.
In this paper, we propose a novel ICL method called Repeated Demonstration
with Sliding Causal Attention (RdSca). Specifically, we duplicate later
demonstrations and concatenate them to the front, allowing the model to
"observe" later information even under the causal restriction. In addition,
we introduce sliding causal attention, which customizes causal attention to avoid
information leakage. Experimental results show that our method significantly
improves the input-label mapping in ICL demonstrations. We also conduct an
in-depth analysis of how to customize the causal attention without training,
which has been an unexplored area in previous research.
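The abstract does not spell out the masking scheme, but the mechanism can be
pictured as a custom attention mask over a sequence laid out as
[replayed later demonstrations | original demonstrations | query]. The PyTorch
sketch below is a hypothetical illustration under that assumption: the segment
layout, the function name, and the exact visibility rule (blocking only the
query from the replayed prefix) are our assumptions, not the paper's
specification.

```python
import torch

def sliding_causal_mask(n_replay: int, n_demo: int, n_query: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend) for a sequence laid out
    as [replayed later demos | original demos | query].

    One plausible reading of "sliding causal attention": keep the usual
    causal mask, but block the query segment from attending to the
    replayed copies, so the duplicated demonstrations are visible to the
    original demonstrations without leaking into the query's effective
    context a second time.
    """
    total = n_replay + n_demo + n_query
    # Standard causal (lower-triangular) mask: each token attends to
    # itself and all earlier tokens.
    mask = torch.tril(torch.ones(total, total, dtype=torch.bool))
    # Query rows (last n_query positions) may not attend to the replayed
    # prefix (first n_replay key positions).
    mask[n_replay + n_demo:, :n_replay] = False
    return mask

# Example: 4 replayed tokens, 8 original demonstration tokens, 3 query tokens.
mask = sliding_causal_mask(4, 8, 3)  # shape (15, 15), True = may attend
```

In practice such a boolean mask would be converted to the additive form most
attention implementations expect (0 where allowed, -inf where blocked) and
applied at every layer with no retraining, which matches the paper's
training-free setting.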
Related papers
- Focused Large Language Models are Stable Many-Shot Learners [18.783939647966776]
In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations.
We propose a training-free method FocusICL, which conducts triviality filtering to avoid attention being diverted by unimportant content.
We show that FocusICL achieves an average performance improvement of 5.2% over vanilla ICL and scales well with many-shot demonstrations.
arXiv Detail & Related papers (2024-08-26T02:53:24Z)
- Demonstration Augmentation for Zero-shot In-context Learning [35.210664102352546]
Large Language Models (LLMs) have demonstrated an impressive capability known as In-context Learning (ICL).
We propose Demonstration Augmentation for In-context Learning (DAIL), which employs the model's previously predicted historical samples as demonstrations for subsequent ones (a sketch follows at the end of this list).
Our experiments reveal that DAIL can significantly improve the model's performance over direct zero-shot inference and can even outperform few-shot ICL without any external information.
arXiv Detail & Related papers (2024-06-03T11:46:42Z)
- Rectifying Demonstration Shortcut in In-Context Learning [15.08431909212102]
Large language models (LLMs) are able to solve various tasks with only a few demonstrations, utilizing their in-context learning (ICL) abilities.
However, LLMs often rely on pre-trained semantic priors about the demonstrations rather than on input-label relationships when making ICL predictions.
arXiv Detail & Related papers (2024-03-14T15:30:14Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- In-context Learning with Retrieved Demonstrations for Language Models: A Survey [23.24271704145876]
Few-shot in-context learning (ICL) enables language models to adapt to new tasks with just a few demonstrations in the input context.
Instead of using a fixed set of demonstrations, one recent development is to retrieve demonstrations tailored to each input query.
We discuss and compare different design choices for retrieval models, retrieval training procedures, and inference algorithms.
arXiv Detail & Related papers (2024-01-21T23:34:42Z)
- Dynamic Demonstrations Controller for In-Context Learning [51.3439660534631]
In-Context Learning (ICL) is a new paradigm for natural language processing (NLP), where a large language model observes a small number of demonstrations and a test instance as its input.
Previous studies have revealed that ICL is sensitive to the selection and the ordering of demonstrations.
We propose a Dynamic Demonstrations Controller (D$^2$Controller), which can improve ICL performance by adjusting the number of demonstrations.
arXiv Detail & Related papers (2023-09-30T14:04:22Z)
- Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations [38.4166247280112]
Self-ICL is a framework that bootstraps LMs' intrinsic capabilities to perform zero-shot ICL.
Self-ICL outperforms zero-shot baselines on both average accuracy and head-to-head comparison.
arXiv Detail & Related papers (2023-05-24T11:22:34Z)
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep learning based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with a model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' abilities in limited-data scenarios.
Why such demonstrations benefit the learning process remains unclear, since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones, to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions (a minimal sketch of this retrieval step follows at the end of this list).
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
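As a concrete illustration of the last entry above, here is a minimal sketch of the kNN-representation idea (the function name, shapes, and the cosine-similarity choice are assumptions for illustration, not taken from the paper): training examples are embedded with the model's hidden representations, and the nearest neighbors of a test representation are returned as the training examples most responsible for the prediction.

```python
import numpy as np

def knn_influential_examples(train_reprs: np.ndarray,
                             test_repr: np.ndarray,
                             k: int = 5) -> np.ndarray:
    """Return indices of the k training examples whose hidden
    representations are most similar (cosine) to one test input's
    representation.

    train_reprs: (n_train, d) training-example representations
    test_repr:   (d,) representation of the test input
    """
    # Normalize rows so the dot product equals cosine similarity.
    train = train_reprs / np.linalg.norm(train_reprs, axis=1, keepdims=True)
    test = test_repr / np.linalg.norm(test_repr)
    sims = train @ test
    return np.argsort(-sims)[:k]  # the k nearest neighbors
```

Inspecting the retrieved neighbors is what lets this approach surface spurious associations the model may have learned.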
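Similarly, for the DAIL entry earlier in the list, a minimal sketch of reusing the model's own predictions as pseudo-demonstrations might look as follows; the prompt format, the `model.generate` interface, and the keep-the-k-most-recent retrieval rule are hypothetical stand-ins rather than the paper's method.

```python
def dail_zero_shot(model, queries, k=4):
    """Hypothetical sketch of demonstration augmentation for zero-shot
    ICL: earlier (query, prediction) pairs produced by the model itself
    are prepended as demonstrations for later queries.
    """
    history = []      # (query, predicted_label) pairs seen so far
    predictions = []
    for q in queries:
        # Use up to k of the most recent self-predicted pairs as demos.
        demos = "".join(f"Input: {x}\nLabel: {y}\n\n" for x, y in history[-k:])
        pred = model.generate(demos + f"Input: {q}\nLabel:")
        history.append((q, pred))
        predictions.append(pred)
    return predictions
```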
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.