Related papers: Enhancing In-Context Learning via Implicit Demonstration Augmentation

Enhancing In-Context Learning via Implicit Demonstration Augmentation

URL: http://arxiv.org/abs/2407.00100v1
Date: Thu, 27 Jun 2024 05:25:46 GMT
Title: Enhancing In-Context Learning via Implicit Demonstration Augmentation
Authors: Xiaoling Zhou, Wei Ye, Yidong Wang, Chaoya Jiang, Zhemg Lee, Rui Xie, Shikun Zhang,
Abstract summary: In-context learning (ICL) enables pre-trained language models to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations. In this paper, we tackle this challenge for the first time from the perspective of demonstration augmentation.
Score: 26.78252788538567
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The emergence of in-context learning (ICL) enables large pre-trained language models (PLMs) to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from the perspective of demonstration augmentation. Specifically, we start with enriching representations of demonstrations by leveraging their deep feature distribution. We then theoretically reveal that when the number of augmented copies approaches infinity, the augmentation is approximately equal to a novel logit calibration mechanism integrated with specific statistical properties. This insight results in a simple yet highly efficient method that significantly improves the average and worst-case accuracy across diverse PLMs and tasks. Moreover, our method effectively reduces performance variance among varying demonstrations, permutations, and templates, and displays the capability to address imbalanced class distributions.

Related papers

WSS-CL: Weight Saliency Soft-Guided Contrastive Learning for Efficient Machine Unlearning Image Classification [0.0]
We introduce a new two-phase efficient machine unlearning method for image classification, in terms of weight saliency.<n>Our method is called weight saliency soft-guided contrastive learning for efficient machine unlearning image classification (WSS-CL)<n>Our proposed method yields much-improved unlearning efficacy with negligible performance loss compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-08-06T10:47:36Z)
Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training [57.03005244917803]
Large language models (LLMs) often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training.<n>Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT)<n> Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks.
arXiv Detail & Related papers (2025-06-11T06:30:28Z)
Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization [31.379237532476875]
Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks.<n>In this paper, we propose a multi-stage influence function to attribute predictions of fine-tuned LLMs to pre-training data.
arXiv Detail & Related papers (2025-05-08T07:43:44Z)
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning [35.62208317531141]
We advocate and introduce the unrolling paradigm, also referred to as "learning to optimize" Our unrolling approach covers various statistical feature distributions and pre-training paradigms. We report comprehensive experiments, which cover a breadth of fine-grained downstream image classification tasks.
arXiv Detail & Related papers (2024-12-21T19:01:57Z)
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLM has become a common practice to improve performance on specific downstream tasks. To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z)
DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters. We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL. We experimentally prove the wide applicability of DETAIL by showing our attribution scores obtained on white-box models are transferable to black-box models in improving model performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
Enhancing Vision-Language Few-Shot Adaptation with Negative Learning [11.545127156146368]
We propose a Simple yet effective Negative Learning approach, SimNL, to more efficiently exploit task-specific knowledge. To this issue, we introduce a plug-and-play few-shot instance reweighting technique to mitigate noisy outliers. Our extensive experimental results validate that the proposed SimNL outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks.
arXiv Detail & Related papers (2024-03-19T17:59:39Z)
Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing. Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the underlying Large Language Models (LLMs) prior to the input image. To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
The Common Stability Mechanism behind most Self-Supervised Learning Approaches [64.40701218561921]
We provide a framework to explain the stability mechanism of different self-supervised learning techniques. We discuss the working mechanism of contrastive techniques like SimCLR, non-contrastive techniques like BYOL, SWAV, SimSiam, Barlow Twins, and DINO. We formulate different hypotheses and test them using the Imagenet100 dataset.
arXiv Detail & Related papers (2024-02-22T20:36:24Z)
Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models. We propose a novel ICL method called Sliding Causal Attention (RdSca) We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff. cutoff relies on sampling consistency and thus adds little computational overhead. cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
arXiv Detail & Related papers (2020-09-29T07:08:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.