Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
- URL: http://arxiv.org/abs/2503.08640v2
- Date: Tue, 18 Mar 2025 17:13:42 GMT
- Title: Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
- Authors: Emily Xiao, Chin-Jou Li, Yilin Zhang, Graham Neubig, Amanda Bertsch
- Abstract summary: Many-shot in-context learning has recently shown promise as an alternative to finetuning. This shifts the computational burden from training-time to inference-time. We present Dynamic Block-Sparse Attention, a training-free framework for retrieval-based many-shot in-context learning.
- Score: 45.20728476185864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many-shot in-context learning has recently shown promise as an alternative to finetuning, with the major advantage that the same model can be served for multiple tasks. However, this shifts the computational burden from training-time to inference-time, making deployment of many-shot ICL challenging to justify in practice. This cost is further increased if a custom demonstration set is retrieved for each inference example. We present Dynamic Block-Sparse Attention, a training-free framework for retrieval-based many-shot in-context learning. By combining carefully designed block-sparse attention and retrieval of cached groups of demonstrations, we achieve comparable per-example latency to finetuning while maintaining on average >95% of the best method's accuracy across strong ICL and finetuning baselines. We hope that this will further enable the deployment of many-shot ICL at scale.
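A minimal sketch of the kind of attention mask such a scheme implies is shown below. This is not the authors' implementation; the sequence layout, helper name, and group sizes are assumptions. The idea it illustrates: demonstration groups attend only to a shared prefix and to themselves (so each group can be encoded and cached independently), while the test query attends to the prefix and every retrieved group.

```python
import torch

def block_sparse_mask(prefix_len, group_lens, query_len):
    """Build a causal block-sparse attention mask (True = may attend).

    Assumed layout: [shared prefix | demo group 1 | ... | demo group k | test query].
    Each demo group attends to the prefix and to itself only, so groups are
    independent of one another; the test query attends to everything before it.
    """
    starts, total = [], prefix_len
    for g in group_lens:
        starts.append(total)
        total += g
    q_start, seq_len = total, total + query_len

    causal = torch.tril(torch.ones(seq_len, seq_len)).bool()
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)

    # Shared prefix: ordinary causal attention over itself.
    mask[:prefix_len, :prefix_len] = True
    # Each demonstration group: attend to the prefix and within the group.
    for s, g in zip(starts, group_lens):
        mask[s:s + g, :prefix_len] = True
        mask[s:s + g, s:s + g] = True
    # Test query: attend to the prefix, all retrieved groups, and itself.
    mask[q_start:, :] = True

    return mask & causal  # keep everything causal


if __name__ == "__main__":
    m = block_sparse_mask(prefix_len=2, group_lens=[3, 3], query_len=2)
    print(m.int())  # blocks of 1s show which positions may attend where
```

Under a mask like this, a cached group's keys and values depend only on the shared prefix and the group itself, which is what allows cached demonstration groups to be retrieved and reused across inference examples instead of re-encoding the full many-shot context each time.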
Related papers
- Localization-Aware Multi-Scale Representation Learning for Repetitive Action Counting [19.546761142820376]
Repetitive action counting (RAC) aims to estimate the number of class-agnostic action occurrences in a video without exemplars.
Most current RAC methods rely on a raw frame-to-frame similarity representation for period prediction.
We introduce a foreground localization objective into similarity representation learning to obtain more robust and efficient video features.
arXiv Detail & Related papers (2025-01-13T13:24:41Z) - More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives [50.772462704559345]
We introduce DrICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives. Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels. We develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks covering shot numbers from 1 to 350 within sequences of up to 8,000 tokens, for fine-tuning purposes.
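A rough, hypothetical illustration of an advantage-based reweighting of the NLL objective follows; the weighting scheme and names are assumptions, not the DrICL formulation.

```python
import torch
import torch.nn.functional as F

def advantage_weighted_nll(logits, targets, advantages):
    """Hypothetical advantage-weighted NLL loss.

    Per-example NLL terms are rescaled by a detached advantage signal
    (e.g. many-shot gain relative to a zero-shot baseline), so examples
    where the demonstrations actually help contribute more to the update.

    logits:     (batch, num_classes)
    targets:    (batch,)
    advantages: (batch,) precomputed, baseline-relative scores
    """
    per_example = F.cross_entropy(logits, targets, reduction="none")
    # Normalize weights so they average to 1 over the batch.
    weights = torch.softmax(advantages.detach(), dim=0) * advantages.numel()
    return (weights * per_example).mean()
```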
arXiv Detail & Related papers (2025-01-07T14:57:08Z) - Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
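A hedged sketch of this label-guided, loss-minimizing adjustment of context tokens (an adversarial-attack-style update with the sign of the objective flipped) might look as follows; the function and the HuggingFace-style `inputs_embeds`/`labels` interface are assumptions, not the CPT implementation.

```python
import torch

def refine_context_embeddings(model, context_embeds, input_embeds, labels,
                              steps=10, lr=1e-2):
    """Gradient-descent refinement of context-token embeddings (a sketch).

    Unlike an adversarial attack, which ascends the loss, this descends it.
    Only the context embeddings are updated; the optimizer holds nothing else,
    so the model's own parameters stay untouched. `labels` are assumed to
    already mask the context positions (e.g. with -100).
    """
    ctx = context_embeds.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([ctx], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        embeds = torch.cat([ctx, input_embeds], dim=1)
        loss = model(inputs_embeds=embeds, labels=labels).loss
        loss.backward()
        opt.step()
    return ctx.detach()
```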
arXiv Detail & Related papers (2024-10-22T17:45:47Z) - Anytime Continual Learning for Open Vocabulary Classification [15.228942895385432]
The AnytimeCL problem aims to break away from batch training and rigid models.
We propose a dynamic weighting between predictions of a partially fine-tuned model and a fixed open vocabulary model.
Our methods are validated with experiments that test flexibility of learning and inference.
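A minimal sketch of such a dynamic weighting between the two models' predictions follows; the blending rule and names are assumptions, not the AnytimeCL formula.

```python
import torch

def blended_prediction(tuned_probs, open_vocab_probs, alpha):
    """Blend a partially fine-tuned model with a frozen open-vocabulary model.

    tuned_probs, open_vocab_probs: (batch, num_classes) probability tensors.
    alpha: scalar or (num_classes,) weight in [0, 1]; it could, for example,
    grow with how much training data each class has seen so far.
    """
    probs = alpha * tuned_probs + (1.0 - alpha) * open_vocab_probs
    return probs / probs.sum(dim=-1, keepdim=True)
```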
arXiv Detail & Related papers (2024-09-13T03:34:37Z) - SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots [9.048091324917515]
We present SeCoKD, a self-Knowledge Distillation (KD) training framework that aligns the student model with a heavily prompted variation.
We experiment with the SeCoKD across three Large Language Models (LLMs) and six benchmarks focusing mainly on reasoning tasks.
Results show that our method outperforms the base model and Supervised Fine-tuning (SFT).
SeCoKD introduces few negative artifacts when evaluated on new tasks, making it more robust than Supervised Fine-tuning.
arXiv Detail & Related papers (2024-06-20T11:26:06Z) - In-Context Learning with Long-Context Models: An In-Depth Exploration [92.16922648612807]
We show that, for many datasets with large label spaces, performance continues to increase with thousands of demonstrations. We show that long-context ICL can be an effective tool, and may not require long-context encoding of the demonstration set at all.
arXiv Detail & Related papers (2024-04-30T21:06:52Z) - Many-Shot In-Context Learning [58.395589302800566]
Large language models (LLMs) excel at few-shot in-context learning (ICL).
We observe significant performance gains across a wide variety of generative and discriminative tasks.
Unlike few-shot learning, many-shot learning is effective at overriding pretraining biases.
arXiv Detail & Related papers (2024-04-17T02:49:26Z) - Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm for erasing the effect of client-level data.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z) - Multimodal Parameter-Efficient Few-Shot Class Incremental Learning [1.9220716793379256]
Few-Shot Class Incremental Learning (FSCIL) is a challenging continual learning task, where limited training examples are available during several learning sessions.
To succeed in this task, it is necessary to avoid over-fitting new classes caused by biased distributions in the few-shot training sets.
CPE-CLIP significantly improves FSCIL performance compared to state-of-the-art proposals while also drastically reducing the number of learnable parameters and training costs.
arXiv Detail & Related papers (2023-03-08T17:34:15Z)