KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
- URL: http://arxiv.org/abs/2509.15676v1
- Date: Fri, 19 Sep 2025 06:50:03 GMT
- Title: KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
- Authors: Vaibhav Singh, Soumya Suvra Ghosal, Kapu Nirmal Joshua, Soumyabrata Pal, Sayak Ray Chowdhury
- Abstract summary: In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models to new and data-scarce tasks. We study the problem of example selection in ICL from a principled, information theory-driven perspective. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee.
- Score: 30.471243464952625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models (LLMs) to new and data-scarce tasks using only a few carefully selected task-specific examples presented in the prompt. However, given the limited context size of LLMs, a fundamental question arises: Which examples should be selected to maximize performance on a given user query? While nearest-neighbor-based methods like KATE have been widely adopted for this purpose, they suffer from well-known drawbacks in high-dimensional embedding spaces, including poor generalization and a lack of diversity. In this work, we study this problem of example selection in ICL from a principled, information theory-driven perspective. We first model an LLM as a linear function over input embeddings and frame the example selection task as a query-specific optimization problem: selecting a subset of exemplars from a larger example bank that minimizes the prediction error on a specific query. This formulation departs from traditional generalization-focused learning theoretic approaches by targeting accurate prediction for a specific query instance. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee. We further enhance our method by (i) incorporating the kernel trick to operate in high-dimensional feature spaces without explicit mappings, and (ii) introducing an optimal design-based regularizer to encourage diversity in the selected examples. Empirically, we demonstrate significant improvements over standard retrieval methods across a suite of classification tasks, highlighting the benefits of structure-aware, diverse example selection for ICL in real-world, label-scarce scenarios.
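The selection recipe described in the abstract (a query-specific relevance objective, a greedy routine justified by approximate submodularity, the kernel trick, and an optimal design-based diversity regularizer) can be illustrated with a minimal sketch. This is not the paper's actual objective: the additive form of the score, the log-determinant diversity bonus, the RBF kernel, and the trade-off weight `lam` are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_select(bank, query, k, lam=0.1, gamma=1.0):
    """Greedily pick k exemplars from `bank` for a single `query`,
    balancing query relevance (kernel similarity to the query) against
    a log-det diversity bonus in the style of D-optimal design."""
    n = bank.shape[0]
    K = rbf_kernel(bank, bank, gamma)                     # bank-bank kernel
    q = rbf_kernel(bank, query[None, :], gamma).ravel()   # bank-query similarity
    chosen = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            S = chosen + [i]
            sub = K[np.ix_(S, S)] + 1e-6 * np.eye(len(S))  # jitter for stability
            # Relevance term plus log-determinant diversity regularizer.
            gain = q[S].sum() + lam * np.linalg.slogdet(sub)[1]
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return chosen
```

Because the kernel matrix is computed implicitly from embeddings, the selection never needs an explicit high-dimensional feature map; the log-det term penalizes near-duplicate exemplars, which a pure nearest-neighbor retriever such as KATE would happily select.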
Related papers
- A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't) [14.070675074621043]
Instruction fine-tuning involves selecting a subset of instruction training data from a large candidate pool, using a small query set from the target task. Despite growing interest, the literature on targeted instruction selection remains fragmented and opaque. In this work, we aim to bring clarity to this landscape by disentangling and systematically analyzing the two core ingredients: data representation and selection algorithms.
arXiv Detail & Related papers (2026-02-16T12:33:05Z) - Nearly Optimal Active Preference Learning and Its Application to LLM Alignment [68.56793807995417]
Aligning large language models depends on high-quality datasets of human preference labels. Many existing approaches adopt classical experimental design criteria such as G- or D-optimality. In this work, we identify a simple intuition specific to preference learning that calls into question the suitability of these existing design objectives.
arXiv Detail & Related papers (2026-02-02T03:21:29Z) - Order Matters: Rethinking Prompt Construction in In-Context Learning [52.19217980839306]
In-context learning (ICL) enables large language models to perform new tasks by conditioning on a sequence of examples. Most prior work assumes that which examples are chosen has a far greater effect on performance than how those examples are ordered. We revisit this assumption and conduct a systematic comparison between the effect of selection and ordering.
arXiv Detail & Related papers (2025-11-12T19:57:55Z) - Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers [22.72866404096086]
Amortized learning is the idea of reusing computation or inductive biases shared across tasks to enable rapid generalization to novel problems. Current approaches struggle to scale to large datasets because their capacity to process task data at inference is often limited. We propose iterative amortized inference, a class of models that refine solutions step-by-step over mini-batches.
arXiv Detail & Related papers (2025-10-13T14:40:47Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Experimental Design for Active Transductive Inference in Large Language Models [18.2671641610825]
We use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD)
We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set.
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
arXiv Detail & Related papers (2024-04-12T23:27:46Z) - RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z) - Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages.
First we filter the dataset to obtain informative in-context examples individually.
Then we propose diversity-guided example search which iteratively refines and evaluates the selected example permutations.
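The two-stage filter-then-search idea described above can be sketched as follows. The scoring proxy, the mean-pairwise-distance diversity objective, and the random-swap local search are all illustrative assumptions, not LENS's actual components.

```python
import numpy as np

def filter_then_search(scores, embeddings, m, k, iters=100, seed=0):
    """Two-stage exemplar selection in the spirit of filter-then-search:
    (1) keep the m individually highest-scoring examples (`scores` is an
        assumed per-example informativeness proxy), then
    (2) run a local search over size-k subsets of that pool, accepting a
        random swap whenever it increases mean pairwise distance."""
    rng = np.random.default_rng(seed)
    pool = np.argsort(scores)[-m:]                     # stage 1: filter
    subset = rng.choice(pool, size=k, replace=False)   # stage 2: init

    def diversity(idx):
        # Mean pairwise Euclidean distance among selected embeddings.
        E = embeddings[idx]
        d = np.linalg.norm(E[:, None] - E[None, :], axis=-1)
        return d.sum() / (len(idx) * (len(idx) - 1))

    best = diversity(subset)
    for _ in range(iters):
        cand = rng.choice(pool)
        if cand in subset:
            continue
        trial = subset.copy()
        trial[rng.integers(k)] = cand                  # propose a swap
        val = diversity(trial)
        if val > best:
            subset, best = trial, val
    return subset
```

The filter stage keeps the search space small, so the diversity-guided stage only ever evaluates subsets of already-informative candidates.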
arXiv Detail & Related papers (2023-02-27T06:32:45Z) - Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z) - An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.