KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
- URL: http://arxiv.org/abs/2509.15676v1
- Date: Fri, 19 Sep 2025 06:50:03 GMT
- Title: KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
- Authors: Vaibhav Singh, Soumya Suvra Ghosal, Kapu Nirmal Joshua, Soumyabrata Pal, Sayak Ray Chowdhury
- Abstract summary: In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models to new and data-scarce tasks. We study the problem of example selection in ICL from a principled, information theory-driven perspective. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee.
- Score: 30.471243464952625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models (LLMs) to new and data-scarce tasks using only a few carefully selected task-specific examples presented in the prompt. However, given the limited context size of LLMs, a fundamental question arises: Which examples should be selected to maximize performance on a given user query? While nearest-neighbor-based methods like KATE have been widely adopted for this purpose, they suffer from well-known drawbacks in high-dimensional embedding spaces, including poor generalization and a lack of diversity. In this work, we study this problem of example selection in ICL from a principled, information theory-driven perspective. We first model an LLM as a linear function over input embeddings and frame the example selection task as a query-specific optimization problem: selecting a subset of exemplars from a larger example bank that minimizes the prediction error on a specific query. This formulation departs from traditional generalization-focused learning theoretic approaches by targeting accurate prediction for a specific query instance. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee. We further enhance our method by (i) incorporating the kernel trick to operate in high-dimensional feature spaces without explicit mappings, and (ii) introducing an optimal design-based regularizer to encourage diversity in the selected examples. Empirically, we demonstrate significant improvements over standard retrieval methods across a suite of classification tasks, highlighting the benefits of structure-aware, diverse example selection for ICL in real-world, label-scarce scenarios.
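The selection recipe described in the abstract (a query-specific relevance objective, a greedy routine justified by approximate submodularity, the kernel trick, and an optimal design-based diversity regularizer) can be illustrated with a minimal sketch. This is not the paper's actual objective: the additive form of the score, the log-determinant diversity bonus, the RBF kernel, and the trade-off weight `lam` are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def greedy_select(bank, query, k, lam=0.1, gamma=1.0):
    """Greedily pick k exemplars from `bank` for a single `query`,
    balancing query relevance (kernel similarity to the query) against
    a log-det diversity bonus in the style of D-optimal design."""
    n = bank.shape[0]
    K = rbf_kernel(bank, bank, gamma)                     # bank-bank kernel
    q = rbf_kernel(bank, query[None, :], gamma).ravel()   # bank-query similarity
    chosen = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            S = chosen + [i]
            sub = K[np.ix_(S, S)] + 1e-6 * np.eye(len(S))  # jitter for stability
            # Relevance term plus log-determinant diversity regularizer.
            gain = q[S].sum() + lam * np.linalg.slogdet(sub)[1]
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
    return chosen
```

Because the kernel matrix is computed implicitly from embeddings, the selection never needs an explicit high-dimensional feature map; the log-det term penalizes near-duplicate exemplars, which a pure nearest-neighbor retriever such as KATE would happily select.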
Related papers
- A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't) [14.070675074621043]
Instruction fine-tuning involves selecting a subset of instruction training data from a large candidate pool, using a small query set from the target task. Despite growing interest, the literature on targeted instruction selection remains fragmented and opaque. In this work, we aim to bring clarity to this landscape by disentangling and systematically analyzing the two core ingredients: data representation and selection algorithms.
arXiv Detail & Related papers (2026-02-16T12:33:05Z) - Nearly Optimal Active Preference Learning and Its Application to LLM Alignment [68.56793807995417]
Aligning large language models depends on high-quality datasets of human preference labels. Many existing approaches adopt classical experimental design criteria such as G- or D-optimality. In this work, we identify a simple intuition specific to preference learning that calls into question the suitability of these existing design objectives.
arXiv Detail & Related papers (2026-02-02T03:21:29Z) - Order Matters: Rethinking Prompt Construction in In-Context Learning [52.19217980839306]
In-context learning (ICL) enables large language models to perform new tasks by conditioning on a sequence of examples. Most prior work assumes that which examples are chosen has a far greater effect on performance than how those examples are ordered. We revisit this assumption and conduct a systematic comparison between the effect of selection and ordering.
arXiv Detail & Related papers (2025-11-12T19:57:55Z) - Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers [22.72866404096086]
Amortized learning is the idea of reusing computation or inductive biases shared across tasks to enable rapid generalization to novel problems. Current approaches struggle to scale to large datasets because their capacity to process task data at inference is often limited. We propose iterative amortized inference, a class of models that refine solutions step-by-step over mini-batches.
arXiv Detail & Related papers (2025-10-13T14:40:47Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of these exemplars in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on the performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z) - Experimental Design for Active Transductive Inference in Large Language Models [18.2671641610825]
We use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD)
We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set.
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.
arXiv Detail & Related papers (2024-04-12T23:27:46Z) - RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z) - Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages.
First we filter the dataset to obtain informative in-context examples individually.
Then we propose diversity-guided example search which iteratively refines and evaluates the selected example permutations.
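The two-stage filter-then-search idea described above can be sketched as follows. The scoring proxy, the mean-pairwise-distance diversity objective, and the random-swap local search are all illustrative assumptions, not LENS's actual components.

```python
import numpy as np

def filter_then_search(scores, embeddings, m, k, iters=100, seed=0):
    """Two-stage exemplar selection in the spirit of filter-then-search:
    (1) keep the m individually highest-scoring examples (`scores` is an
        assumed per-example informativeness proxy), then
    (2) run a local search over size-k subsets of that pool, accepting a
        random swap whenever it increases mean pairwise distance."""
    rng = np.random.default_rng(seed)
    pool = np.argsort(scores)[-m:]                     # stage 1: filter
    subset = rng.choice(pool, size=k, replace=False)   # stage 2: init

    def diversity(idx):
        # Mean pairwise Euclidean distance among selected embeddings.
        E = embeddings[idx]
        d = np.linalg.norm(E[:, None] - E[None, :], axis=-1)
        return d.sum() / (len(idx) * (len(idx) - 1))

    best = diversity(subset)
    for _ in range(iters):
        cand = rng.choice(pool)
        if cand in subset:
            continue
        trial = subset.copy()
        trial[rng.integers(k)] = cand                  # propose a swap
        val = diversity(trial)
        if val > best:
            subset, best = trial, val
    return subset
```

The filter stage keeps the search space small, so the diversity-guided stage only ever evaluates subsets of already-informative candidates.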
arXiv Detail & Related papers (2023-02-27T06:32:45Z) - Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z) - An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.