Automatic Demonstration Selection for LLM-based Tabular Data Classification
- URL: http://arxiv.org/abs/2506.20451v1
- Date: Wed, 25 Jun 2025 13:57:54 GMT
- Title: Automatic Demonstration Selection for LLM-based Tabular Data Classification
- Authors: Shuchu Han, Wolfgang Bruckner
- Abstract summary: We present an algorithm to automatically select a reasonable number of required demonstrations. Our method distinguishes itself by integrating the user's selected prompt template and the specific Large Language Model (LLM) into its estimation. We then construct a similarity graph and analyze the eigenvalues of its Laplacian to derive the minimum number of demonstrations capable of representing the data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A fundamental question in applying In-Context Learning (ICL) for tabular data classification is how to determine the ideal number of demonstrations in the prompt. This work addresses this challenge by presenting an algorithm to automatically select a reasonable number of required demonstrations. Our method distinguishes itself by integrating not only the tabular data's distribution but also the user's selected prompt template and the specific Large Language Model (LLM) into its estimation. Rooted in Spectral Graph Theory, our proposed algorithm defines a novel metric to quantify the similarities between different demonstrations. We then construct a similarity graph and analyze the eigenvalues of its Laplacian to derive the minimum number of demonstrations capable of representing the data within the LLM's intrinsic representation space. We validate the efficacy of our approach through experiments comparing its performance against conventional random selection algorithms on diverse datasets and LLMs.
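The pipeline the abstract describes (a similarity graph over candidate demonstrations, eigen-analysis of its Laplacian, and a count of how many demonstrations suffice) can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes generic embedding vectors with a Gaussian-kernel similarity and an eigengap heuristic, whereas the paper defines its own metric that also accounts for the prompt template and the LLM.

```python
import numpy as np

def estimate_num_demonstrations(embeddings, sigma=1.0):
    """Estimate how many demonstrations can represent the data,
    via the eigengap of a similarity-graph Laplacian (illustrative)."""
    X = np.asarray(embeddings, dtype=float)
    # Pairwise squared distances -> Gaussian-kernel similarity graph.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(X)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
    eigvals = np.linalg.eigvalsh(L)  # ascending, in [0, 2]
    # Heuristic: the largest gap between consecutive eigenvalues
    # suggests how many clusters the data forms, i.e. how many
    # demonstrations are needed to cover it.
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1
```

For two well-separated clusters of points, the Laplacian has two near-zero eigenvalues followed by a large gap, so the sketch returns 2.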
Related papers
- Large Language Models are Demonstration Pre-Selectors for Themselves [57.101804269100185]
In-context learning (ICL) with large language models (LLMs) delivers strong few-shot performance by choosing few-shot demonstrations from the entire training data. FEw yet Essential Demonstration prE-selectoR (FEEDER) is a novel pre-selection framework that identifies a representative subset of demonstrations. FEEDER can reduce training data size by over 20% while maintaining performance.
arXiv Detail & Related papers (2025-06-06T12:29:03Z) - Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations [19.25205110583291]
A critical bottleneck is selecting the most relevant data to maximize task-specific performance. Existing data selection approaches include unstable influence-based methods and more stable distribution alignment methods. We introduce a dedicated similarity metric for this space to better identify task-relevant data.
arXiv Detail & Related papers (2025-03-19T11:35:57Z) - On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study discriminative probabilistic modeling on a continuous domain for the data prediction task of (multimodal) self-supervised representation learning. We conduct generalization error analysis to reveal the limitation of the current InfoNCE-based contrastive loss for self-supervised representation learning. We propose a novel non-parametric method for approximating the sum of conditional probability densities required by MIS.
arXiv Detail & Related papers (2024-10-11T18:02:46Z) - Aligning Language Models with Demonstrated Feedback [58.834937450242975]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z) - A Fixed-Point Approach to Unified Prompt-Based Counting [51.20608895374113]
This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for objects indicated by various prompt types, such as box, point, and text.
Our model excels in prominent class-agnostic datasets and exhibits superior performance in cross-dataset adaptation tasks.
arXiv Detail & Related papers (2024-03-15T12:05:44Z) - Self Supervised Correlation-based Permutations for Multi-View Clustering [7.093692674858257]
We propose an end-to-end deep learning-based multi-view clustering framework for general data types. Our approach involves generating meaningful fused representations using a novel permutation-based canonical correlation objective.
arXiv Detail & Related papers (2024-02-26T08:08:30Z) - Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z) - ProcSim: Proxy-based Confidence for Robust Similarity Learning [0.6963971634605796]
We show that popular benchmark datasets often contain numerous wrong labels, and DML methods are susceptible to them.
Intending to study the effect of realistic noise, we create an ontology of the classes in a dataset and use it to simulate semantically coherent labeling mistakes.
To train robust DML models, we propose ProcSim, a framework that assigns a confidence score to each sample using the normalized distance to its class representative.
arXiv Detail & Related papers (2023-11-01T17:17:14Z) - In-Context Demonstration Selection with Cross Entropy Difference [95.21947716378641]
Large language models (LLMs) can use in-context demonstrations to improve performance on zero-shot tasks.
We present a cross-entropy difference (CED) method for selecting in-context demonstrations.
arXiv Detail & Related papers (2023-05-24T05:04:00Z) - Task Affinity with Maximum Bipartite Matching in Few-Shot Learning [28.5184196829547]
We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one.
In particular, using this score, we find relevant training data labels to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model.
arXiv Detail & Related papers (2021-10-05T23:15:55Z) - Low-rank Dictionary Learning for Unsupervised Feature Selection [11.634317251468968]
We introduce a novel unsupervised feature selection approach by applying dictionary learning ideas in a low-rank representation.
A unified objective function for unsupervised feature selection is proposed, inducing sparsity through an $\ell_{2,1}$-norm regularization.
Our experimental findings reveal that the proposed method outperforms the state-of-the-art algorithm.
arXiv Detail & Related papers (2021-06-21T13:39:10Z)
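As a side note on the last entry above, the $\ell_{2,1}$ norm it uses as a regularizer is the sum of the $\ell_2$ norms of a matrix's rows, so minimizing it drives entire rows of the feature-weight matrix to zero, which is what performs feature selection. A minimal sketch (the function name is illustrative, not taken from the paper):

```python
import numpy as np

def l21_norm(W):
    """ell_{2,1} norm: the sum over rows of each row's ell_2 norm.
    As a regularizer it zeroes out whole rows, i.e. discards features."""
    W = np.asarray(W, dtype=float)
    return float(np.sum(np.linalg.norm(W, axis=1)))
```

For example, a matrix with rows [3, 4], [0, 0], and [1, 0] has row norms 5, 0, and 1, giving an $\ell_{2,1}$ norm of 6.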
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.