Related papers: Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation

URL: http://arxiv.org/abs/2407.05693v2
Date: Fri, 13 Sep 2024 06:57:01 GMT
Title: Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation
Authors: Jian Qian, Miao Sun, Sifan Zhou, Ziyu Zhao, Ruizhi Hun, Patrick Chiang
Abstract summary: We propose Sub-SA (Submodular Selective Annotation), a submodular-function-based selective annotation method. The aim of Sub-SA is to reduce annotation costs while improving the quality of in-context examples. We also propose RPR (Reward and Penalty Regularization) to better balance the diversity and representativeness of the unlabeled dataset.
Score: 4.846839863393725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In-context learning (ICL) leverages in-context examples as prompts for the predictions of Large Language Models (LLMs). These prompts play a crucial role in achieving strong performance. However, the selection of suitable prompts from a large pool of labeled examples often entails significant annotation costs. To address this challenge, we propose Sub-SA (Submodular Selective Annotation), a submodular-function-based selective annotation method. The aim of Sub-SA is to reduce annotation costs while improving the quality of in-context examples and minimizing the time consumption of the selection process. In Sub-SA, we design a submodular function that facilitates effective subset selection for annotation and demonstrates the characteristics of monotonicity and submodularity from the theoretical perspective. Specifically, we propose RPR (Reward and Penalty Regularization) to better balance the diversity and representativeness of the unlabeled dataset attributed to a reward term and a penalty term, respectively. Consequently, the selection for annotations can be effectively addressed with a simple yet effective greedy search algorithm based on the submodular function. Finally, we apply the similarity prompt retrieval to get the examples for ICL.
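The two steps named in the abstract, greedy subset selection followed by similarity-based prompt retrieval, can be sketched roughly as below. This is an illustrative sketch only: the abstract does not give the RPR formula, so the objective here is a stand-in (a representativeness reward minus a weighted redundancy penalty over non-negative cosine similarities, which has diminishing marginal gains). The function names `greedy_select` and `retrieve_examples` and the parameter `penalty_weight` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_select(embeddings, budget, penalty_weight=2.0):
    """Greedily pick up to `budget` indices of unlabeled items to annotate.

    Stand-in marginal gain (an assumption, NOT the paper's RPR definition):
      gain(i | S) = sum_j sim(i, j) - penalty_weight * sum_{s in S} sim(i, s)
    The first term rewards items close to many others; the second
    penalizes items similar to what has already been selected.
    """
    n = len(embeddings)
    # Clip similarities at zero so the penalty can only grow as S grows.
    sim = [[max(0.0, cosine(embeddings[i], embeddings[j])) for j in range(n)]
           for i in range(n)]
    reward = [sum(row) for row in sim]
    penalty = [0.0] * n
    selected = []
    remaining = set(range(n))
    while remaining and len(selected) < budget:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining),
                   key=lambda i: reward[i] - penalty_weight * penalty[i])
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            penalty[i] += sim[i][best]
    return selected


def retrieve_examples(query, embeddings, annotated, k):
    """Return the k annotated indices most similar to the query embedding."""
    ranked = sorted(annotated,
                    key=lambda i: cosine(query, embeddings[i]),
                    reverse=True)
    return ranked[:k]


# Toy pool: indices 0-2 form one cluster, indices 3-4 another.
pool = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
to_annotate = greedy_select(pool, budget=2)
prompt_ids = retrieve_examples([0.2, 0.8], pool, to_annotate, k=1)
```

In this toy pool, the penalty term is what steers the second pick away from the cluster already covered by the first; a smaller `penalty_weight` would favor representativeness over diversity.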

Related papers

Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation [77.07879255360342]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating retrieved information. In RAG, the emphasis has shifted to utility, which considers the usefulness of passages for generating accurate answers. Our approach focuses on utility-based selection rather than ranking, enabling dynamic passage selection tailored to specific queries without the need for fixed thresholds. Our experiments demonstrate that utility-based selection provides a flexible and cost-effective solution for RAG, significantly reducing computational costs while improving answer quality.
arXiv Detail & Related papers (2025-07-25T09:32:29Z)
The Power of Adaptation: Boosting In-Context Learning through Adaptive Prompting [8.260097638532878]
Large Language Models (LLMs) have demonstrated exceptional abilities across a broad range of language-related tasks. We propose Adaptive-Prompt, a novel method that adaptively selects exemplars by leveraging model feedback. Experimental results show that Adaptive-Prompt significantly enhances LLM performance across a variety of reasoning tasks.
arXiv Detail & Related papers (2024-12-23T15:49:43Z)
PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks [57.86928556668849]
Large Language Models (LLMs) have recently demonstrated impressive few-shot learning capabilities through in-context learning (ICL). ICL performance is highly dependent on the choice of few-shot demonstrations, making the selection of optimal examples a persistent research challenge. In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages.
arXiv Detail & Related papers (2024-12-07T17:51:31Z)
Instruction Tuning with Retrieval-based Examples Ranking for Aspect-based Sentiment Analysis [7.458853474864602]
Aspect-based sentiment analysis (ABSA) identifies sentiment information related to specific aspects and provides deeper market insights to businesses and organizations. Recent studies have proposed using fixed examples for instruction tuning to reformulate ABSA as a generation task. This study proposes an instruction learning method with retrieval-based example ranking for ABSA tasks.
arXiv Detail & Related papers (2024-05-28T10:39:10Z)
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications. The quality of the exemplars in the prompt greatly impacts performance. Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models [66.32043210237768]
This paper introduces an influence-driven selective annotation method. It aims to minimize annotation costs while improving the quality of in-context examples. Experiments confirm the superiority of the proposed method on various benchmarks.
arXiv Detail & Related papers (2023-10-16T22:53:54Z)
Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation. We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training [16.740101757982828]
We propose a pragmatic method that reduces the annotation cost for structured label spaces using active learning. Our approach leverages partial annotation, which reduces labeling costs by selecting only the most informative sub-structures for annotation. We also utilize self-training to incorporate the current model's automatic predictions as pseudo-labels for un-annotated sub-structures.
arXiv Detail & Related papers (2023-05-22T01:58:42Z)
Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages. First, we filter the dataset to obtain informative in-context examples individually. Then, we propose diversity-guided example search, which iteratively refines and evaluates the selected example permutations.
arXiv Detail & Related papers (2023-02-27T06:32:45Z)
Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation. By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes. Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English. We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.