Bandit Guided Submodular Curriculum for Adaptive Subset Selection
- URL: http://arxiv.org/abs/2511.22944v1
- Date: Fri, 28 Nov 2025 07:31:53 GMT
- Title: Bandit Guided Submodular Curriculum for Adaptive Subset Selection
- Authors: Prateek Chanda, Prayas Agrawal, Saral Sureka, Lokesh Reddy Polu, Atharv Kshirsagar, Ganesh Ramakrishnan
- Abstract summary: Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes.
- Score: 12.516248058768264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
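The abstract describes arms as submodular functions and a validation-driven reward. The sketch below illustrates that general shape under stated assumptions: epsilon-greedy arm selection, a facility-location arm, a concave-over-modular arm, and coverage of a held-out set as the reward proxy. These choices are illustrative, not the paper's exact ONLINESUBMOD policy.

```python
import random

def greedy_select(pool, k, f):
    """Standard greedy maximization of a monotone submodular set function f."""
    subset = []
    for _ in range(k):
        best = max((x for x in pool if x not in subset),
                   key=lambda x: f(subset + [x]) - f(subset))
        subset.append(best)
    return subset

def make_coverage(points):
    """Facility-location-style utility: each point is 'served' by its
    nearest selected item (distances negated so larger is better)."""
    def f(subset):
        if not subset:
            return -1e9
        return sum(max(-abs(p - q) for q in subset) for p in points)
    return f

def concave_mass(subset):
    """Concave-over-modular utility sqrt(sum |x|): a simple submodular arm."""
    return sum(abs(x) for x in subset) ** 0.5

def bandit_curriculum(pool, val, k, rounds, eps=0.2, seed=0):
    """Epsilon-greedy bandit over submodular 'arms'; the per-round reward
    is a validation-utility proxy (coverage of a held-out set)."""
    rng = random.Random(seed)
    arms = [make_coverage(pool), concave_mass]
    reward_fn = make_coverage(val)          # validation-driven reward
    counts, means = [0] * len(arms), [0.0] * len(arms)
    for _ in range(rounds):
        if rng.random() < eps or 0 in counts:
            a = rng.randrange(len(arms))    # explore
        else:
            a = max(range(len(arms)), key=lambda i: means[i])  # exploit
        subset = greedy_select(pool, k, arms[a])
        r = reward_fn(subset)               # utility on held-out data
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return means, counts

pool = [0, 1, 2, 5, 6, 7, 20, 21, 22]    # three clusters of samples
val = [1, 6, 21]                          # held-out points to cover
means, counts = bandit_curriculum(pool, val, k=3, rounds=30)
```

The bandit quickly concentrates on whichever arm's greedy subsets yield higher held-out coverage; the paper's no-regret guarantee concerns exactly this kind of arm-selection loop, though with its own reward construction and sampling regimes.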
Related papers
- Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning [51.99383151474742]
We propose a robust uncertainty modeling framework for active CLIP adaptation based on dual-prompt tuning. We show that our method consistently outperforms existing active learning methods under the same annotation budget.
arXiv Detail & Related papers (2026-02-04T09:01:55Z) - AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning [35.68750432673712]
Existing multi-prompt learning methods primarily focus on utilizing various meticulously designed prompts within a single foundation vision-language model. The same prompt can convey different semantics across distinct vision-language models, resulting in inconsistent predictions for identical prompts. We propose Adaptive-Debiased Ensemble Multi-Prompt Learning, abbreviated as AmPLe, to mitigate the two types of bias simultaneously.
arXiv Detail & Related papers (2025-12-20T16:21:24Z) - Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning [64.92967672226534]
This paper presents a Competence-Aware Multi-Perspective cUrriculum inStruction tuning framework termed CAMPUS. CAMPUS offers several advantages: dynamic selection of sub-curricula, competence-aware adjustment of the curriculum schedule, and multiple difficulty-based scheduling.
arXiv Detail & Related papers (2025-09-17T07:58:59Z) - Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding [53.63482987410292]
We present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models. We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks.
arXiv Detail & Related papers (2025-07-13T19:36:17Z) - The Power of Adaptation: Boosting In-Context Learning through Adaptive Prompting [8.260097638532878]
Large Language Models (LLMs) have demonstrated exceptional abilities across a broad range of language-related tasks. We propose Adaptive-Prompt, a novel method that adaptively selects exemplars by leveraging model feedback. Experimental results show that Adaptive-Prompt significantly enhances LLM performance across a variety of reasoning tasks.
arXiv Detail & Related papers (2024-12-23T15:49:43Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation [4.846839863393725]
We propose Sub-SA (Submodular Selective Annotation), a submodular-based selective annotation method.
The aim of Sub-SA is to reduce annotation costs while improving the quality of in-context examples.
We also propose RPR (Reward and Penalty Regularization) to better balance the diversity and representativeness of the unlabeled dataset.
arXiv Detail & Related papers (2024-07-08T07:47:30Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Progressive Multi-Stage Learning for Discriminative Tracking [25.94944743206374]
We propose a joint discriminative learning scheme with the progressive multi-stage optimization policy of sample selection for robust visual tracking.
The proposed scheme presents a novel time-weighted and detection-guided self-paced learning strategy for easy-to-hard sample selection.
Experiments on the benchmark datasets demonstrate the effectiveness of the proposed learning framework.
arXiv Detail & Related papers (2020-04-01T07:01:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.