Bandit Guided Submodular Curriculum for Adaptive Subset Selection
- URL: http://arxiv.org/abs/2511.22944v1
- Date: Fri, 28 Nov 2025 07:31:53 GMT
- Title: Bandit Guided Submodular Curriculum for Adaptive Subset Selection
- Authors: Prateek Chanda, Prayas Agrawal, Saral Sureka, Lokesh Reddy Polu, Atharv Kshirsagar, Ganesh Ramakrishnan
- Abstract summary: Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes.
- Score: 12.516248058768264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
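The abstract describes arms as submodular functions and a validation-driven reward. The sketch below illustrates that general shape under stated assumptions: epsilon-greedy arm selection, a facility-location arm, a concave-over-modular arm, and coverage of a held-out set as the reward proxy. These choices are illustrative, not the paper's exact ONLINESUBMOD policy.

```python
import random

def greedy_select(pool, k, f):
    """Standard greedy maximization of a monotone submodular set function f."""
    subset = []
    for _ in range(k):
        best = max((x for x in pool if x not in subset),
                   key=lambda x: f(subset + [x]) - f(subset))
        subset.append(best)
    return subset

def make_coverage(points):
    """Facility-location-style utility: each point is 'served' by its
    nearest selected item (distances negated so larger is better)."""
    def f(subset):
        if not subset:
            return -1e9
        return sum(max(-abs(p - q) for q in subset) for p in points)
    return f

def concave_mass(subset):
    """Concave-over-modular utility sqrt(sum |x|): a simple submodular arm."""
    return sum(abs(x) for x in subset) ** 0.5

def bandit_curriculum(pool, val, k, rounds, eps=0.2, seed=0):
    """Epsilon-greedy bandit over submodular 'arms'; the per-round reward
    is a validation-utility proxy (coverage of a held-out set)."""
    rng = random.Random(seed)
    arms = [make_coverage(pool), concave_mass]
    reward_fn = make_coverage(val)          # validation-driven reward
    counts, means = [0] * len(arms), [0.0] * len(arms)
    for _ in range(rounds):
        if rng.random() < eps or 0 in counts:
            a = rng.randrange(len(arms))    # explore
        else:
            a = max(range(len(arms)), key=lambda i: means[i])  # exploit
        subset = greedy_select(pool, k, arms[a])
        r = reward_fn(subset)               # utility on held-out data
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
    return means, counts

pool = [0, 1, 2, 5, 6, 7, 20, 21, 22]    # three clusters of samples
val = [1, 6, 21]                          # held-out points to cover
means, counts = bandit_curriculum(pool, val, k=3, rounds=30)
```

The bandit quickly concentrates on whichever arm's greedy subsets yield higher held-out coverage; the paper's no-regret guarantee concerns exactly this kind of arm-selection loop, though with its own reward construction and sampling regimes.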
Related papers
- Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning [51.99383151474742]
We propose a robust uncertainty modeling framework for active CLIP adaptation based on dual-prompt tuning. We show that our method consistently outperforms existing active learning methods under the same annotation budget.
arXiv Detail & Related papers (2026-02-04T09:01:55Z) - AmPLe: Supporting Vision-Language Models via Adaptive-Debiased Ensemble Multi-Prompt Learning [35.68750432673712]
Existing multi-prompt learning methods primarily focus on utilizing various meticulously designed prompts within a single foundation vision-language model. The same prompt can convey different semantics across distinct vision-language models, resulting in inconsistent predictions for identical prompts. We propose Adaptive-Debiased Ensemble Multi-Prompt Learning, abbreviated as AmPLe, to mitigate the two types of bias simultaneously.
arXiv Detail & Related papers (2025-12-20T16:21:24Z) - Teaching According to Talents! Instruction Tuning LLMs with Competence-Aware Curriculum Learning [64.92967672226534]
This paper presents a Competence-Aware Multi-Perspective cUrriculum inStruction tuning framework termed CAMPUS. CAMPUS offers several advantages: dynamic selection of sub-curricula, competence-aware adjustment of the curriculum schedule, and multiple difficulty-based scheduling.
arXiv Detail & Related papers (2025-09-17T07:58:59Z) - Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding [53.63482987410292]
We present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models. We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks.
arXiv Detail & Related papers (2025-07-13T19:36:17Z) - The Power of Adaptation: Boosting In-Context Learning through Adaptive Prompting [8.260097638532878]
Large Language Models (LLMs) have demonstrated exceptional abilities across a broad range of language-related tasks. We propose Adaptive-Prompt, a novel method that adaptively selects exemplars by leveraging model feedback. Experimental results show that Adaptive-Prompt significantly enhances LLM performance across a variety of reasoning tasks.
arXiv Detail & Related papers (2024-12-23T15:49:43Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - Sub-SA: Strengthen In-context Learning via Submodular Selective Annotation [4.846839863393725]
We propose Sub-SA (Submodular Selective Annotation), a submodular-based selective annotation method.
The aim of Sub-SA is to reduce annotation costs while improving the quality of in-context examples.
We also propose RPR (Reward and Penalty Regularization) to better balance the diversity and representativeness of the unlabeled dataset.
arXiv Detail & Related papers (2024-07-08T07:47:30Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z) - Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z) - Progressive Multi-Stage Learning for Discriminative Tracking [25.94944743206374]
We propose a joint discriminative learning scheme with the progressive multi-stage optimization policy of sample selection for robust visual tracking.
The proposed scheme presents a novel time-weighted and detection-guided self-paced learning strategy for easy-to-hard sample selection.
Experiments on the benchmark datasets demonstrate the effectiveness of the proposed learning framework.
arXiv Detail & Related papers (2020-04-01T07:01:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.