Related papers: Learning with Incomplete Context: Linear Contextual Bandits with Pretrained Imputation

URL: http://arxiv.org/abs/2510.09908v2
Date: Wed, 15 Oct 2025 17:38:30 GMT
Title: Learning with Incomplete Context: Linear Contextual Bandits with Pretrained Imputation
Authors: Hao Yan, Heyan Zhang, Yongyi Guo
Abstract summary: We propose PULSE-UCB, an algorithm that leverages pretrained models trained on auxiliary data to impute missing features during online decision-making. Our results quantify how uncertainty in predicted contexts affects decision quality and how much historical data is needed to improve downstream learning.
Score: 4.956682471555875
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rise of large-scale pretrained models has made it feasible to generate predictive or synthetic features at low cost, raising the question of how to incorporate such surrogate predictions into downstream decision-making. We study this problem in the setting of online linear contextual bandits, where contexts may be complex, nonstationary, and only partially observed. In addition to bandit data, we assume access to an auxiliary dataset containing fully observed contexts, which is common in practice since such data are collected without adaptive interventions. We propose PULSE-UCB, an algorithm that leverages pretrained models trained on the auxiliary data to impute missing features during online decision-making. We establish regret guarantees that decompose into a standard bandit term plus an additional component reflecting pretrained model quality. In the i.i.d. context case with Hölder-smooth missing features, PULSE-UCB achieves near-optimal performance, supported by matching lower bounds. Our results quantify how uncertainty in predicted contexts affects decision quality and how much historical data is needed to improve downstream learning.
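
The abstract describes PULSE-UCB only at a high level, so the following is a minimal sketch of the general idea, not the authors' algorithm. Assumptions: a ridge-regression imputer fitted on the fully observed auxiliary data stands in for the pretrained model, and a standard disjoint LinUCB learner is run on the imputed contexts. The names `fit_imputer`, `complete_context`, and `ImputedLinUCB` are hypothetical, and the sketch makes no attempt to account for imputation uncertainty, which the paper's regret analysis explicitly quantifies.

```python
import numpy as np


def fit_imputer(aux_obs, aux_miss, ridge=1e-3):
    """Fit a ridge-regression imputer on fully observed auxiliary data.

    Hypothetical stand-in for a pretrained model: predicts the missing
    coordinates (aux_miss, shape n x k) from the observed ones
    (aux_obs, shape n x p) with a linear map plus intercept.
    """
    X = np.hstack([aux_obs, np.ones((aux_obs.shape[0], 1))])  # append intercept column
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ aux_miss)
    return lambda obs: np.append(obs, 1.0) @ W  # predicted missing features, shape (k,)


def complete_context(obs, impute):
    """Concatenate the observed features with the imputed missing ones."""
    return np.concatenate([obs, np.atleast_1d(impute(obs))])


class ImputedLinUCB:
    """Disjoint LinUCB (one ridge model per arm) run on imputed contexts."""

    def __init__(self, d, n_arms, alpha=1.0, lam=1.0):
        self.alpha = alpha  # width of the optimism bonus
        self.A = [lam * np.eye(d) for _ in range(n_arms)]  # regularized Gram matrices
        self.b = [np.zeros(d) for _ in range(n_arms)]  # reward-weighted feature sums

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge estimate of this arm's parameter
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))  # optimistic choice; ties go to the lowest index

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coef = np.array([0.8, -0.5])  # hypothetical link from observed to missing feature

    # Auxiliary data: contexts observed in full, collected without interventions.
    aux_obs = rng.uniform(-1, 1, size=(500, 2))
    aux_miss = (aux_obs @ coef + 0.1 * rng.standard_normal(500)).reshape(-1, 1)
    impute = fit_imputer(aux_obs, aux_miss)

    true_theta = [np.array([1.0, 0.0, 0.5]), np.array([-0.5, 0.5, 1.0])]  # hypothetical arms
    bandit = ImputedLinUCB(d=3, n_arms=2)
    for t in range(1000):
        obs = rng.uniform(-1, 1, size=2)
        z = obs @ coef + 0.1 * rng.standard_normal()  # missing feature, never shown to the learner
        x_hat = complete_context(obs, impute)
        arm = bandit.select(x_hat)
        reward = true_theta[arm] @ np.append(obs, z) + 0.1 * rng.standard_normal()
        bandit.update(arm, x_hat, reward)
```

In the simulation the missing coordinate `z` is generated but never revealed; rewards still depend on it, so the imputation error reaches the learner as extra reward noise. All dimensions, noise levels, and `true_theta` values are illustrative only.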

Related papers

Optimal training-conditional regret for online conformal prediction [20.643619398558315]
We study online conformal prediction for non-stationary data streams subject to unknown distribution drift. We specifically focus on independently generated data with two types of distribution shift: abrupt change points and smooth drift. We establish non-asymptotic regret guarantees for our online full conformal algorithm, which match the minimax lower bound under appropriate restrictions on the prediction sets.
arXiv Detail & Related papers (2026-02-18T15:31:15Z)
Online Bayesian Imbalanced Learning with Bregman-Calibrated Deep Networks [0.7106986689736825]
We present Online Bayesian Imbalanced Learning (OBIL), a principled framework that decouples likelihood-ratio estimation from class-prior assumptions. Our approach builds on the established connection between Bregman divergences and proper scoring rules to show that deep networks trained with such losses produce posterior probability estimates. We prove that these likelihood-ratio estimates remain valid under arbitrary changes in class priors and cost structures, requiring only a threshold adjustment for optimal Bayes decisions.
arXiv Detail & Related papers (2026-02-08T21:23:00Z)
SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL). We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
Data Curation Matters: Model Collapse and Spurious Shift Performance Prediction from Training on Uncurated Text Embeddings [0.0]
Training models on uncurated Text Embeddings (TEs) can lead to a severe failure mode known as model collapse. We introduce a set of metrics that capture the extent of model collapse, offering a new perspective on TE quality as a proxy for data curation. These findings highlight the need for more nuanced curation and evaluation of embedding-based representations.
arXiv Detail & Related papers (2025-06-22T11:01:41Z)
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback. Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data. We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples. Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance. We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice. We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models [39.42802115580677]
Semi-supervised learning (SSL) can leverage both labeled and unlabeled data to build a predictive model. Recent literature suggests that naively applying state-of-the-art SSL with a pretrained model fails to unleash the full potential of training data. We propose to use pseudo-labels from the unlabeled data to update the feature extractor in a way that is less sensitive to incorrect labels.
arXiv Detail & Related papers (2023-09-09T01:57:14Z)
Improved Visual Fine-tuning with Natural Language Supervision [36.250244364023665]
Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data. The problem of catastrophic forgetting in the pre-trained backbone has been extensively studied for fine-tuning. We introduce a reference distribution obtained from a fixed text classifier, which can help regularize the learned vision classifier.
arXiv Detail & Related papers (2023-04-04T03:08:02Z)
Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores. We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora. It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons. We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes. A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning. We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class. We also explore how algorithmic fairness methods such as gradient reversal (GRAD) interact with BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.