In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks
- URL: http://arxiv.org/abs/2510.00347v1
- Date: Tue, 30 Sep 2025 23:17:18 GMT
- Title: In-Context Curiosity: Distilling Exploration for Decision-Pretrained Transformers on Bandit Tasks
- Authors: Huitao Yang, Guanting Chen,
- Abstract summary: Prediction-Powered Transformer (PPT) is a lightweight, exploration-inspired regularizer for offline pretraining.<n>PPT augments DPT with an auxiliary reward predictor, using prediction error as an intrinsic curiosity signal to encourage broader exploration during training.<n>Preliminary results suggest that curiosity-driven pretraining offers a promising direction for enhancing out-of-distribution generalization in in-context RL agents.
- Score: 1.18482330187379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) continue to grow in capability, there is increasing interest in incorporating them into decision-making tasks. A common pipeline for this is Decision-Pretrained Transformers (DPTs). However, existing training methods for DPTs often struggle to generalize beyond their pretraining data distribution. To explore mitigation of this limitation, we propose in-context curiosity -- a lightweight, exploration-inspired regularizer for offline pretraining -- and introduce the Prediction-Powered Transformer (PPT) framework. PPT augments DPT with an auxiliary reward predictor, using prediction error as an intrinsic curiosity signal to encourage broader exploration during training. In proof-of-concept experiments on Gaussian multi-armed bandits, PPT shows improved robustness: it moderates the performance degradation observed in DPT when test environments exhibit higher variance in reward, particularly when pretraining data has limited diversity. While the quality of offline data remain fundamental, our preliminary results suggest that curiosity-driven pretraining offers a promising direction for enhancing out-of-distribution generalization in in-context RL agents.
Related papers
- From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers [67.02076505996284]
We study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other.<n>Our results shed light on the algorithmic biases of pretrained transformers and offer conceptual guidelines for data-driven control of their learned behaviors.
arXiv Detail & Related papers (2025-12-21T08:10:26Z) - PPT: Pretraining with Pseudo-Labeled Trajectories for Motion Forecasting [90.47748423913369]
State-of-the-art motion forecasting models rely on large curated datasets with manually annotated or heavily post-processed trajectories.<n>PWT is a simple and scalable alternative that uses unprocessed and diverse trajectories automatically generated from off-the-shelf 3D detectors and tracking.<n>It achieves strong performance across standard benchmarks particularly in low-data regimes, and in cross-domain, end-to-end and multi-class settings.
arXiv Detail & Related papers (2024-12-09T13:48:15Z) - BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a training-required and training-free test-time adaptation framework.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z) - Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples [28.947545367473086]
We propose a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models.
Our experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.
arXiv Detail & Related papers (2024-03-16T04:23:46Z) - Diverse Data Augmentation with Diffusions for Effective Test-time Prompt
Tuning [73.75282761503581]
We propose DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data.
Our experiments on test datasets with distribution shifts and unseen categories demonstrate that DiffTPT improves the zero-shot accuracy by an average of 5.13%.
arXiv Detail & Related papers (2023-08-11T09:36:31Z) - Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z) - The Emergence of Essential Sparsity in Large Pre-trained Models: The
Weights that Matter [113.35761858962522]
This paper studies induced sparse patterns across multiple large pre-trained vision and language transformers.
We propose the existence of essential sparsity defined with a sharp dropping point beyond which the performance declines much faster.
We also find essential sparsity to hold valid for N:M sparsity patterns as well as on modern-scale large language models.
arXiv Detail & Related papers (2023-06-06T15:49:09Z) - Future-conditioned Unsupervised Pretraining for Decision Transformer [19.880628629512504]
We propose Pretrained Decision Transformer (PDT) as a conceptually simple approach for unsupervised RL pretraining.
PDT leverages future trajectory information as a privileged context to predict actions during training.
It can extract diverse behaviors from offline data and controllably sample high-return behaviors by online finetuning.
arXiv Detail & Related papers (2023-05-26T07:05:08Z) - Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable
Transformers [107.3726071306935]
We propose a new plug-and-play training framework, SMoE-Dropout, to enable scaling transformers to better accuracy in their full capacity without collapse.
SMoE-Dropout consists of a randomly and fixed router network to activate experts and gradually increases the activated expert number as training progresses over time.
Our experiments demonstrate the superior performance and substantial computation savings of SMoE-Dropout, compared to dense training baselines with equivalent parameter counts.
arXiv Detail & Related papers (2023-03-02T22:12:51Z) - Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.