Related papers: Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning

ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters [6.13905106667213]
ATLAS is a task-distributed framework that iteratively develops a lightweight research agent. Our core algorithm, Evolving Direct Preference Optimization (EvoDPO), adaptively updates the phase-indexed reference policy. Results show that ATLAS improves stability and performance over a static single-agent baseline.
arXiv Detail & Related papers (2026-02-02T19:23:33Z)
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning [24.80806018678682]
Reinforcement learning (RL) offers a principled way to enhance the reasoning capabilities of large language models. In practice, RL progress often slows when task difficulty becomes poorly aligned with model capability. We propose a framework that sustains effective learning signals through adaptive environment design.
arXiv Detail & Related papers (2026-01-08T10:42:04Z)
In-Context Reinforcement Learning through Bayesian Fusion of Context and Value Prior [53.21550098214227]
In-context reinforcement learning promises fast adaptation to unseen environments without parameter updates. We introduce SPICE, a Bayesian ICRL method that learns a prior over Q-values via a deep ensemble and updates this prior at test time. We prove that SPICE achieves regret-optimal behaviour in both bandits and finite-horizon MDPs, even when pretrained only on suboptimal trajectories.
arXiv Detail & Related papers (2026-01-06T13:41:31Z)
Trust-Region Adaptive Policy Optimization [82.09255251747818]
Post-training methods play an important role in improving large language models' (LLMs) complex reasoning abilities. We introduce TRAPO, a framework that interleaves Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) within each training instance. Experiments on five mathematical reasoning benchmarks show that TRAPO consistently surpasses standard SFT, RL, and SFT-then-RL pipelines.
arXiv Detail & Related papers (2025-12-19T14:37:07Z)
Dynamic Mixture of Progressive Parameter-Efficient Expert Library for Lifelong Robot Learning [69.81148368677593]
A generalist agent must continuously learn and adapt throughout its lifetime, achieving efficient forward transfer while minimizing catastrophic forgetting. Previous work has explored parameter-efficient fine-tuning for single-task adaptation, effectively steering a frozen pretrained model with a small number of parameters. We propose Dynamic Mixture of Progressive Parameter-Efficient Expert Library (DMPEL) for lifelong robot learning. Our framework outperforms state-of-the-art lifelong learning methods in success rates across continual adaptation, while utilizing minimal trainable parameters and storage.
arXiv Detail & Related papers (2025-06-06T11:13:04Z)
Optimization-Inspired Few-Shot Adaptation for Large Language Models [25.439708260502556]
Large Language Models (LLMs) have demonstrated remarkable performance in real-world applications. Adapting LLMs to novel tasks via fine-tuning often requires substantial training data and computational resources that are impractical in few-shot scenarios. Existing approaches, such as in-context learning and Parameter-Efficient Fine-Tuning (PEFT), face key limitations.
arXiv Detail & Related papers (2025-05-25T11:54:23Z)
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs [13.292104357930866]
SASR is a step-wise adaptive hybrid training framework for large language models. It unifies SFT and RL and dynamically balances the two throughout optimization. Experimental results demonstrate that SASR outperforms SFT, RL, and static hybrid training methods.
arXiv Detail & Related papers (2025-05-19T12:10:17Z)
Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization [55.14484317645865]
We develop a conditional diffusion model to produce exceptional-quality prompts for offline reinforcement learning tasks. We show that the Prompt Diffuser is a robust and effective tool for the prompt-tuning process, demonstrating strong performance in meta-RL tasks.
arXiv Detail & Related papers (2024-11-02T07:38:02Z)
Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion [3.2635082758250693]
A continual learning agent builds on previous experiences to develop increasingly complex behaviors. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem.
arXiv Detail & Related papers (2024-08-19T09:33:31Z)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain the stability of VLMs in terms of zero-shot generalization; the resulting method is dubbed OrthSR. For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties. Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed solutions to these difficulties.
arXiv Detail & Related papers (2024-07-09T17:55:23Z)
Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning [18.579378919155864]
We propose Adaptive $Q$-Network (AdaQN) to take into account the non-stationarity of the optimization procedure without requiring additional samples. AdaQN is theoretically sound, and we empirically validate it on MuJoCo control problems and Atari 2600 games.
arXiv Detail & Related papers (2024-05-25T11:57:43Z)
Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs). Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs. Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information. Our approach involves randomly sampling a Gaussian distribution to fine-tune the elements of the prompt trajectory and using a preference ranking function to find the optimization direction. Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
Self-Supervised Primal-Dual Learning for Constrained Optimization [19.965556179096385]
This paper studies how to train machine-learning models that directly approximate the optimal solutions of constrained optimization problems. It proposes the idea of Primal-Dual Learning (PDL), a self-supervised training method that does not require a set of pre-solved instances or an optimization solver for training and inference.
arXiv Detail & Related papers (2022-08-18T20:07:10Z)
Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning. We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation. Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization [1.7970523486905976]
This paper addresses a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization. It derives a new optimization method using forward KL divergence. In a realistic robotic simulation, the proposed method with moderate optimism outperformed one of the state-of-the-art RL methods.
arXiv Detail & Related papers (2021-05-27T08:24:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.