Related papers: CHIRPs: Change-Induced Regret Proxy metrics for Lifelong Reinforcement Learning

Related papers

Reward-Conditioned Reinforcement Learning [56.417273471201845]
We introduce Reward-Conditioned Reinforcement Learning (RCRL), a framework that trains a single agent to optimize a family of reward specifications.<n>RCRL conditions the agent on reward parameterizations and learns multiple reward objectives from a shared replay data entirely off-policy.<n>Our results demonstrate that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.
arXiv Detail & Related papers (2026-03-05T11:29:17Z)
Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents [54.18201810286764]
Tool-using agents based on Large Language Models (LLMs) excel in tasks such as mathematical reasoning and multi-hop question answering.<n>In long trajectories, agents often trigger excessive and low-quality tool calls, increasing latency and degrading inference performance.<n>We propose using entropy reduction as a supervisory signal and design two reward strategies to address the differing needs of optimizing tool-use behavior.
arXiv Detail & Related papers (2026-02-02T12:52:14Z)
Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates [53.3717573880076]
We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates.<n>JitRL maintains a dynamic, non-parametric memory of experiences and retrieves relevant trajectories to estimate action advantages on-the-fly.<n>Experiments on WebArena and Jericho demonstrate that JitRL establishes a new state-of-the-art among training-free methods.
arXiv Detail & Related papers (2026-01-26T14:16:51Z)
AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI [5.165179548592513]
AgentChangeBench is a benchmark designed to measure how tool augmented language model agents adapt to mid dialogue goal shifts.<n>Our framework formalizes evaluation through four complementary metrics: Task Success Rate (TSR) for effectiveness, Tool Use Efficiency (TUE) for reliability, Tool Call Redundancy Rate (TCRR) for wasted effort, and GoalShift Recovery Time (GSRT) for adaptation.
arXiv Detail & Related papers (2025-10-20T23:48:07Z)
Repairing Reward Functions with Human Feedback to Mitigate Reward Hacking [13.417125511014447]
We propose an automated framework that repairs a human-specified proxy reward function by learning an additive, transition-dependent correction term from preferences.<n>PBRR consistently outperforms baselines that learn a reward function from scratch from preferences or modify the proxy reward function using other approaches.
arXiv Detail & Related papers (2025-10-14T23:18:24Z)
Reinforcement Learning for Machine Learning Engineering Agents [52.03168614623642]
We show that agents backed by weaker models that improve via reinforcement learning can outperform agents backed by much larger, but static models.<n>We propose duration- aware gradient updates in a distributed asynchronous RL framework to amplify high-cost but high-reward actions.<n>We also propose environment instrumentation to offer partial credit, distinguishing almost-correct programs from those that fail early.
arXiv Detail & Related papers (2025-09-01T18:04:10Z)
Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments [1.2787026473187368]
Multi-agent reinforcement learning (MARL) has achieved notable success in environments where agents must learn coordinated behaviors.<n>Traditional knowledge transfer methods in MARL struggle to generalize, and agents often require costly retraining to adapt.<n>This paper introduces a causal knowledge transfer framework that enables RL agents to learn and share compact causal representations of paths within a non-stationary environment.
arXiv Detail & Related papers (2025-07-18T11:59:55Z)
JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning [6.81021875668872]
We propose JoyAgents-R1, which first applies Group Relative Policy Optimization to the joint training of heterogeneous multi-agents.<n>We show that JoyAgents-R1 achieves performance comparable to that of larger LLMs while built on smaller open-source models.
arXiv Detail & Related papers (2025-06-24T17:59:31Z)
RRO: LLM Agent Optimization Through Rising Reward Trajectories [52.579992804584464]
Large language models (LLMs) have exhibited extraordinary performance in a variety of tasks.<n>In practice, agents sensitive to the outcome of certain key steps which makes them likely to fail the task.<n>We propose Reward Rising Optimization (RRO) to mitigate this issue.
arXiv Detail & Related papers (2025-05-27T05:27:54Z)
Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models. Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process. We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
arXiv Detail & Related papers (2025-04-10T16:14:17Z)
$TAR^2$: Temporal-Agent Reward Redistribution for Optimal Policy Preservation in Multi-Agent Reinforcement Learning [7.97295726921338]
Temporal-Agent Reward Redistribution $TAR2$ is a novel approach that decomposes sparse global rewards into agent-specific, time-step-specific components. We show that $TAR2$ aligns with potential-based reward shaping, preserving the same optimal policies as the original environment.
arXiv Detail & Related papers (2025-02-07T12:07:57Z)
No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks. This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics. We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents.<n>The impact of clumsy or even malicious agents--those who frequently make errors in their tasks--on the overall performance of the system remains underexplored.<n>This paper investigates what is the resilience of various system structures under faulty agents on different downstream tasks.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms [5.331052581441265]
We develop a novel algorithm, Corrected Uniform Experience (CUER), which samples the stored experience while considering the fairness among all other experiences. CUER provides promising improvements for off-policy continuous control algorithms in terms of sample efficiency, final performance, and stability of the policy during the training.
arXiv Detail & Related papers (2024-06-13T12:03:40Z)
A State Augmentation based approach to Reinforcement Learning from Human Preferences [20.13307800821161]
Preference Based Reinforcement Learning attempts to solve the issue by utilizing binary feedbacks on queried trajectory pairs. We present a state augmentation technique that allows the agent's reward model to be robust.
arXiv Detail & Related papers (2023-02-17T07:10:50Z)
Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning [42.540853953923495]
We introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination. Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents. We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound.
arXiv Detail & Related papers (2023-02-07T12:30:52Z)
TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems [0.0]
We present TransfQMix, a new approach that uses transformers to leverage a latent graph structure and learn better coordination policies. Our transformer Q-mixer learns a monotonic mixing-function from a larger graph that includes the internal and external states of the agents. We report TransfQMix's performances in the Spread and StarCraft II environments.
arXiv Detail & Related papers (2023-01-13T00:07:08Z)
RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory ( RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability. RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, and it significantly boosts the performance up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
Decentralized scheduling through an adaptive, trading-based multi-agent system [1.7403133838762448]
In multi-agent reinforcement learning systems, the actions of one agent can have a negative impact on the rewards of other agents. This work applies a trading approach to a simulated scheduling environment, where the agents are responsible for the assignment of incoming jobs to compute cores. The agents can trade the usage right of computational cores to process high-priority, high-reward jobs faster than low-priority, low-reward jobs.
arXiv Detail & Related papers (2022-07-05T13:50:18Z)
Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning [85.86440477005523]
We study more human-like RL agents which incorporate an established model of human-irrationality, the Rational Inattention (RI) model. RIRL models the cost of cognitive information processing using mutual information. We show that using RIRL yields a rich spectrum of new equilibrium behaviors that differ from those found under rational assumptions.
arXiv Detail & Related papers (2022-01-18T20:54:00Z)
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models [85.68751244243823]
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied. We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time. We find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.
arXiv Detail & Related papers (2022-01-10T18:58:52Z)
Multi-Agent Transfer Learning in Reinforcement Learning-Based Ride-Sharing Systems [3.7311680121118345]
Reinforcement learning (RL) has been used in a range of simulated real-world tasks. In this paper we investigate the impact of TL transfer parameters with fixed source and target roles.
arXiv Detail & Related papers (2021-12-01T11:23:40Z)
Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks [79.13089902898848]
Self-supervised learning is a powerful paradigm for representation learning on unlabelled images. We show that different tasks in computer vision require features to encode different (in)variances.
arXiv Detail & Related papers (2021-11-22T18:16:35Z)
Low-Precision Reinforcement Learning [63.930246183244705]
Low-precision training has become a popular approach to reduce computation time, memory footprint, and energy consumption in supervised learning. In this paper we consider continuous control with the state-of-the-art SAC agent and demonstrate that a na"ive adaptation of low-precision methods from supervised learning fails.
arXiv Detail & Related papers (2021-02-26T16:16:28Z)
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called emphinverse temporal difference learning (ITD) We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $Psi Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
Enhancing reinforcement learning by a finite reward response filter with a case study in intelligent structural control [0.0]
In many reinforcement learning (RL) problems, it takes some time until a taken action by the agent reaches its maximum effect on the environment. This paper introduces an applicable enhanced Q-learning method in which at the beginning of the learning phase, the agent takes a single action. We have applied the developed method to a structural control problem in which the goal of the agent is to reduce the vibrations of a building subjected to earthquake excitations with a specified delay.
arXiv Detail & Related papers (2020-10-25T19:28:35Z)
Hierarchically Decoupled Imitation for Morphological Transfer [95.19299356298876]
We show that transferring learned information from a morphologically simpler agent can massively improve the sample efficiency of a more complex one. First, we show that incentivizing a complex agent's low-level to imitate a simpler agent's low-level significantly improves zero-shot high-level transfer. Second, we show that KL-regularized training of the high level stabilizes learning and prevents mode-collapse.
arXiv Detail & Related papers (2020-03-03T18:56:49Z)
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference. Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.