Can Agents Run Relay Race with Strangers? Generalization of RL to
Out-of-Distribution Trajectories
- URL: http://arxiv.org/abs/2304.13424v1
- Date: Wed, 26 Apr 2023 10:12:12 GMT
- Title: Can Agents Run Relay Race with Strangers? Generalization of RL to
Out-of-Distribution Trajectories
- Authors: Li-Cheng Lan, Huan Zhang, Cho-Jui Hsieh
- Abstract summary: We show the prevalence of generalization failure on controllable states from stranger agents.
We propose a novel method called Self-Trajectory Augmentation (STA), which resets the environment during training to the agent's previously visited states, selected according to the Q function.
- Score: 88.08381083207449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we define, evaluate, and improve the ``relay-generalization''
performance of reinforcement learning (RL) agents on the out-of-distribution
``controllable'' states. Ideally, an RL agent that generally masters a task
should reach its goal starting from any controllable state of the environment
instead of memorizing a small set of trajectories. For example, a self-driving
system should be able to take over control from a human in the middle of a
drive and continue to drive the car safely. To practically evaluate this
type of generalization, we start the test agent from the middle of other
independently well-trained \emph{stranger} agents' trajectories. With extensive
experimental evaluation, we show the prevalence of \emph{generalization
failure} on controllable states from stranger agents. For example, in the
Humanoid environment, we observed that a well-trained Proximal Policy
Optimization (PPO) agent, with only a 3.9\% failure rate during regular testing,
failed on 81.6\% of the states generated by well-trained stranger PPO agents.
To improve "relay generalization," we propose a novel method called
Self-Trajectory Augmentation (STA), which will reset the environment to the
agent's old states according to the Q function during training. After applying
STA to the Soft Actor Critic's (SAC) training procedure, we reduced the failure
rate of SAC under relay-evaluation by more than three times in most settings
without impacting agent performance and increasing the needed number of
environment interactions. Our code is available at
https://github.com/lan-lc/STA.
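The relay-evaluation protocol and the STA reset rule described in the abstract are simple to prototype. The Python sketch below is only an illustration under assumed interfaces, not the authors' released implementation: it assumes a Gymnasium-style environment with hypothetical `set_state(state)`/`get_observation()` helpers for restoring the simulator state, agents that expose `act(obs)`, and a `q_value(state)` callable standing in for the trained critic. The exact Q-based selection rule used by STA is defined in the linked repository, so the low-Q heuristic here is an assumption.

```python
import random

def relay_failure_rate(env, test_agent, stranger_agent,
                       n_episodes=100, takeover_step=200):
    """Relay evaluation (sketch): roll out a well-trained 'stranger' agent,
    hand control to the test agent mid-trajectory, and count how often the
    test agent then fails (e.g. the Humanoid falls and the episode ends).
    `env` follows the Gymnasium step/reset API; agents expose act(obs)."""
    failures, episodes = 0, 0
    while episodes < n_episodes:
        obs, _ = env.reset()
        reached_takeover = True
        for _ in range(takeover_step):  # stranger produces the takeover state
            obs, _, terminated, truncated, _ = env.step(stranger_agent.act(obs))
            if terminated or truncated:
                reached_takeover = False
                break
        if not reached_takeover:
            continue  # stranger failed on its own; not a usable takeover state
        terminated = truncated = False
        while not (terminated or truncated):  # relay: the test agent takes over
            obs, _, terminated, truncated, _ = env.step(test_agent.act(obs))
        failures += int(terminated)  # early termination counts as a failure
        episodes += 1
    return failures / n_episodes

def sta_reset(env, visited_states, q_value, reset_prob=0.5, n_candidates=32):
    """STA-style reset (sketch): with some probability, restart a training
    episode from one of the agent's own previously visited states instead of
    the initial state, choosing among candidates with the critic. Picking the
    lowest-Q candidate is one plausible rule, not necessarily the paper's."""
    if visited_states and random.random() < reset_prob:
        pool = random.sample(visited_states, min(n_candidates, len(visited_states)))
        state = min(pool, key=q_value)   # revisit states the critic rates poorly
        env.set_state(state)             # assumed simulator-state restore helper
        return env.get_observation()     # assumed observation getter
    obs, _ = env.reset()
    return obs
```

Both helpers assume the simulator can expose and restore its full physics state, which holds for the MuJoCo tasks (such as Humanoid) mentioned in the abstract.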
Related papers
- Human-compatible driving partners through data-regularized self-play reinforcement learning [3.9682126792844583]
Human-Regularized PPO (HR-PPO) is a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy (a minimal sketch of such a regularizer follows after this list).
Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%.
arXiv Detail & Related papers (2024-03-28T17:56:56Z)
- ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy [47.42940885853956]
A$3$T is a framework that enables the Autonomous Annotation of Agent Trajectories in the style of ReAct.
In AlfWorld, the agent trained with A$3$T obtains a 1-shot success rate of 96%, and 100% success with 4 iterative rounds.
arXiv Detail & Related papers (2024-03-21T17:43:44Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed, but they require large amounts of interaction between the agent and the environment.
We propose a new method to solve this benchmark, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z)
- Automatic Data Augmentation for Generalization in Deep Reinforcement Learning [39.477038093585726]
Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios.
Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents.
We show that our agent learns policies and representations that are more robust to changes in the environment that do not affect the agent.
arXiv Detail & Related papers (2020-06-23T09:50:22Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)
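As an aside on the HR-PPO entry above, the "small penalty for deviating from a human reference policy" can be pictured as a divergence regularizer added to the usual PPO clipped objective. The PyTorch-style sketch below only illustrates that idea under assumed names (`policy`, `human_ref`, and `kl_coef` are all hypothetical); it is not the HR-PPO implementation, and whether the actual penalty is a KL term is not stated in this listing.

```python
import torch
from torch.distributions import kl_divergence

def hr_ppo_policy_loss(policy, human_ref, obs, actions, advantages,
                       old_log_probs, clip_eps=0.2, kl_coef=0.01):
    """Illustrative clipped PPO surrogate plus a penalty for deviating from a
    frozen human reference policy. `policy(obs)` and `human_ref(obs)` are
    assumed to return torch.distributions objects over actions."""
    dist = policy(obs)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)

    # Standard PPO clipped surrogate objective.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    ppo_loss = -torch.min(unclipped, clipped).mean()

    # Small penalty for drifting away from the (frozen) human reference policy.
    with torch.no_grad():
        ref_dist = human_ref(obs)
    imitation_penalty = kl_divergence(dist, ref_dist).mean()

    return ppo_loss + kl_coef * imitation_penalty
```

A small `kl_coef` keeps the learned driving policy close to human-like behavior during self-play without overriding the task reward.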