Backward Curriculum Reinforcement Learning
- URL: http://arxiv.org/abs/2212.14214v4
- Date: Mon, 4 Sep 2023 22:48:48 GMT
- Title: Backward Curriculum Reinforcement Learning
- Authors: KyungMin Ko
- Abstract summary: Current reinforcement learning algorithms train an agent using forward-generated trajectories.
Although the value of reinforcement learning comes from sufficient exploration, this approach trades away sample efficiency.
We propose a novel backward curriculum reinforcement learning method that begins training the agent on the backward trajectory of the episode instead of the original forward trajectory.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current reinforcement learning algorithms train an agent using forward-generated trajectories, which provide little guidance so that the agent can explore as much as possible. Although the value of reinforcement learning comes from sufficient exploration, this approach trades away sample efficiency, an essential factor in algorithm performance. Previous works use reward-shaping techniques and network-structure modifications to increase sample efficiency, but these methods require many implementation steps. In this work, we propose a novel backward curriculum reinforcement learning method that begins training the agent on the backward trajectory of the episode instead of the original forward trajectory. This approach provides the agent with a strong reward signal, enabling more sample-efficient learning. Moreover, our method requires only the minor change of reversing the order of the trajectory before agent training, allowing straightforward application to any state-of-the-art algorithm.
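The core change the abstract describes is reversing the order of an episode's transitions before they are used for training. Below is a minimal sketch of that idea, assuming a REINFORCE-style update; the names (Transition, compute_returns, policy_update) and the discount factor are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the trajectory-reversal idea from the abstract.
# Transition, compute_returns, policy_update, and gamma are illustrative
# assumptions, not the authors' code.
from collections import namedtuple
from typing import Callable, List

Transition = namedtuple("Transition", ["state", "action", "reward"])


def compute_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return G_t for each step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    return returns


def backward_episode_update(episode: List[Transition],
                            policy_update: Callable[[object, object, float], None]) -> None:
    """Train on the episode in reverse order.

    The only difference from standard forward training is the reversed
    iteration: transitions closest to the terminal, reward-bearing state are
    presented first, so the agent receives a strong reward signal from the
    start of training.
    """
    returns = compute_returns([t.reward for t in episode])
    for transition, g in zip(reversed(episode), reversed(returns)):
        policy_update(transition.state, transition.action, g)
```

Trajectory collection itself is unchanged; only the order in which transitions are fed to the learner is reversed, which is why the idea can be layered on top of any existing policy-gradient or actor-critic implementation.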
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Inverse Reinforcement Learning from Non-Stationary Learning Agents [11.203097744443898]
We study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy.
We propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function.
arXiv Detail & Related papers (2024-10-18T03:02:44Z)
- Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling [10.931466852026663]
We investigate the optimal use of trained deep reinforcement learning (DRL) agents during inference.
Our work is based on the hypothesis that, similar to search algorithms, the utilization of trained DRL agents should be dependent on the acceptable computational budget.
We propose an algorithm for obtaining the optimal parameterization for a given number of solutions and any given trained agent.
arXiv Detail & Related papers (2024-06-11T14:59:18Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks.
This guidance is not provided by a manually designed curriculum; instead, a critic network decides at each timestep whether or not to use the proposed actions.
We demonstrate that agents learn a successful policy faster when using our proposed method, in terms of both sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z)
- A State Augmentation based approach to Reinforcement Learning from Human Preferences [20.13307800821161]
Preference-based Reinforcement Learning attempts to solve the issue by utilizing binary feedback on queried trajectory pairs.
We present a state augmentation technique that allows the agent's reward model to be robust.
arXiv Detail & Related papers (2023-02-17T07:10:50Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor (see the sketch after this list).
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
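For the SURF entry above, the confidence-based pseudo-labelling step it summarizes can be sketched as follows. This is a hedged illustration only: preference_model, the threshold tau, and the tensor shapes are assumptions rather than the authors' code.

```python
# Hedged sketch of confidence-based pseudo-labelling as summarized for SURF above.
# preference_model, tau, and the tensor shapes are assumptions, not the authors' code.
import torch


def pseudo_label_pairs(preference_model, seg_a: torch.Tensor, seg_b: torch.Tensor,
                       tau: float = 0.9):
    """Keep only the unlabeled segment pairs the preference predictor is confident about.

    preference_model(seg_a, seg_b) is assumed to return P(seg_a preferred),
    one probability per pair. Pairs whose confidence exceeds tau receive a hard
    pseudo-label (1 = seg_a preferred, 0 = seg_b preferred); the rest are dropped.
    """
    with torch.no_grad():
        p = preference_model(seg_a, seg_b)  # shape: (num_pairs,)
    confidence = torch.maximum(p, 1.0 - p)
    keep = confidence > tau
    labels = (p > 0.5).float()
    return seg_a[keep], seg_b[keep], labels[keep]
```

The retained pairs and their pseudo-labels would then be mixed with the genuinely labelled preference data when fitting the reward model.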
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.