Episodic Policy Gradient Training
- URL: http://arxiv.org/abs/2112.01853v1
- Date: Fri, 3 Dec 2021 11:15:32 GMT
- Title: Episodic Policy Gradient Training
- Authors: Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen,
Svetha Venkatesh
- Abstract summary: Episodic Policy Gradient Training (EPGT) is a training procedure for policy gradient methods in which episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly.
Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.
- Score: 43.62408764384791
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel training procedure for policy gradient methods wherein
episodic memory is used to optimize the hyperparameters of reinforcement
learning algorithms on-the-fly. Unlike other hyperparameter searches, we
formulate hyperparameter scheduling as a standard Markov Decision Process and
use episodic memory to store the outcome of used hyperparameters and their
training contexts. At any policy update step, the policy learner refers to the
stored experiences, and adaptively reconfigures its learning algorithm with the
new hyperparameters determined by the memory. This mechanism, dubbed
Episodic Policy Gradient Training (EPGT), enables an episodic learning process,
and jointly learns the policy and the learning algorithm's hyperparameters
within a single run. Experimental results on both continuous and discrete
environments demonstrate the advantage of using the proposed method in boosting
the performance of various policy gradient algorithms.
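To make the procedure described above concrete, the following is a minimal sketch, under stated assumptions, of the idea in the abstract: an episodic memory keyed by a (discretized) training context stores the observed outcome of each hyperparameter choice, and every policy-update step queries that memory to reconfigure the learner. The names (EpisodicMemory, context_of, policy_update, evaluate), the learning-rate search space, and the epsilon-greedy lookup are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical sketch: episodic memory maps (training context, hyperparameter)
# to the average observed policy improvement, and the learner queries it at
# every policy-update step. Details are illustrative assumptions only.

LEARNING_RATES = [1e-4, 3e-4, 1e-3]   # example hyperparameter search space
EPSILON = 0.1                         # exploration rate for the scheduler


class EpisodicMemory:
    """Stores average policy improvement per (context, hyperparameter) key."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def write(self, context, hp, improvement):
        self.total[(context, hp)] += improvement
        self.count[(context, hp)] += 1

    def read(self, context, hp):
        n = self.count[(context, hp)]
        return self.total[(context, hp)] / n if n else 0.0


def select_hyperparameter(memory, context):
    """Epsilon-greedy choice over remembered outcomes for this context."""
    if random.random() < EPSILON:
        return random.choice(LEARNING_RATES)
    return max(LEARNING_RATES, key=lambda hp: memory.read(context, hp))


def training_loop(policy, env, num_updates, context_of, policy_update, evaluate):
    """Jointly adapt the policy and its learning rate within a single run.

    `context_of`, `policy_update`, and `evaluate` are assumed user-supplied
    callables: featurize the current training state, perform one policy
    gradient update with the given learning rate, and score the policy.
    """
    memory = EpisodicMemory()
    score = evaluate(policy, env)
    for _ in range(num_updates):
        context = context_of(policy, env)             # discretized training context
        lr = select_hyperparameter(memory, context)   # consult episodic memory
        policy = policy_update(policy, env, lr)       # e.g. one PPO/A2C update
        new_score = evaluate(policy, env)
        memory.write(context, lr, new_score - score)  # record outcome of this choice
        score = new_score
    return policy
```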
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods [0.40964539027092917]
Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems.
In practice, all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming.
This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time.
arXiv Detail & Related papers (2023-10-04T09:21:01Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, called meta-MDP, that can be used to solve any hyperparameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z)
- Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment [1.5229257192293197]
We propose a systematic methodology to learn a sequence of optimal control policies non-parametrically.
Our methodology has outperformed the well-established DDPG and TD3 methods by a sizeable margin in terms of learning performance.
arXiv Detail & Related papers (2022-03-24T21:41:13Z)
- Online hyperparameter optimization by real-time recurrent learning [57.01871583756586]
Our framework takes advantage of the analogy between hyperparameter optimization and parameter learning in recurrent neural networks (RNNs).
It adapts a well-studied family of online learning algorithms for RNNs to tune hyperparameters and network parameters simultaneously.
This procedure yields systematically better generalization performance compared to standard methods, at a fraction of wallclock time.
arXiv Detail & Related papers (2021-02-15T19:36:18Z)
- Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies [41.13416324282365]
We propose a framework which applies Evolutionary Strategies to online hyperparameter tuning in off-policy learning.
Our formulation draws close connections to meta-gradients and leverages the strengths of black-box optimization with relatively low-dimensional search spaces (a generic sketch of this style of black-box tuning appears after this list).
arXiv Detail & Related papers (2020-06-13T03:54:26Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)
- Kalman meets Bellman: Improving Policy Evaluation through Value Tracking [59.691919635037216]
Policy evaluation is a key process in Reinforcement Learning (RL).
We devise an optimization method called Kalman Optimization for Value Approximation (KOVA).
KOVA minimizes a regularized objective function that concerns both parameter and noisy return uncertainties.
arXiv Detail & Related papers (2020-02-17T13:30:43Z)
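For the evolutionary-strategies entry above, the following is a minimal, generic sketch of black-box hyperparameter tuning with a simple ES loop. It illustrates the general technique only; the training routine train_and_score, the search space, and the update rule are assumptions and not the cited paper's algorithm.

```python
import numpy as np

# Generic, deliberately simplified evolutionary-strategies sketch: perturb a
# low-dimensional hyperparameter vector, score each candidate, and step toward
# the better-scoring perturbations. `train_and_score` is an assumed callable
# that runs a short training segment with the given hyperparameters and
# returns a scalar return estimate.

def es_tune(train_and_score, init_hp, iterations=20, population=8, sigma=0.1, lr=0.05):
    hp = np.asarray(init_hp, dtype=float)   # e.g. [log_learning_rate, entropy_coef]
    rng = np.random.default_rng(0)
    for _ in range(iterations):
        noise = rng.standard_normal((population, hp.size))
        scores = np.array([train_and_score(hp + sigma * eps) for eps in noise])
        # Standardize scores and take a weighted step along the noise directions.
        weights = (scores - scores.mean()) / (scores.std() + 1e-8)
        hp = hp + lr / (population * sigma) * noise.T @ weights
    return hp
```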