Randomized Policy Learning for Continuous State and Action MDPs
- URL: http://arxiv.org/abs/2006.04331v2
- Date: Mon, 16 Nov 2020 01:55:12 GMT
- Title: Randomized Policy Learning for Continuous State and Action MDPs
- Authors: Hiteshi Sharma and Rahul Jain
- Abstract summary: We present textttRANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces.
We show the numerical performance on challenging environments and compare them with deep neural network based algorithms.
- Score: 8.109579454896128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning methods have achieved state-of-the-art results in
a variety of challenging, high-dimensional domains ranging from video games to
locomotion. The key to success has been the use of deep neural networks used to
approximate the policy and value function. Yet, substantial tuning of weights
is required for good results. We instead use randomized function approximation.
Such networks are not only cheaper than training fully connected networks but
also improve the numerical performance. We present \texttt{RANDPOL}, a
generalized policy iteration algorithm for MDPs with continuous state and
action spaces. Both the policy and value functions are represented with
randomized networks. We also give finite time guarantees on the performance of
the algorithm. Then we show the numerical performance on challenging
environments and compare them with deep neural network based algorithms.
Related papers
- Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies [7.9898826915621965]
We use a verification procedure that trains another neural network, which acts as a certificate proving that the policy satisfies the task.
For reach-avoid tasks, it suffices to show that this certificate network is a reach-avoid supermartingale (RASM)
arXiv Detail & Related papers (2024-06-02T18:19:19Z) - Thompson sampling for improved exploration in GFlowNets [75.89693358516944]
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy.
We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
arXiv Detail & Related papers (2023-06-30T14:19:44Z) - Regularization of polynomial networks for image recognition [78.4786845859205]
Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability.
We introduce a class of PNs, which are able to reach the performance of ResNet across a range of six benchmarks.
arXiv Detail & Related papers (2023-03-24T10:05:22Z) - Efficiently Learning Small Policies for Locomotion and Manipulation [12.340412143459869]
We leverage graph hyper networks to learn graph hyper policies trained with off-policy reinforcement learning.
We show that our method can be appended to any off-policy reinforcement learning algorithm.
We provide a method to select the best architecture, given a constraint on the number of parameters.
arXiv Detail & Related papers (2022-09-30T23:49:00Z) - Learning Optimal Antenna Tilt Control Policies: A Contextual Linear
Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms learning optimal tilt control policies from existing data.
We show that they can produce optimal tilt update policy using much fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI)
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z) - Direct Random Search for Fine Tuning of Deep Reinforcement Learning
Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts.
Our results show that this method yields more consistent and higher performing agents on the environments we tested.
arXiv Detail & Related papers (2021-09-12T20:12:46Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - GRAC: Self-Guided and Self-Regularized Actor-Critic [24.268453994605512]
We propose a self-regularized TD-learning method to address divergence without requiring a target network.
We also propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization.
This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network.
We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.
arXiv Detail & Related papers (2020-09-18T17:58:29Z) - Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a Proximal policy optimization algorithm for queueing networks.
The algorithm consistently generates control policies that outperform state-of-arts in literature.
A key to the successes of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.