Related papers: Randomized Policy Learning for Continuous State and Action MDPs

Randomized Policy Learning for Continuous State and Action MDPs

URL: http://arxiv.org/abs/2006.04331v2
Date: Mon, 16 Nov 2020 01:55:12 GMT
Title: Randomized Policy Learning for Continuous State and Action MDPs
Authors: Hiteshi Sharma and Rahul Jain
Abstract summary: We present textttRANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces. We show the numerical performance on challenging environments and compare them with deep neural network based algorithms.
Score: 8.109579454896128
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep reinforcement learning methods have achieved state-of-the-art results in a variety of challenging, high-dimensional domains ranging from video games to locomotion. The key to success has been the use of deep neural networks used to approximate the policy and value function. Yet, substantial tuning of weights is required for good results. We instead use randomized function approximation. Such networks are not only cheaper than training fully connected networks but also improve the numerical performance. We present \texttt{RANDPOL}, a generalized policy iteration algorithm for MDPs with continuous state and action spaces. Both the policy and value functions are represented with randomized networks. We also give finite time guarantees on the performance of the algorithm. Then we show the numerical performance on challenging environments and compare them with deep neural network based algorithms.

Related papers

Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies [7.9898826915621965]
We use a verification procedure that trains another neural network, which acts as a certificate proving that the policy satisfies the task. For reach-avoid tasks, it suffices to show that this certificate network is a reach-avoid supermartingale (RASM)
arXiv Detail & Related papers (2024-06-02T18:19:19Z)
Thompson sampling for improved exploration in GFlowNets [75.89693358516944]
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
arXiv Detail & Related papers (2023-06-30T14:19:44Z)
Regularization of polynomial networks for image recognition [78.4786845859205]
Polynomial Networks (PNs) have emerged as an alternative method with a promising performance and improved interpretability. We introduce a class of PNs, which are able to reach the performance of ResNet across a range of six benchmarks.
arXiv Detail & Related papers (2023-03-24T10:05:22Z)
Efficiently Learning Small Policies for Locomotion and Manipulation [12.340412143459869]
We leverage graph hyper networks to learn graph hyper policies trained with off-policy reinforcement learning. We show that our method can be appended to any off-policy reinforcement learning algorithm. We provide a method to select the best architecture, given a constraint on the number of parameters.
arXiv Detail & Related papers (2022-09-30T23:49:00Z)
Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. We devise algorithms learning optimal tilt control policies from existing data. We show that they can produce optimal tilt update policy using much fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference. It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics. We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI) These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
Direct Random Search for Fine Tuning of Deep Reinforcement Learning Policies [5.543220407902113]
We show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts. Our results show that this method yields more consistent and higher performing agents on the environments we tested.
arXiv Detail & Related papers (2021-09-12T20:12:46Z)
Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain. We show that robust value is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
GRAC: Self-Guided and Self-Regularized Actor-Critic [24.268453994605512]
We propose a self-regularized TD-learning method to address divergence without requiring a target network. We also propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.
arXiv Detail & Related papers (2020-09-18T17:58:29Z)
Queueing Network Controls via Deep Reinforcement Learning [0.0]
We develop a Proximal policy optimization algorithm for queueing networks. The algorithm consistently generates control policies that outperform state-of-arts in literature. A key to the successes of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function.
arXiv Detail & Related papers (2020-07-31T01:02:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.