Robust Value Iteration for Continuous Control Tasks
- URL: http://arxiv.org/abs/2105.12189v1
- Date: Tue, 25 May 2021 19:48:35 GMT
- Title: Robust Value Iteration for Continuous Control Tasks
- Authors: Michael Lutter and Shie Mannor and Jan Peters and Dieter Fox and
Animesh Garg
- Abstract summary: When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than a deep reinforcement learning algorithm and the non-robust version of the algorithm.
- Score: 99.00362538261972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When transferring a control policy from simulation to a physical system, the
policy needs to be robust to variations in the dynamics to perform well.
Commonly, the optimal policy overfits to the approximate model and the
corresponding state-distribution, often resulting in failure to transfer under
distributional shifts. In this paper, we present Robust Fitted Value
Iteration, which uses dynamic programming to compute the optimal value function
on the compact state domain and incorporates adversarial perturbations of the
system dynamics. The adversarial perturbations encourage an optimal policy that
is robust to changes in the dynamics. Utilizing the continuous-time perspective
of reinforcement learning, we derive the optimal perturbations for the states,
actions, observations and model parameters in closed-form. Notably, the
resulting algorithm does not require discretization of states or actions.
Therefore, the optimal adversarial perturbations can be efficiently
incorporated in the min-max value function update. We apply the resulting
algorithm to the physical Furuta pendulum and cartpole. By changing the masses
of the systems we evaluate the quantitative and qualitative performance across
different model parameters. We show that robust value iteration is more robust
than a deep reinforcement learning algorithm and the non-robust version of the
algorithm. Videos of the experiments are shown at
https://sites.google.com/view/rfvi
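For concreteness, below is a minimal sketch of the min-max value target described in the abstract. It assumes discrete-time dynamics f(x, u), a differentiable value estimate V, a quadratic action cost, and an l2-bounded adversarial state perturbation; these modelling choices and all names are illustrative assumptions, not the authors' exact (continuous-time) formulation.

```python
# Illustrative sketch only (assumed discrete-time setting); the paper works in
# continuous time and derives perturbations for states, actions, observations
# and model parameters in closed form.
import numpy as np


def robust_value_target(x, V, dV_dx, f, B, R_inv, reward, gamma=0.99, eps=0.05):
    """Min-max value target for a single state x.

    x       : (n,) state sample from the compact state domain
    V       : callable, V(x) -> float        current value estimate
    dV_dx   : callable, dV_dx(x) -> (n,)     gradient of the value estimate
    f       : callable, f(x, u) -> (n,)      nominal (approximate) dynamics
    B       : (n, m) input matrix of the control-affine dynamics
    R_inv   : (m, m) inverse of the quadratic action-cost matrix
    reward  : callable, reward(x, u) -> float
    eps     : radius of the admissible adversarial state perturbation
    """
    # Greedy action from the value gradient (control-affine, quadratic cost).
    u = R_inv @ B.T @ dV_dx(x)

    # Nominal next state under the approximate model.
    x_next = f(x, u)

    # First-order optimal adversarial state perturbation: push the next state
    # in the direction that decreases the value the most, on an l2 ball.
    g = dV_dx(x_next)
    xi = -eps * g / (np.linalg.norm(g) + 1e-8)

    # Min-max target used to regress the value function.
    return reward(x, u) + gamma * V(x_next + xi)
```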
Related papers
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike a fixed exploration-exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- Topological Guided Actor-Critic Modular Learning of Continuous Systems with Temporal Objectives [2.398608007786179]
This work investigates the formal policy synthesis of continuous-state dynamic systems given high-level specifications in linear temporal logic.
We use neural networks to approximate the value function and policy function for the hybrid product state space.
arXiv Detail & Related papers (2023-04-20T01:36:05Z)
- Smoothing Policy Iteration for Zero-sum Markov Games [9.158672246275348]
We propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately.
Specifically, the adversarial policy serves as the weight function to enable efficient sampling over the action space; a schematic reading is sketched after this entry.
We also propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with function approximation.
arXiv Detail & Related papers (2022-12-03T14:39:06Z)
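One plausible reading of the smoothing idea above is a soft-min Bellman backup in which a Boltzmann adversarial policy weights the candidate Q-values; the Boltzmann form and temperature beta are assumptions made for this illustration, not the paper's exact construction.

```python
# Schematic soft-min backup; the Boltzmann weighting is an assumed stand-in
# for the adversarial policy used as a weight function.
import numpy as np


def smoothed_adversarial_backup(q_values, beta=5.0):
    """Soft-min over sampled adversary actions instead of a hard min."""
    w = np.exp(-beta * (q_values - q_values.min()))  # numerically stable weights
    w /= w.sum()
    return float(w @ q_values)
```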
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems; see the sketch after this entry.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
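As an illustration of the structure mentioned in the entry above: for control-affine dynamics with a separable reward and a quadratic action cost (the quadratic cost is an assumption made here for brevity), the greedy action follows in closed form from the value gradient:

```latex
\dot{x} = a(x) + B(x)\,u, \qquad
r(x, u) = q_c(x) - \tfrac{1}{2}\, u^{\top} R\, u
\quad\Longrightarrow\quad
u^{*}(x) = R^{-1} B(x)^{\top} \nabla_{x} V(x)
```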
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration; a weighted-regression sketch follows this entry.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
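A minimal sketch of the reweighting idea in the VA-OPE entry above, written as a variance-weighted least-squares step of fitted Q-iteration with linear features; the variance estimate sigma2 and the ridge regulariser are simplifying assumptions, not the paper's exact estimator.

```python
# Schematic variance-weighted regression step; not the paper's estimator.
import numpy as np


def variance_weighted_fqi_step(phi, r, phi_next, theta, sigma2, gamma=0.99, reg=1e-3):
    """One weighted least-squares update of linear Q-parameters theta.

    phi      : (N, d) features of the sampled (s, a) pairs
    r        : (N,)   observed rewards
    phi_next : (N, d) features of the next (s', pi(s')) pairs
    theta    : (d,)   current parameter estimate
    sigma2   : (N,)   estimated variance of the value targets
    """
    # Bellman targets from the current linear Q-estimate.
    y = r + gamma * phi_next @ theta

    # Down-weight samples whose targets are estimated to be noisy.
    w = 1.0 / (sigma2 + 1e-8)

    # Weighted ridge regression: minimise sum_i w_i (phi_i^T theta - y_i)^2.
    A = phi.T @ (w[:, None] * phi) + reg * np.eye(phi.shape[1])
    b = phi.T @ (w * y)
    return np.linalg.solve(A, b)
```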
- Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration.
arXiv Detail & Related papers (2021-05-10T21:40:56Z)
- Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)