Continuous-Time Fitted Value Iteration for Robust Policies
- URL: http://arxiv.org/abs/2110.01954v1
- Date: Tue, 5 Oct 2021 11:33:37 GMT
- Title: Continuous-Time Fitted Value Iteration for Robust Policies
- Authors: Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg,
Jan Peters
- Abstract summary: Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
- Score: 93.25997466553929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving the Hamilton-Jacobi-Bellman equation is important in many domains
including control, robotics and economics. Especially for continuous control,
solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs
equation, is important as it yields the optimal policy that achieves the
maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs
equation, which includes an adversary controlling the environment and
minimizing the reward, the obtained policy is also robust to perturbations of
the dynamics. In this paper we propose continuous fitted value iteration (cFVI)
and robust fitted value iteration (rFVI). These algorithms leverage the
non-linear control-affine dynamics and separable state and action reward of
many continuous control problems to derive the optimal policy and optimal
adversary in closed form. This analytic expression simplifies the differential
equations and enables us to solve for the optimal value function using value
iteration for continuous actions and states as well as the adversarial case.
Notably, the resulting algorithms do not require discretization of states or
actions. We apply the resulting algorithms to the Furuta pendulum and cartpole.
We show that both algorithms obtain the optimal policy. The robustness Sim2Real
experiments on the physical systems show that the policies successfully achieve
the task in the real world. When changing the masses of the pendulum, we
observe that robust value iteration is more robust than a deep reinforcement
learning baseline and the non-robust version of the algorithm.
Videos of the experiments are shown at https://sites.google.com/view/rfvi
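The closed-form optimal policy claimed in the abstract follows from the structure of the Hamilton-Jacobi-Bellman equation under control-affine dynamics and a separable reward. The following is a schematic sketch, not taken verbatim from the paper, using illustrative notation: discount rate $\rho$, drift $a(x)$, actuation matrix $B(x)$, state reward $q(x)$, and a strictly convex action cost $g(u)$ so that $r(x,u) = q(x) - g(u)$.

```latex
% Control-affine dynamics and separable reward (illustrative notation):
%   \dot{x} = a(x) + B(x)\,u, \qquad r(x,u) = q(x) - g(u), \quad g strictly convex.
\begin{align}
  \rho\, V(x)
    &= \max_{u}\Big[\, q(x) - g(u) + \nabla_x V(x)^{\top}\big(a(x) + B(x)\,u\big) \Big] \\
    &= q(x) + \nabla_x V(x)^{\top} a(x)
       + \underbrace{\max_{u}\Big[ \nabla_x V(x)^{\top} B(x)\, u - g(u) \Big]}_{=\; g^{*}\!\big(B(x)^{\top}\nabla_x V(x)\big)}
\end{align}
% The inner maximization is the convex conjugate g^{*} of g, so its maximizer,
% i.e. the optimal policy, is available in closed form:
\begin{equation}
  \pi^{*}(x) \;=\; \nabla g^{*}\!\big(B(x)^{\top}\nabla_x V(x)\big)
             \;=\; (\nabla g)^{-1}\!\big(B(x)^{\top}\nabla_x V(x)\big).
\end{equation}
```

In the Hamilton-Jacobi-Isaacs setting used by rFVI, an adversary term enters the same expression with the opposite sign and, under analogous convexity assumptions, the minimizing adversary decouples in the same way. This decoupling is what makes the value-iteration targets computable without discretizing states or actions.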
Related papers
- Neural Time-Reversed Generalized Riccati Equation [60.92253836775246]
Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates.
This paper introduces a novel neural-based approach to optimal control that works forward in time.
arXiv Detail & Related papers (2023-12-14T19:29:37Z)
- Solving Robust MDPs through No-Regret Dynamics [1.3597551064547502]
Reinforcement Learning is a powerful framework for training agents to navigate different situations.
We show how no-regret dynamics can be used to improve robust policy training.
arXiv Detail & Related papers (2023-05-30T13:52:16Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Implicitly Regularized RL with Implicit Q-Values [42.87920755961722]
The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms, in which agents act according to a (soft)-greedy policy with respect to the $Q$-values.
We propose to parametrize the $Q$-function implicitly, as the sum of a log-policy and of a value function.
We derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces the softmax relation between the policy and the $Q$-value.
arXiv Detail & Related papers (2021-08-16T12:20:47Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than a deep reinforcement learning baseline and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration (see the training-loop sketch after this list).
arXiv Detail & Related papers (2021-05-10T21:40:56Z)
- Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)
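As a complement to the cFVI/rFVI entries above, here is a minimal training-loop sketch of how a closed-form policy can plug into fitted value iteration for continuous states and actions. This is not the authors' implementation: it assumes a quadratic action cost $g(u) = \frac{1}{2} u^\top R u$ (so the policy reduces to $\pi(x) = R^{-1} B(x)^\top \nabla_x V(x)$), a simple Euler discretization with step dt, and PyTorch; the names ValueNet, closed_form_policy, fvi_step, a_fn, B_fn and q_fn are all hypothetical.

```python
# Minimal illustrative sketch of continuous fitted value iteration.
# Not the authors' code; assumptions (see lead-in): quadratic action cost
# g(u) = 0.5 u^T R u, control-affine dynamics x_dot = a(x) + B(x) u, and a
# simple Euler step of size dt to form the regression target.
import math

import torch
import torch.nn as nn


class ValueNet(nn.Module):
    """Small MLP approximating the value function V(x)."""

    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def closed_form_policy(value_net, x, B_fn, R_inv):
    """pi(x) = R^{-1} B(x)^T grad_x V(x) for the quadratic action cost."""
    x = x.detach().requires_grad_(True)
    dVdx = torch.autograd.grad(value_net(x).sum(), x)[0]       # (batch, n)
    u = torch.einsum("bnm,bn->bm", B_fn(x), dVdx) @ R_inv      # (batch, m); R_inv symmetric
    return u.detach()


def fvi_step(value_net, target_net, optimizer, x, a_fn, B_fn, q_fn, R, R_inv,
             dt=1e-2, rho=0.1):
    """One regression step towards an Euler-discretized HJB target."""
    u = closed_form_policy(target_net, x, B_fn, R_inv)
    # One Euler step of the dynamics and the (separable) reward.
    x_next = x + dt * (a_fn(x) + torch.einsum("bnm,bm->bn", B_fn(x), u))
    reward = dt * (q_fn(x) - 0.5 * torch.einsum("bm,mk,bk->b", u, R, u))
    gamma = math.exp(-rho * dt)                                 # discretized discount
    with torch.no_grad():
        target = reward + gamma * target_net(x_next)
    loss = nn.functional.mse_loss(value_net(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full implementation, states would be sampled from the compact state domain, the target network periodically synchronized with the value network, and, for an rFVI-style robust variant, a worst-case perturbation would be applied to the dynamics before forming the target.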