Continuous-Time Fitted Value Iteration for Robust Policies
- URL: http://arxiv.org/abs/2110.01954v1
- Date: Tue, 5 Oct 2021 11:33:37 GMT
- Title: Continuous-Time Fitted Value Iteration for Robust Policies
- Authors: Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg,
Jan Peters
- Abstract summary: Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
- Score: 93.25997466553929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving the Hamilton-Jacobi-Bellman equation is important in many domains
including control, robotics and economics. Especially for continuous control,
solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs
equation, is important as it yields the optimal policy that achieves the
maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs
equation, which includes an adversary controlling the environment and
minimizing the reward, the obtained policy is also robust to perturbations of
the dynamics. In this paper we propose continuous fitted value iteration (cFVI)
and robust fitted value iteration (rFVI). These algorithms leverage the
non-linear control-affine dynamics and separable state and action reward of
many continuous control problems to derive the optimal policy and optimal
adversary in closed form. This analytic expression simplifies the differential
equations and enables us to solve for the optimal value function using value
iteration for continuous actions and states as well as the adversarial case.
Notably, the resulting algorithms do not require discretization of states or
actions. We apply the resulting algorithms to the Furuta pendulum and cartpole.
We show that both algorithms obtain the optimal policy. The robustness Sim2Real
experiments on the physical systems show that the policies successfully achieve
the task in the real world. When changing the masses of the pendulum, we
observe that robust value iteration is more robust than a deep reinforcement
learning baseline and the non-robust version of the algorithm.
Videos of the experiments are shown at https://sites.google.com/view/rfvi
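The closed-form optimal policy claimed in the abstract follows from the structure of the Hamilton-Jacobi-Bellman equation under control-affine dynamics and a separable reward. The following is a schematic sketch, not taken verbatim from the paper, using illustrative notation: discount rate $\rho$, drift $a(x)$, actuation matrix $B(x)$, state reward $q(x)$, and a strictly convex action cost $g(u)$ so that $r(x,u) = q(x) - g(u)$.

```latex
% Control-affine dynamics and separable reward (illustrative notation):
%   \dot{x} = a(x) + B(x)\,u, \qquad r(x,u) = q(x) - g(u), \quad g strictly convex.
\begin{align}
  \rho\, V(x)
    &= \max_{u}\Big[\, q(x) - g(u) + \nabla_x V(x)^{\top}\big(a(x) + B(x)\,u\big) \Big] \\
    &= q(x) + \nabla_x V(x)^{\top} a(x)
       + \underbrace{\max_{u}\Big[ \nabla_x V(x)^{\top} B(x)\, u - g(u) \Big]}_{=\; g^{*}\!\big(B(x)^{\top}\nabla_x V(x)\big)}
\end{align}
% The inner maximization is the convex conjugate g^{*} of g, so its maximizer,
% i.e. the optimal policy, is available in closed form:
\begin{equation}
  \pi^{*}(x) \;=\; \nabla g^{*}\!\big(B(x)^{\top}\nabla_x V(x)\big)
             \;=\; (\nabla g)^{-1}\!\big(B(x)^{\top}\nabla_x V(x)\big).
\end{equation}
```

In the Hamilton-Jacobi-Isaacs setting used by rFVI, an adversary term enters the same expression with the opposite sign and, under analogous convexity assumptions, the minimizing adversary decouples in the same way. This decoupling is what makes the value-iteration targets computable without discretizing states or actions.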
Related papers
- Neural Time-Reversed Generalized Riccati Equation [60.92253836775246]
Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates.
This paper introduces a novel neural-based approach to optimal control that works forward in time.
arXiv Detail & Related papers (2023-12-14T19:29:37Z)
- Solving Robust MDPs through No-Regret Dynamics [1.3597551064547502]
Reinforcement Learning is a powerful framework for training agents to navigate different situations.
We show how no-regret dynamics can be used to improve robust policy training.
arXiv Detail & Related papers (2023-05-30T13:52:16Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Implicitly Regularized RL with Implicit Q-Values [42.87920755961722]
The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms, in which agents act according to a (soft)-greedy policy with respect to the $Q$-values.
We propose to parametrize the $Q$-function implicitly, as the sum of a log-policy and of a value function.
We derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces the softmax relation between the policy and the $Q$-value.
arXiv Detail & Related papers (2021-08-16T12:20:47Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than a deep reinforcement learning baseline and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration (see the training-loop sketch after this list).
arXiv Detail & Related papers (2021-05-10T21:40:56Z)
- Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)
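As a complement to the cFVI/rFVI entries above, here is a minimal training-loop sketch of how a closed-form policy can plug into fitted value iteration for continuous states and actions. This is not the authors' implementation: it assumes a quadratic action cost $g(u) = \frac{1}{2} u^\top R u$ (so the policy reduces to $\pi(x) = R^{-1} B(x)^\top \nabla_x V(x)$), a simple Euler discretization with step dt, and PyTorch; the names ValueNet, closed_form_policy, fvi_step, a_fn, B_fn and q_fn are all hypothetical.

```python
# Minimal illustrative sketch of continuous fitted value iteration.
# Not the authors' code; assumptions (see lead-in): quadratic action cost
# g(u) = 0.5 u^T R u, control-affine dynamics x_dot = a(x) + B(x) u, and a
# simple Euler step of size dt to form the regression target.
import math

import torch
import torch.nn as nn


class ValueNet(nn.Module):
    """Small MLP approximating the value function V(x)."""

    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def closed_form_policy(value_net, x, B_fn, R_inv):
    """pi(x) = R^{-1} B(x)^T grad_x V(x) for the quadratic action cost."""
    x = x.detach().requires_grad_(True)
    dVdx = torch.autograd.grad(value_net(x).sum(), x)[0]       # (batch, n)
    u = torch.einsum("bnm,bn->bm", B_fn(x), dVdx) @ R_inv      # (batch, m); R_inv symmetric
    return u.detach()


def fvi_step(value_net, target_net, optimizer, x, a_fn, B_fn, q_fn, R, R_inv,
             dt=1e-2, rho=0.1):
    """One regression step towards an Euler-discretized HJB target."""
    u = closed_form_policy(target_net, x, B_fn, R_inv)
    # One Euler step of the dynamics and the (separable) reward.
    x_next = x + dt * (a_fn(x) + torch.einsum("bnm,bm->bn", B_fn(x), u))
    reward = dt * (q_fn(x) - 0.5 * torch.einsum("bm,mk,bk->b", u, R, u))
    gamma = math.exp(-rho * dt)                                 # discretized discount
    with torch.no_grad():
        target = reward + gamma * target_net(x_next)
    loss = nn.functional.mse_loss(value_net(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full implementation, states would be sampled from the compact state domain, the target network periodically synchronized with the value network, and, for an rFVI-style robust variant, a worst-case perturbation would be applied to the dynamics before forming the target.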