Robust Value Iteration for Continuous Control Tasks
- URL: http://arxiv.org/abs/2105.12189v1
- Date: Tue, 25 May 2021 19:48:35 GMT
- Title: Robust Value Iteration for Continuous Control Tasks
- Authors: Michael Lutter and Shie Mannor and Jan Peters and Dieter Fox and
Animesh Garg
- Abstract summary: When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than a deep reinforcement learning algorithm and the non-robust version of the algorithm.
- Score: 99.00362538261972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When transferring a control policy from simulation to a physical system, the
policy needs to be robust to variations in the dynamics to perform well.
Commonly, the optimal policy overfits to the approximate model and the
corresponding state-distribution, often resulting in failure to transfer under
distributional shifts. In this paper, we present Robust Fitted Value
Iteration, which uses dynamic programming to compute the optimal value function
on the compact state domain and incorporates adversarial perturbations of the
system dynamics. The adversarial perturbations encourage an optimal policy that
is robust to changes in the dynamics. Utilizing the continuous-time perspective
of reinforcement learning, we derive the optimal perturbations for the states,
actions, observations and model parameters in closed-form. Notably, the
resulting algorithm does not require discretization of states or actions.
Therefore, the optimal adversarial perturbations can be efficiently
incorporated in the min-max value function update. We apply the resulting
algorithm to the physical Furuta pendulum and cartpole. By changing the masses
of the systems we evaluate the quantitative and qualitative performance across
different model parameters. We show that robust value iteration is more robust
than a deep reinforcement learning algorithm and the non-robust version of the
algorithm. Videos of the experiments are shown at
https://sites.google.com/view/rfvi
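For concreteness, below is a minimal sketch of the min-max value target described in the abstract. It assumes discrete-time dynamics f(x, u), a differentiable value estimate V, a quadratic action cost, and an l2-bounded adversarial state perturbation; these modelling choices and all names are illustrative assumptions, not the authors' exact (continuous-time) formulation.

```python
# Illustrative sketch only (assumed discrete-time setting); the paper works in
# continuous time and derives perturbations for states, actions, observations
# and model parameters in closed form.
import numpy as np


def robust_value_target(x, V, dV_dx, f, B, R_inv, reward, gamma=0.99, eps=0.05):
    """Min-max value target for a single state x.

    x       : (n,) state sample from the compact state domain
    V       : callable, V(x) -> float        current value estimate
    dV_dx   : callable, dV_dx(x) -> (n,)     gradient of the value estimate
    f       : callable, f(x, u) -> (n,)      nominal (approximate) dynamics
    B       : (n, m) input matrix of the control-affine dynamics
    R_inv   : (m, m) inverse of the quadratic action-cost matrix
    reward  : callable, reward(x, u) -> float
    eps     : radius of the admissible adversarial state perturbation
    """
    # Greedy action from the value gradient (control-affine, quadratic cost).
    u = R_inv @ B.T @ dV_dx(x)

    # Nominal next state under the approximate model.
    x_next = f(x, u)

    # First-order optimal adversarial state perturbation: push the next state
    # in the direction that decreases the value the most, on an l2 ball.
    g = dV_dx(x_next)
    xi = -eps * g / (np.linalg.norm(g) + 1e-8)

    # Min-max target used to regress the value function.
    return reward(x, u) + gamma * V(x_next + xi)
```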
Related papers
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike a fixed exploration-exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- Topological Guided Actor-Critic Modular Learning of Continuous Systems with Temporal Objectives [2.398608007786179]
This work investigates the formal policy synthesis of continuous-state dynamic systems given high-level specifications in linear temporal logic.
We use neural networks to approximate the value function and policy function for the hybrid product state space.
arXiv Detail & Related papers (2023-04-20T01:36:05Z)
- Smoothing Policy Iteration for Zero-sum Markov Games [9.158672246275348]
We propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately.
Specifically, the adversarial policy serves as the weight function to enable efficient sampling over the action space; a schematic reading is sketched after this entry.
We also propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with function approximation.
arXiv Detail & Related papers (2022-12-03T14:39:06Z)
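One plausible reading of the smoothing idea above is a soft-min Bellman backup in which a Boltzmann adversarial policy weights the candidate Q-values; the Boltzmann form and temperature beta are assumptions made for this illustration, not the paper's exact construction.

```python
# Schematic soft-min backup; the Boltzmann weighting is an assumed stand-in
# for the adversarial policy used as a weight function.
import numpy as np


def smoothed_adversarial_backup(q_values, beta=5.0):
    """Soft-min over sampled adversary actions instead of a hard min."""
    w = np.exp(-beta * (q_values - q_values.min()))  # numerically stable weights
    w /= w.sum()
    return float(w @ q_values)
```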
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems; see the sketch after this entry.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
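As an illustration of the structure mentioned in the entry above: for control-affine dynamics with a separable reward and a quadratic action cost (the quadratic cost is an assumption made here for brevity), the greedy action follows in closed form from the value gradient:

```latex
\dot{x} = a(x) + B(x)\,u, \qquad
r(x, u) = q_c(x) - \tfrac{1}{2}\, u^{\top} R\, u
\quad\Longrightarrow\quad
u^{*}(x) = R^{-1} B(x)^{\top} \nabla_{x} V(x)
```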
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration; a weighted-regression sketch follows this entry.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
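A minimal sketch of the reweighting idea in the VA-OPE entry above, written as a variance-weighted least-squares step of fitted Q-iteration with linear features; the variance estimate sigma2 and the ridge regulariser are simplifying assumptions, not the paper's exact estimator.

```python
# Schematic variance-weighted regression step; not the paper's estimator.
import numpy as np


def variance_weighted_fqi_step(phi, r, phi_next, theta, sigma2, gamma=0.99, reg=1e-3):
    """One weighted least-squares update of linear Q-parameters theta.

    phi      : (N, d) features of the sampled (s, a) pairs
    r        : (N,)   observed rewards
    phi_next : (N, d) features of the next (s', pi(s')) pairs
    theta    : (d,)   current parameter estimate
    sigma2   : (N,)   estimated variance of the value targets
    """
    # Bellman targets from the current linear Q-estimate.
    y = r + gamma * phi_next @ theta

    # Down-weight samples whose targets are estimated to be noisy.
    w = 1.0 / (sigma2 + 1e-8)

    # Weighted ridge regression: minimise sum_i w_i (phi_i^T theta - y_i)^2.
    A = phi.T @ (w[:, None] * phi) + reg * np.eye(phi.shape[1])
    b = phi.T @ (w * y)
    return np.linalg.solve(A, b)
```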
- Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration.
arXiv Detail & Related papers (2021-05-10T21:40:56Z)
- Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)