Value Iteration in Continuous Actions, States and Time
- URL: http://arxiv.org/abs/2105.04682v1
- Date: Mon, 10 May 2021 21:40:56 GMT
- Title: Value Iteration in Continuous Actions, States and Time
- Authors: Michael Lutter and Shie Mannor and Jan Peters and Dieter Fox and
Animesh Garg
- Abstract summary: We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration.
- Score: 99.00362538261972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical value iteration approaches are not applicable to environments with
continuous states and actions. For such environments, the states and actions
are usually discretized, which leads to an exponential increase in
computational complexity. In this paper, we propose continuous fitted value
iteration (cFVI). This algorithm enables dynamic programming for continuous
states and actions with a known dynamics model. Leveraging the continuous-time
formulation, the optimal policy can be derived for non-linear control-affine
dynamics. This closed-form solution enables the efficient extension of value
iteration to continuous environments. We show in non-linear control experiments
that the dynamic programming solution obtains the same quantitative performance
as deep reinforcement learning methods in simulation but excels when
transferred to the physical system. The policy obtained by cFVI is more robust
to changes in the dynamics despite using only a deterministic model and without
explicitly incorporating robustness in the optimization. Videos of the physical
system are available at https://sites.google.com/view/value-iteration.
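To make the closed-form result concrete, here is a minimal sketch of one cFVI-style value-iteration step, assuming control-affine dynamics dx/dt = a(x) + B(x)u and a separable reward r(x, u) = q(x) - 1/2 u^T R u; under these assumptions the HJB maximization over actions reduces to u*(x) = R^{-1} B(x)^T dV/dx. All function names are illustrative, not the authors' code.

```python
# Minimal sketch (not the authors' implementation): closed-form greedy action
# and one fitted value-iteration target for control-affine dynamics
# dx/dt = a(x) + B(x) u with separable reward r(x, u) = q(x) - 0.5 u^T R u.
import jax
import jax.numpy as jnp

def optimal_action(V, x, B, R_inv):
    """Closed-form maximizer of the HJB: u* = R^{-1} B(x)^T dV/dx."""
    dVdx = jax.grad(V)(x)                # gradient of the value function at x
    return R_inv @ B(x).T @ dVdx

def cfvi_target(V, x, a, B, R, R_inv, q, dt, rho):
    """One dynamic-programming target using a single explicit Euler step."""
    u = optimal_action(V, x, B, R_inv)
    r = q(x) - 0.5 * u @ R @ u           # separable running reward
    x_next = x + dt * (a(x) + B(x) @ u)  # Euler-discretized dynamics
    gamma = jnp.exp(-rho * dt)           # discount from continuous-time rate rho
    return dt * r + gamma * V(x_next)    # bootstrapped value target
```

Fitting V to such targets over sampled states and iterating gives the fitted value-iteration loop; the closed-form action avoids any inner optimization over u.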
Related papers
- Amortized Control of Continuous State Space Feynman-Kac Model for Irregular Time Series [14.400596021890863]
Many real-world datasets, in domains such as healthcare, climate, and economics, are often collected as irregular time series.
We propose the Amortized Control of continuous State Space Model (ACSSM) for continuous dynamical modeling of time series.
arXiv Detail & Related papers (2024-10-08T01:27:46Z)
- Neural ODEs as Feedback Policies for Nonlinear Optimal Control [1.8514606155611764]
We use Neural ordinary differential equations (Neural ODEs) to model continuous-time dynamics as differential equations parametrized with neural networks.
We propose a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems.
arXiv Detail & Related papers (2022-10-20T13:19:26Z)
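As a hedged illustration of the idea above (names and structure are assumptions, not taken from the paper), a feedback policy network can be rolled out jointly with known dynamics by integrating both as one ODE:

```python
# Illustrative sketch: rolling out a neural feedback policy u = pi(x; theta)
# through known dynamics f(x, u) with a fixed-step Euler integrator.
import jax.numpy as jnp

def rollout(pi, f, x0, theta, dt, steps):
    """Integrate dx/dt = f(x, pi(x; theta)) and accumulate a running cost."""
    x, cost = x0, 0.0
    for _ in range(steps):
        u = pi(x, theta)              # feedback action from the policy network
        cost += dt * jnp.sum(u**2)    # placeholder control-effort cost
        x = x + dt * f(x, u)          # explicit Euler step of the dynamics
    return x, cost
```

Differentiating such a rollout with respect to theta (e.g., with jax.grad) yields the gradient used to train the policy on the optimal control objective.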
- Accelerated Continuous-Time Approximate Dynamic Programming via Data-Assisted Hybrid Control [0.0]
We introduce an algorithm that incorporates dynamic momentum into actor-critic structures to control continuous-time dynamic plants with an affine structure in the input.
By incorporating dynamic momentum, we accelerate the convergence of the closed-loop system.
arXiv Detail & Related papers (2022-04-27T05:36:51Z)
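A minimal sketch of the general idea, heavy-ball momentum applied to a critic's parameter update (the names here are assumptions; the paper's data-assisted hybrid-control construction is more involved):

```python
# Assumed illustration: heavy-ball momentum on a critic update.
# w: critic weights, v: momentum state, grad_loss: Bellman-error gradient.
def momentum_critic_update(w, v, grad_loss, lr=1e-3, beta=0.9):
    v = beta * v - lr * grad_loss(w)   # momentum accumulates past gradients
    return w + v, v                    # accelerated parameter step
```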
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains, including control, robotics, and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust compared to deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
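One common way to obtain such robustness (a sketch under an assumed norm-bounded adversary, not necessarily the paper's exact construction) is to evaluate the value-iteration target at an adversarially perturbed state:

```python
# Sketch: worst-case state perturbation within an eps-ball, chosen to
# minimize the value function; the target is then computed at x + xi.
import jax
import jax.numpy as jnp

def worst_case_perturbation(V, x, eps):
    dVdx = jax.grad(V)(x)                                 # steepest-ascent direction of V
    return -eps * dVdx / (jnp.linalg.norm(dVdx) + 1e-8)   # step against it
```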
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate each dimension of the next state and reward sequentially, conditioned on the previously generated dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
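The factorization being described is p(s' | s, a) = prod_i p(s'_i | s, a, s'_{<i}). A hedged sketch of dimension-by-dimension sampling follows; the per-dimension Gaussian conditionals `nets` are hypothetical:

```python
# Sketch: sample the next state one dimension at a time, each conditional
# on the state, action, and the dimensions generated so far.
import jax
import jax.numpy as jnp

def sample_next_state(nets, s, a, key):
    generated = jnp.zeros(0)                       # dimensions produced so far
    for net in nets:                               # one Gaussian head per dimension
        mean, log_std = net(jnp.concatenate([s, a, generated]))
        key, sub = jax.random.split(key)
        s_i = mean + jnp.exp(log_std) * jax.random.normal(sub)
        generated = jnp.append(generated, s_i)
    return generated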
- Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
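To illustrate (an assumed minimal setup, with the generator and discriminator parameters flattened into one vector), a second-order Runge-Kutta (Heun) step of such a training ODE looks like:

```python
# Sketch: one Heun (RK2) step of the GAN gradient flow d(params)/dt = g(params),
# where g returns the concatenated generator/discriminator update directions.
def heun_step(params, g, h):
    k1 = g(params)                       # slope at the current parameters
    k2 = g(params + h * k1)              # slope after a full Euler step
    return params + 0.5 * h * (k1 + k2)  # average the two slopes
```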
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
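A hedged sketch of such a cell, following the liquid time-constant formulation dx/dt = -(1/tau + f(x, I)) x + f(x, I) A with a fused semi-implicit Euler step (f, tau, and A are illustrative here):

```python
# Sketch: one fused semi-implicit Euler update of a liquid time-constant cell.
def ltc_step(x, I, f, tau, A, dt):
    fx = f(x, I)                         # nonlinear gate (e.g., a small MLP)
    return (x + dt * fx * A) / (1.0 + dt * (1.0 / tau + fx))
```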
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.