Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control
- URL: http://arxiv.org/abs/2108.10315v1
- Date: Fri, 20 Aug 2021 19:17:35 GMT
- Title: Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control
- Authors: Dimitri Bertsekas
- Abstract summary: We show that the principal AlphaZero/TD-Gammon ideas of approximation in value space and rollout apply very broadly to deterministic and stochastic optimal control problems.
These ideas can be effectively integrated with other important methodologies such as model predictive control, adaptive control, decentralized control, and neural network-based value and policy approximations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper we aim to provide analysis and insights (often based on
visualization), which explain the beneficial effects of on-line decision making
on top of off-line training. In particular, through a unifying abstract
mathematical framework, we show that the principal AlphaZero/TD-Gammon ideas of
approximation in value space and rollout apply very broadly to deterministic
and stochastic optimal control problems, involving both discrete and continuous
search spaces. Moreover, these ideas can be effectively integrated with other
important methodologies such as model predictive control, adaptive control,
decentralized control, discrete and Bayesian optimization, neural network-based
value and policy approximations, and heuristic algorithms for discrete
optimization.
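To make the rollout idea concrete, the following is a minimal sketch of one-step lookahead with a rollout policy for a deterministic control problem. Every name in it (the actions/step/cost callables, the base policy, the toy scalar example) is an illustrative assumption rather than a construct from the paper: the simulated cost-to-go of the base policy serves as the value approximation, and the controller acts greedily against it online.

```python
# Minimal rollout sketch for a deterministic optimal control problem.
# All problem ingredients below are illustrative assumptions.

def rollout_action(state, actions, step, cost, base_policy, horizon):
    """One-step lookahead: score each action by the stage cost plus the
    cost-to-go obtained by simulating the base policy to the horizon,
    then act greedily on the resulting estimates."""
    best_action, best_value = None, float("inf")
    for a in actions(state):
        s = step(state, a)
        total = cost(state, a)
        for _ in range(horizon):          # follow the base policy
            b = base_policy(s)
            total += cost(s, b)
            s = step(s, b)
        if total < best_value:
            best_action, best_value = a, total
    return best_action

# Toy example: drive a scalar state toward 0; the base policy does nothing.
act = rollout_action(
    state=5,
    actions=lambda s: (-1, 0, 1),
    step=lambda s, a: s + a,
    cost=lambda s, a: s * s + abs(a),
    base_policy=lambda s: 0,
    horizon=10,
)
print(act)  # -1: even a do-nothing base policy yields a sensible lookahead
```

The toy mirrors the abstract's point about on-line decision making on top of off-line training: the online greedy step improves on the base policy it is built from, which is the policy-improvement mechanism behind rollout.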
Related papers
- A successive approximation method in functional spaces for hierarchical optimal control problems and its application to learning [0.0]
We consider a class of learning problems of point estimation for modeling high-dimensional nonlinear functions.
The estimated parameter eventually provides acceptable prediction accuracy on a separate model validation dataset.
We provide a framework for appropriately accounting for both generalization and regularization at the optimization stage.
arXiv Detail & Related papers (2024-10-27T22:28:07Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of such baseline-correction methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
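As a rough illustration of the estimator family involved, here is an inverse propensity scoring (IPS) estimator with a control-variate baseline. The plug-in baseline below is the standard empirical variance minimizer for this parametric form; it is assumed for illustration and is not necessarily the closed form derived in the paper.

```python
import numpy as np

def ips_with_baseline(rewards, target_probs, logging_probs, baseline=None):
    """Baseline-corrected IPS: v = mean(w * (r - b)) + b, with importance
    weights w = pi(a|x) / mu(a|x). The estimator is unbiased for any
    fixed b because E[w] = 1 under the logging policy; b only moves the
    variance. The plug-in choice below minimizes the empirical variance
    of w*r - b*(w - 1)."""
    w = target_probs / logging_probs
    if baseline is None:
        # argmin_b Var(w*r - b*(w - 1))  =>  b = Cov(w*r, w) / Var(w)
        wr = w * rewards
        baseline = ((wr - wr.mean()) * (w - w.mean())).mean() / max(w.var(), 1e-12)
    return np.mean(w * (rewards - baseline)) + baseline
```

Because unbiasedness holds for every fixed baseline, choosing among baselines is purely a variance problem, which is what makes a closed-form optimum meaningful.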
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework towards achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike a fixed exploration-exploitation balance, caution and probing are employed automatically by the controller in real time, even after the learning process has terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
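For intuition about the optimism ingredient, the sketch below implements the generic optimistic gradient step, which plays an extrapolation of the next gradient from the last two evaluations. This is a textbook stand-in, assumed for illustration, for the meta-learned adaptive variant the paper designs.

```python
import numpy as np

def optimistic_ascent(grad_fn, theta, eta=0.1, steps=100):
    """Optimistic gradient ascent: step along 2*g_t - g_{t-1}, a linear
    prediction of the next gradient, and let the following iteration
    correct any overshoot."""
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        theta = theta + eta * (2.0 * g - g_prev)  # predictive update
        g_prev = g
    return theta

# Toy objective J(theta) = -||theta||^2 with gradient -2*theta.
print(optimistic_ascent(lambda th: -2.0 * th, np.array([1.0, -1.0])))
# converges toward [0, 0]
```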
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
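The sketch below spells out what differentiating through an unrolled solver means on the smallest possible example: K gradient-descent iterations on the quadratic f(x) = 0.5 x'Ax - b'x, with the Jacobian dx_K/db accumulated by the same chain rule automatic differentiation would apply. The setup is an illustrative assumption, not the paper's folded-optimization system.

```python
import numpy as np

def unrolled_solver_grad(A, b, alpha=0.1, K=50):
    """Unroll K gradient-descent steps on f(x) = 0.5 x'Ax - b'x and
    carry the Jacobian dx_K/db forward through every iteration:
        x_{k+1} = (I - alpha*A) x_k + alpha*b
        J_{k+1} = (I - alpha*A) J_k + alpha*I,   J_0 = 0."""
    n = len(b)
    x, J = np.zeros(n), np.zeros((n, n))
    M = np.eye(n) - alpha * A
    for _ in range(K):
        x = M @ x + alpha * b
        J = M @ J + alpha * np.eye(n)
    return x, J

# Sanity check: at convergence x* = inv(A) b, so dx*/db = inv(A).
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 3.0])
x, J = unrolled_solver_grad(A, b, alpha=0.2, K=200)
print(np.allclose(J, np.linalg.inv(A), atol=1e-3))  # True
```

Per the summary above, the paper's system replaces this step-by-step differentiation with efficiently solvable analytical models of the same backward pass.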
arXiv Detail & Related papers (2023-01-28T01:50:42Z)
- Offline Supervised Learning V.S. Online Direct Policy Optimization: A Comparative Study and A Unified Training Paradigm for Neural Network-Based Optimal Feedback Control [7.242569453287703]
We first conduct a comparative study of two prevalent approaches: offline supervised learning and online direct policy optimization.
Our results underscore the superiority of offline supervised learning in terms of both optimality and training time.
We propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control.
arXiv Detail & Related papers (2022-11-29T05:07:13Z)
- Introduction to Online Nonstochastic Control [34.77535508151501]
In online nonstochastic control, both the cost functions and the perturbations from the assumed dynamical model are chosen by an adversary.
The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
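The regret target can be illustrated with the simplest online convex optimization loop: an adversary reveals quadratic costs one at a time, the learner runs online gradient descent, and performance is compared against the best fixed decision in hindsight. This toy, with all quantities assumed for illustration, stands in for the richer benchmark policy classes of nonstochastic control.

```python
import numpy as np

# Adversarial sequence of convex costs f_t(x) = (x - z_t)^2.
rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, 1000)

eta, x, online_cost = 0.05, 0.0, 0.0
for z_t in z:
    online_cost += (x - z_t) ** 2    # cost incurred by the learner
    x -= eta * 2.0 * (x - z_t)       # gradient step on the revealed cost

# The best fixed decision in hindsight for squared loss is the mean.
x_star = z.mean()
hindsight_cost = ((x_star - z) ** 2).sum()
print("regret:", online_cost - hindsight_cost)  # small relative to T = 1000
```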
arXiv Detail & Related papers (2022-11-17T16:12:45Z)
- Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint.
arXiv Detail & Related papers (2022-07-01T19:18:15Z)
- Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing.
Since the data involved often contain sensitive information, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
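As a sketch of the building block the summary refers to, here is minimal decentralized gradient descent: each agent mixes its neighbors' iterates through a doubly stochastic matrix and then steps along its own local gradient. This generic scheme is assumed for illustration and does not include the paper's privacy mechanism; note that only iterates travel over the network, never the raw local data, which is exactly where the privacy question arises.

```python
import numpy as np

def decentralized_gd(local_grads, W, x0, alpha=0.05, steps=500):
    """Decentralized gradient descent: agents average neighbors'
    iterates via the doubly stochastic mixing matrix W, then take a
    step on their own local objective. A constant step size reaches an
    O(alpha) neighborhood of the global minimizer."""
    X = np.array(x0, dtype=float)      # one row of parameters per agent
    for _ in range(steps):
        X = W @ X                      # consensus / mixing step
        for i, g in enumerate(local_grads):
            X[i] -= alpha * g(X[i])    # local gradient step
    return X

# Three agents jointly minimizing sum_i (x - c_i)^2.
c = [1.0, 2.0, 6.0]
grads = [lambda x, ci=ci: 2.0 * (x - ci) for ci in c]
W = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
print(decentralized_gd(grads, W, x0=np.zeros((3, 1))))
# rows land near mean(c) = 3, up to the O(alpha) consensus error
```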
arXiv Detail & Related papers (2022-05-08T14:38:23Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of control as hybrid inference (CHI) which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.