Reinforcement Learning for Adaptive Optimal Stationary Control of Linear
Stochastic Systems
- URL: http://arxiv.org/abs/2107.07788v2
- Date: Tue, 20 Jul 2021 03:47:32 GMT
- Title: Reinforcement Learning for Adaptive Optimal Stationary Control of Linear
Stochastic Systems
- Authors: Bo Pang and Zhong-Ping Jiang
- Abstract summary: This paper studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises.
A novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed.
- Score: 15.410124023805249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the adaptive optimal stationary control of continuous-time
linear stochastic systems with both additive and multiplicative noises, using
reinforcement learning techniques. Based on policy iteration, a novel
off-policy reinforcement learning algorithm, named optimistic
least-squares-based policy iteration, is proposed. Starting from an initial
admissible control policy, it iteratively finds near-optimal policies of the
adaptive optimal stationary control problem directly from input/state data,
without explicitly identifying any system matrices. The solutions
given by the proposed optimistic least-squares-based policy iteration are
proved to converge to a small neighborhood of the optimal solution with
probability one, under mild conditions. The application of the proposed
algorithm to a triple inverted pendulum example validates its feasibility and
effectiveness.
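For intuition, a minimal model-based sketch of the policy-iteration backbone (Kleinman's algorithm for deterministic continuous-time LQR) is given below; the paper's algorithm instead carries out the evaluation step off-policy from input/state data and accounts for additive and multiplicative noises, so treat this as a simplified reference, with all names illustrative:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def policy_iteration_lqr(A, B, Q, R, K0, iters=20):
        # Model-based Kleinman policy iteration for deterministic LQR;
        # a simplified stand-in for the paper's data-driven, off-policy
        # variant (which also handles additive/multiplicative noises).
        K = K0  # initial admissible (stabilizing) gain, as the paper requires
        for _ in range(iters):
            Ak = A - B @ K
            # Policy evaluation: solve Ak' P + P Ak + Q + K' R K = 0
            P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
            # Policy improvement: K <- R^{-1} B' P
            K = np.linalg.solve(R, B.T @ P)
        return K, P

    # Example: double integrator
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    K0 = np.array([[1.0, 1.0]])  # stabilizing for this system
    K, P = policy_iteration_lqr(A, B, Q, R, K0)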
Related papers
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
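As a rough sketch of such a combined objective (assumed DPO-style preference term plus an SFT term; the paper's exact loss and weighting may differ, and all names below are illustrative):

    import torch
    import torch.nn.functional as F

    def regularized_preference_loss(logp_chosen, logp_rejected,
                                    ref_logp_chosen, ref_logp_rejected,
                                    beta=0.1, lam=1.0):
        # Preference term: push the policy to prefer the chosen response
        # over the rejected one, relative to a frozen reference model.
        margin = beta * ((logp_chosen - ref_logp_chosen)
                         - (logp_rejected - ref_logp_rejected))
        preference_loss = -F.logsigmoid(margin).mean()
        # SFT log-likelihood on the chosen response; per the abstract,
        # this term acts as an implicit (adversarial) regularizer.
        sft_loss = -logp_chosen.mean()
        return preference_loss + lam * sft_loss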
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
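One concrete instance is an additive control-variate baseline for the importance-sampling (IPS) estimator; a hedged sketch, assuming importance weights with E[w] = 1 (the paper's framework is more general):

    import numpy as np

    def baseline_corrected_ips(rewards, weights):
        # Off-policy value estimate V_hat = mean(w * (r - beta)) + beta.
        # With E[w] = 1 the estimate stays unbiased for any beta, and the
        # variance-minimizing baseline has the closed form below.
        w, r = np.asarray(weights), np.asarray(rewards)
        beta = np.sum(w * (w - 1.0) * r) / np.sum((w - 1.0) ** 2)
        return np.mean(w * (r - beta)) + beta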
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Fast Policy Learning for Linear Quadratic Control with Entropy Regularization [10.771650397337366]
This paper proposes and analyzes two new policy learning methods: regularized policy gradient (RPG) and iterative policy optimization (IPO), for a class of discounted linear-quadratic control (LQC) problems.
Assuming access to the exact policy evaluation, both proposed approaches are proven to converge linearly in finding optimal policies of the regularized LQC.
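For reference, one exact policy-gradient step for plain (unregularized) discrete-time LQR with a known model looks as follows; the RPG/IPO methods add entropy regularization and discounting, so this is only a structural sketch:

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    def lqr_policy_gradient_step(A, B, Q, R, K, Sigma0, lr=1e-2):
        # Exact policy gradient for discrete-time LQR with policy u = -K x
        # (Fazel et al. style). Requires A - B K to be stable.
        Ak = A - B @ K
        # Value matrix: P = Q + K' R K + Ak' P Ak
        P = solve_discrete_lyapunov(Ak.T, Q + K.T @ R @ K)
        # State covariance: Sigma = Sigma0 + Ak Sigma Ak'
        Sigma = solve_discrete_lyapunov(Ak, Sigma0)
        grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
        return K - lr * grad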
arXiv Detail & Related papers (2023-11-23T19:08:39Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
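One standard way to instantiate such optimism is an optimistic-gradient update that uses the previous gradient as a forecast of the next one; a generic sketch, not necessarily the paper's exact update:

    import numpy as np

    def optimistic_gradient_step(theta, grad, prev_grad, lr=1e-2):
        # Optimistic update: treat the last gradient as a prediction of
        # the next one and correct for it: theta += lr * (2 g_t - g_{t-1}).
        return theta + lr * (2.0 * grad - prev_grad)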
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
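Schematically, one conservative, behavior-regularized improvement step for a softmax policy at a single state could look like this (a hypothetical sketch, not the paper's algorithm):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def behavior_regularized_update(theta, theta_ref, q_values,
                                    alpha=1.0, lr=0.1):
        # One ascent step on E_pi[Q] - alpha * KL(pi || pi_ref) for a
        # discrete-action softmax policy with logits theta.
        pi, pi_ref = softmax(theta), softmax(theta_ref)
        advantage = q_values - alpha * (np.log(pi) - np.log(pi_ref))
        grad = pi * (advantage - np.dot(pi, advantage))
        return theta + lr * grad

Iterative refinement then amounts to periodically resetting theta_ref to the current theta, so the conservative update tracks a gradually improving reference policy.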
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Introduction to Online Nonstochastic Control [34.77535508151501]
In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary.
The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
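In symbols, with adversarially chosen costs c_t and a benchmark policy class Pi, the objective reads (standard notation, assumed here):

\[
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t(x_t^{\pi}, u_t^{\pi}),
\]

where (x_t, u_t) is the learner's realized trajectory and (x_t^{\pi}, u_t^{\pi}) is the counterfactual trajectory of policy \pi.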
arXiv Detail & Related papers (2022-11-17T16:12:45Z)
- Recurrent Model Predictive Control [19.047059454849897]
We propose an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems.
Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs.
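Such a recurrent mapping from states and reference values to control inputs can be pictured as a small recurrent network; layer types and sizes below are assumptions for illustration, not the paper's architecture:

    import torch
    import torch.nn as nn

    class RecurrentPolicy(nn.Module):
        # Maps system states and reference values directly to control
        # inputs, carrying a hidden state across time steps.
        def __init__(self, state_dim, ref_dim, ctrl_dim, hidden_dim=64):
            super().__init__()
            self.rnn = nn.GRUCell(state_dim + ref_dim, hidden_dim)
            self.head = nn.Linear(hidden_dim, ctrl_dim)

        def forward(self, state, ref, hidden):
            hidden = self.rnn(torch.cat([state, ref], dim=-1), hidden)
            return self.head(hidden), hidden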
arXiv Detail & Related papers (2021-02-23T15:01:36Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
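A gradient-based update over such a mixture might sample a base controller from softmax weights and adjust the logits with a score-function (REINFORCE-style) gradient; a hedged sketch with illustrative names:

    import numpy as np

    rng = np.random.default_rng(0)

    def mixture_policy_step(theta, base_controllers, rollout_return, lr=0.05):
        # theta: logits over the M base controllers. Sample a controller
        # from the softmax mixture, run it for an episode, and update the
        # mixture weights with a score-function gradient.
        probs = np.exp(theta - theta.max())
        probs /= probs.sum()
        i = rng.choice(len(base_controllers), p=probs)
        ret = rollout_return(base_controllers[i])  # episode return, user-supplied
        grad_log = -probs
        grad_log[i] += 1.0  # gradient of log probs[i] w.r.t. theta
        return theta + lr * ret * grad_log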
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Approximate Midpoint Policy Iteration for Linear Quadratic Control [1.0312968200748118]
We present a midpoint policy iteration algorithm to solve linear quadratic optimal control problems in both model-based and model-free settings.
We show that in the model-based setting it achieves cubic convergence, which is superior to standard policy iteration and policy gradient algorithms, which achieve quadratic and linear convergence, respectively.
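The "midpoint" idea mirrors the classical midpoint Newton method for root-finding F(x) = 0, which is cubically convergent; shown here only as an analogy, since the paper adapts it to the policy-iteration/Riccati setting:

\[
  x_{k+1} \;=\; x_k \;-\; \Big[F'\!\Big(x_k - \tfrac{1}{2}\, F'(x_k)^{-1} F(x_k)\Big)\Big]^{-1} F(x_k).
\]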
arXiv Detail & Related papers (2020-11-28T20:22:10Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
The proposed algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Model-free optimal control of discrete-time systems with additive and multiplicative noises [1.656520517245166]
This paper investigates the optimal control problem for a class of discrete-time systems subject to additive and multiplicative noises.
A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using the data of the system states and inputs.
It is proven that the learning algorithm converges to the optimal admissible control policy.
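A standard model-free route to such a policy is least-squares Q-function policy iteration; the sketch below covers only the noiseless discrete-time LQ case (the paper additionally treats additive and multiplicative noises) and assumes the logged inputs are sufficiently exciting:

    import numpy as np

    def lstdq_gain(xs, us, costs, xs_next, K, n, m):
        # One LSTD-Q evaluation + improvement step for discrete-time LQ:
        # fit Q_K(x, u) = [x; u]' H [x; u] from data via the Bellman equation
        #   Q(x_t, u_t) = c_t + Q(x_{t+1}, -K x_{t+1}),
        # then return the improved gain K' = Huu^{-1} Hux.
        phi = lambda x, u: np.kron(np.r_[x, u], np.r_[x, u])
        rows, b = [], []
        for x, u, c, xn in zip(xs, us, costs, xs_next):
            un = -K @ xn  # on-policy action at the next state
            rows.append(phi(x, u) - phi(xn, un))
            b.append(c)
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(b), rcond=None)
        H = theta.reshape(n + m, n + m)
        H = 0.5 * (H + H.T)  # symmetrize
        Hux, Huu = H[n:, :n], H[n:, n:]
        return np.linalg.solve(Huu, Hux)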
arXiv Detail & Related papers (2020-08-20T02:18:00Z)