Related papers: Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application

Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application

URL: http://arxiv.org/abs/2412.05906v2
Date: Tue, 04 Feb 2025 10:22:04 GMT
Title: Reinforcement Learning for a Discrete-Time Linear-Quadratic Control Problem with an Application
Authors: Lucky Li,
Abstract summary: We study the discrete-time linear-quadratic (LQ) control model using reinforcement learning (RL)<n>Using entropy to measure the cost of exploration, we prove that the optimal feedback policy for the problem must be Gaussian type.<n>Then, we apply the results of the discrete-time LQ model to solve the discrete-time mean-variance asset-liability management problem and prove our RL algorithm's policy improvement and convergence.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the discrete-time linear-quadratic (LQ) control model using reinforcement learning (RL). Using entropy to measure the cost of exploration, we prove that the optimal feedback policy for the problem must be Gaussian type. Then, we apply the results of the discrete-time LQ model to solve the discrete-time mean-variance asset-liability management problem and prove our RL algorithm's policy improvement and convergence. Finally, a numerical example sheds light on the theoretical results established using simulations.

Related papers

Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear--Quadratic Reinforcement Learning Problems [6.859965454961918]
We study reinforcement learning for continuous-time linear-quadratic (LQ) control problems.<n>We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization by the critic.<n>Our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems.
arXiv Detail & Related papers (2025-07-01T01:09:06Z)
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control. We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z)
Full error analysis of policy gradient learning algorithms for exploratory linear quadratic mean-field control problem in continuous time with common noise [0.0]
We study policy gradient (PG) learning and first demonstrate convergence in a model-based setting. We prove the global linear convergence and sample complexity of the PG algorithm with two-point gradient estimates in a model-free setting. In this setting, the parameterized optimal policies are learned from samples of the states and population distribution.
arXiv Detail & Related papers (2024-08-05T14:11:51Z)
Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z)
Continuous-Time Model-Based Reinforcement Learning [4.427447378048202]
We propose a continuous-time MBRL framework based on a novel actor-critic method. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems.
arXiv Detail & Related papers (2021-02-09T11:30:19Z)
Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem. We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP)
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation [23.76925146112261]
This paper studies the robustness of reinforcement learning algorithms to errors in the learning process. It is shown that policy iteration for LQR is inherently robust to small errors in the learning process.
arXiv Detail & Related papers (2020-08-25T11:11:28Z)
A spectral algorithm for robust regression with subgaussian rates [0.0]
We study a new linear up to quadratic time algorithm for linear regression in the absence of strong assumptions on the underlying distributions of samples. The goal is to design a procedure which attains the optimal sub-gaussian error bound even though the data have only finite moments.
arXiv Detail & Related papers (2020-07-12T19:33:50Z)
Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs) The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.