On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances
- URL: http://arxiv.org/abs/2509.16746v2
- Date: Thu, 25 Sep 2025 04:03:42 GMT
- Title: On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances
- Authors: Sayak Mukherjee, Ramij R. Hossain, Mahantesh Halappanavar
- Abstract summary: We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming.
- Score: 3.701656361145375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. First, we consider the scenario where the exogenous variable can be estimated in a controlled environment, and subsequently, a more practical and challenging scenario where it is unknown in a stochastic setting. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming (ADP), combined with a Lyapunov-based analytical methodology, to design the algorithms and derive sample-based approximations motivated by Markov decision process (MDP)-based approaches. For the scenario involving non-measurable disturbances, we further establish stability and convergence guarantees for the learned control gains under sample-based approximations. The overall methodology emphasizes simplicity while providing rigorous guarantees. Finally, numerical experiments examine the intricacies of, and validate, the design of offline continuous-time LQR with exogenous disturbances.
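To make the setting concrete, below is a minimal, model-based sketch of Kleinman policy iteration, the classical fixed-point scheme that ADP-style offline LQR designs approximate from trajectory data. The system matrices, costs, and initial gain are illustrative assumptions, and the paper's sample-based, disturbance-aware machinery (e.g., handling of the exogenous input) is deliberately omitted.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])    # open-loop dynamics (assumed)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])         # quadratic state / input costs
K = np.array([[0.0, 1.0]])                  # any stabilizing initial gain

for _ in range(50):
    Ac = A - B @ K                          # closed-loop matrix
    # Policy evaluation: solve Ac' P + P Ac + Q + K' R K = 0
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)     # policy improvement
    delta, K = np.linalg.norm(K_new - K), K_new
    if delta < 1e-10:
        break

print("learned LQR gain K:", K)
```

Each pass evaluates the current gain through a Lyapunov equation and then improves it; with a stabilizing initial gain, the iterates converge to the LQR solution.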
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z)
- The Confusing Instance Principle for Online Linear Quadratic Control [6.896797484250302]
We revisit the problem of controlling linear systems with quadratic cost under unknown dynamics, using model-based reinforcement learning. We propose an alternative based on the Confusing Instance (CI) principle, which underpins regret lower bounds in multi-armed bandits (MABs) and discrete Markov decision processes. By leveraging the structure of LQR policies along with sensitivity and stability analysis, we develop MED-LQ, a novel control strategy that extends the principles of CI and minimum empirical divergence (MED) beyond small-scale settings.
arXiv Detail & Related papers (2025-10-22T12:38:42Z)
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
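A toy sketch of the masking idea appears below; the per-sample curvature proxy (gradient norm) and the quantile threshold are invented for illustration and are not CAPO's actual criterion.

```python
# Toy sketch of curvature-aware sample masking in a policy-gradient step.
# The per-sample proxy (gradient norm) and the 90th-percentile cutoff are
# illustrative stand-ins, not CAPO's criterion. `logps` must be computed
# from `policy` so that it carries gradients.
import torch

def masked_pg_step(policy, optimizer, logps, advantages, quantile=0.9):
    """Drop the samples whose per-sample gradient norms are largest."""
    per_sample_losses = -(logps * advantages)        # REINFORCE-style losses
    norms = []
    for loss in per_sample_losses:
        grads = torch.autograd.grad(loss, list(policy.parameters()),
                                    retain_graph=True)
        norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    norms = torch.stack(norms)
    keep = norms <= torch.quantile(norms, quantile)  # mask unstable samples
    optimizer.zero_grad()
    per_sample_losses[keep].mean().backward()
    optimizer.step()
```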
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
- Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation [50.34670342434884]
We propose a novel methodology for modeling posterior drift through Bayes decision rules. Under mild regularity conditions, we establish the consistency of our estimators and derive the risk bounds. We illustrate the broad applicability of our method by adapting it to the estimation of optimal individualized treatment rules.
arXiv Detail & Related papers (2025-08-28T16:03:06Z)
- Instance-Dependent Continuous-Time Reinforcement Learning via Maximum Likelihood Estimation [27.232790785138427]
Continuous-time reinforcement learning (CTRL) provides a natural framework for sequential decision-making in dynamic environments.<n>While has shown growing empirical success, its ability to adapt to varying levels of problem difficulty remains poorly understood.<n>In this work, we investigate the instance-dependent behavior of and introduce a simple, model-based algorithm built on maximum likelihood estimation.
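For a linear diffusion, the Gaussian maximum likelihood estimate of the drift reduces to least squares on Euler-discretized transitions, which is the flavor of model-based estimator the summary alludes to. Everything below (the system, noise level, and inputs) is an illustrative assumption, not the paper's setup.

```python
# MLE of a linear drift dx = (A x + B u) dt + sigma dW from sampled data:
# under Gaussian noise this is ordinary least squares on finite differences.
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.0, 1.0], [-2.0, -0.3]])
B_true = np.array([[0.0], [1.0]])
dt, T = 0.01, 2000

X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))                  # exploratory inputs
for t in range(T):
    drift = A_true @ X[t] + B_true @ U[t]
    X[t + 1] = X[t] + dt * drift + np.sqrt(dt) * 0.1 * rng.normal(size=2)

Z = np.hstack([X[:-1], U])                   # regressors (x_t, u_t)
Y = (X[1:] - X[:-1]) / dt                    # finite-difference drift targets
theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
A_hat, B_hat = theta[:2].T, theta[2:].T
print(np.round(A_hat, 2), np.round(B_hat, 2))
```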
arXiv Detail & Related papers (2025-08-04T06:25:45Z)
- Sample Complexity of Linear Quadratic Regulator Without Initial Stability [11.98212766542468]
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown parameters. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity.
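The contrast with two-point estimators is easiest to see in code. Below is a generic one-point (single-rollout) zeroth-order gradient estimate for an LQR-type cost; the discrete-time cost surrogate, step size, and smoothing radius are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def lqr_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon quadratic cost of the static policy u = -K x."""
    x, J = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return J

def one_point_grad(K, cost, radius=0.05):
    """Single-evaluation zeroth-order gradient estimate of cost at K."""
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)                   # random direction on the sphere
    return (K.size / radius) * cost(K + radius * U) * U

A = np.array([[1.0, 0.1], [0.0, 1.0]])       # discrete-time double integrator
B = np.array([[0.0], [0.1]])
Q, R, x0 = np.eye(2), np.eye(1), np.array([1.0, 0.0])
K = np.array([[1.0, 1.0]])                   # assumed stabilizing start
g = one_point_grad(K, lambda K_: lqr_cost(K_, A, B, Q, R, x0))
K = K - 1e-4 * g                             # one policy-gradient step
```

A two-point estimator would difference J(K + rU) and J(K - rU), reducing variance but requiring two rollouts; the one-point form above needs only one.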
arXiv Detail & Related papers (2025-02-20T02:44:25Z)
- Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing [3.9410617513331863]
Stochastic optimal control of dynamical systems is a crucial challenge in sequential decision-making.
Control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma.
This paper introduces a novel perspective by framing risk-sensitive control as Markovian score climbing under samples drawn from a conditional particle filter.
arXiv Detail & Related papers (2023-12-21T16:34:03Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
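A minimal sketch of the construction, for the mean of a unit-variance Gaussian: keep every candidate whose running likelihood ratio against a plug-in (prequential) predictor stays below 1/alpha. Ville's inequality makes the resulting set anytime-valid; the grid and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, true_mean = 0.05, 0.7
grid = np.linspace(-2.0, 2.0, 401)           # candidate means
log_ratio = np.zeros_like(grid)              # running log LR per candidate
running_mean, n = 0.0, 0

for _ in range(500):
    x = true_mean + rng.normal()
    # Plug-in (prequential) numerator uses only past data; under the true
    # mean the ratio is a martingale, so it rarely exceeds 1/alpha.
    log_ratio += -0.5 * (x - running_mean) ** 2 + 0.5 * (x - grid) ** 2
    n += 1
    running_mean += (x - running_mean) / n   # update the plug-in estimate

conf_set = grid[log_ratio <= np.log(1.0 / alpha)]
print("confidence set ~ [%.2f, %.2f]" % (conf_set.min(), conf_set.max()))
```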
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn decision making from historical data without active exploration.
Due to uncertainty and variability in the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the historical dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
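The KL ball admits a convenient dual, which is what makes such robust Bellman backups tractable: the worst-case expected next-state value over {p : KL(p || p0) <= rho} equals sup over beta > 0 of [-beta log E_{p0} exp(-V/beta) - beta*rho]. The sketch below implements that one-dimensional dual with an illustrative nominal model; it is a generic KL-robust backup, not the paper's full algorithm.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def kl_robust_value(V_next, p0, rho):
    """Worst-case E[V(s')] over {p : KL(p || p0) <= rho}, via the dual."""
    def neg_dual(log_beta):                  # parametrize beta = exp(.) > 0
        beta = np.exp(log_beta)
        return beta * (logsumexp(-V_next / beta + np.log(p0)) + rho)
    res = minimize_scalar(neg_dual, bounds=(-10.0, 10.0), method="bounded")
    return -res.fun

p0 = np.array([0.5, 0.3, 0.2])               # nominal transition probabilities
V_next = np.array([1.0, 0.0, -1.0])          # values at the next states
print(kl_robust_value(V_next, p0, rho=0.1))  # strictly below p0 @ V_next = 0.3
```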
arXiv Detail & Related papers (2022-08-11T11:55:31Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
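A toy sketch of the pessimism principle in a tabular offline setting is shown below: rarely observed state-action pairs receive a count-based lower-confidence-bound penalty, so the learned values are conservative off the data support. The penalty constant and the tiny dataset are illustrative, and the paper's variance-reduction machinery is not reproduced.

```python
import numpy as np
from collections import defaultdict

gamma, c = 0.9, 1.0
Q = defaultdict(float)                       # tabular Q[(s, a)]
N = defaultdict(int)                         # visit counts

def pessimistic_update(s, a, r, s_next, actions):
    N[(s, a)] += 1
    lr = 1.0 / N[(s, a)]                     # simple 1/n step size
    penalty = c / np.sqrt(N[(s, a)])         # LCB-style pessimism term
    target = r + gamma * max(Q[(s_next, b)] for b in actions) - penalty
    Q[(s, a)] += lr * (target - Q[(s, a)])

# Replay a small offline dataset of (s, a, r, s') transitions.
dataset = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 0), (0, 0, 1.0, 1)]
for s, a, r, s2 in dataset:
    pessimistic_update(s, a, r, s2, actions=(0, 1))
print(dict(Q))
```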
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
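A minimal sketch of the learn-then-linearize pipeline the summary describes: fit GPs to one-step dynamics data, take a finite-difference Jacobian of the posterior mean at the operating point, and solve a Riccati equation for the resulting linear model. The data-generating system is an illustrative assumption, and the paper's probabilistic stability-margin analysis is not reproduced.

```python
import numpy as np
from scipy.linalg import solve_continuous_are
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def f(x, u):                                 # pendulum-like true dynamics (assumed)
    return np.array([x[1], -np.sin(x[0]) - 0.2 * x[1] + u])

Z = rng.uniform(-1, 1, size=(200, 3))        # sampled (x1, x2, u) points
Y = np.array([f(z[:2], z[2]) for z in Z])    # xdot targets
gps = [GaussianProcessRegressor(kernel=RBF(), alpha=1e-6).fit(Z, Y[:, i])
       for i in range(2)]

def mean(z):                                 # GP posterior mean of xdot
    return np.array([gp.predict(z.reshape(1, -1))[0] for gp in gps])

eps, J = 1e-4, np.zeros((2, 3))
for j in range(3):                           # finite-difference Jacobian at 0
    e = np.zeros(3)
    e[j] = eps
    J[:, j] = (mean(e) - mean(-e)) / (2 * eps)
A, B = J[:, :2], J[:, 2:]                    # linearized GP model

P = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K = np.linalg.solve(np.eye(1), B.T @ P)      # LQR gain for the learned model
print("gain for linearized GP model:", np.round(K, 2))
```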
arXiv Detail & Related papers (2021-05-17T08:36:18Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
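A toy sketch of the conservative regularizer the summary describes appears below: alongside a standard Bellman term, push Q-values down on model-generated (out-of-support) actions and up on dataset actions. The networks, tensors, and weight `beta` are illustrative; this is not COMBO's full model-based training loop.

```python
import torch
import torch.nn as nn

qnet = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(qnet.parameters(), lr=3e-4)
beta, gamma = 1.0, 0.99

def combo_style_loss(s, a_data, r, s2, a2, a_model):
    q_data = qnet(torch.cat([s, a_data], dim=-1))
    with torch.no_grad():                            # bootstrapped target
        target = r + gamma * qnet(torch.cat([s2, a2], dim=-1))
    bellman = ((q_data - target) ** 2).mean()
    q_model = qnet(torch.cat([s, a_model], dim=-1))  # out-of-support actions
    conservative = q_model.mean() - q_data.mean()    # push OOD down, data up
    return bellman + beta * conservative

n = 32                                               # illustrative random batch
s, s2 = torch.randn(n, 2), torch.randn(n, 2)
a_data, a2, a_model = (torch.randn(n, 1) for _ in range(3))
r = torch.randn(n, 1)
opt.zero_grad()
combo_style_loss(s, a_data, r, s2, a2, a_model).backward()
opt.step()
```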
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity [15.940861063732608]
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL).
We investigate the convergence theory of policy gradient (PG) methods for linear risk-sensitive and robust control.
One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved.
arXiv Detail & Related papers (2021-01-04T16:00:46Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
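A heavily simplified sketch of a min-norm CLF controller posed as a second-order cone program is shown below; the model uncertainty enters the decrease condition through a norm term, which is what pushes the problem from a QP to an SOCP. The dynamics terms (LfV, LgV), uncertainty weight Sigma, and rate lambda are illustrative placeholders, not the GP-CLF-SOCP derivation itself.

```python
import cvxpy as cp
import numpy as np

LfV, LgV = 0.8, np.array([0.5, -0.3])        # drift/input terms of Vdot (assumed)
Sigma = 0.1 * np.eye(2)                      # uncertainty weight (assumed)
lam, V = 1.0, 0.6                            # CLF rate and current value

u = cp.Variable(2)
slack = cp.Variable(nonneg=True)             # keeps the program feasible
# CLF decrease with a norm-bounded uncertainty term: a second-order cone.
decrease = [LfV + LgV @ u + lam * V + cp.norm(Sigma @ u) <= slack]
prob = cp.Problem(cp.Minimize(cp.sum_squares(u) + 100.0 * slack), decrease)
prob.solve()
print("min-norm input:", np.round(u.value, 3))
```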
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.