On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances
- URL: http://arxiv.org/abs/2509.16746v2
- Date: Thu, 25 Sep 2025 04:03:42 GMT
- Title: On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances
- Authors: Sayak Mukherjee, Ramij R. Hossain, Mahantesh Halappanavar
- Abstract summary: We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming.
- Score: 3.701656361145375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances. First, we consider the scenario where the exogenous variable can be estimated in a controlled environment, and subsequently, a more practical and challenging scenario where it is unknown in a stochastic setting. Our approach builds on the fundamental learning-based framework of adaptive dynamic programming (ADP), combined with a Lyapunov-based analytical methodology, to design the algorithms and derive sample-based approximations motivated by Markov decision process (MDP)-based approaches. For the scenario involving non-measurable disturbances, we further establish stability and convergence guarantees for the learned control gains under sample-based approximations. The overall methodology emphasizes simplicity while providing rigorous guarantees. Finally, numerical experiments examine the intricacies of, and validate, the design of offline continuous-time LQR with exogenous disturbances.
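To make the setting concrete, below is a minimal, model-based sketch of Kleinman policy iteration, the classical fixed-point scheme that ADP-style offline LQR designs approximate from trajectory data. The system matrices, costs, and initial gain are illustrative assumptions, and the paper's sample-based, disturbance-aware machinery (e.g., handling of the exogenous input) is deliberately omitted.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])    # open-loop dynamics (assumed)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])         # quadratic state / input costs
K = np.array([[0.0, 1.0]])                  # any stabilizing initial gain

for _ in range(50):
    Ac = A - B @ K                          # closed-loop matrix
    # Policy evaluation: solve Ac' P + P Ac + Q + K' R K = 0
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)     # policy improvement
    delta, K = np.linalg.norm(K_new - K), K_new
    if delta < 1e-10:
        break

print("learned LQR gain K:", K)
```

Each pass evaluates the current gain through a Lyapunov equation and then improves it; with a stabilizing initial gain, the iterates converge to the LQR solution.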
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z)
- The Confusing Instance Principle for Online Linear Quadratic Control [6.896797484250302]
We revisit the problem of controlling linear systems with quadratic cost under unknown dynamics, using model-based reinforcement learning. We propose an alternative based on the Confusing Instance (CI) principle, which underpins regret lower bounds in multi-armed bandits (MABs) and discrete Markov decision processes. By leveraging the structure of LQR policies along with sensitivity and stability analysis, we develop MED-LQ, a novel control strategy that extends the principles of CI and minimum empirical divergence (MED) beyond small-scale settings.
arXiv Detail & Related papers (2025-10-22T12:38:42Z)
- Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models. We propose a tractable computational framework that tracks and leverages curvature information during policy updates. The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
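A toy sketch of the masking idea appears below; the per-sample curvature proxy (gradient norm) and the quantile threshold are invented for illustration and are not CAPO's actual criterion.

```python
# Toy sketch of curvature-aware sample masking in a policy-gradient step.
# The per-sample proxy (gradient norm) and the 90th-percentile cutoff are
# illustrative stand-ins, not CAPO's criterion. `logps` must be computed
# from `policy` so that it carries gradients.
import torch

def masked_pg_step(policy, optimizer, logps, advantages, quantile=0.9):
    """Drop the samples whose per-sample gradient norms are largest."""
    per_sample_losses = -(logps * advantages)        # REINFORCE-style losses
    norms = []
    for loss in per_sample_losses:
        grads = torch.autograd.grad(loss, list(policy.parameters()),
                                    retain_graph=True)
        norms.append(torch.sqrt(sum(g.pow(2).sum() for g in grads)))
    norms = torch.stack(norms)
    keep = norms <= torch.quantile(norms, quantile)  # mask unstable samples
    optimizer.zero_grad()
    per_sample_losses[keep].mean().backward()
    optimizer.step()
```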
arXiv Detail & Related papers (2025-10-01T12:29:32Z)
- Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation [50.34670342434884]
We propose a novel methodology for modeling posterior drift through Bayes decision rules. Under mild regularity conditions, we establish the consistency of our estimators and derive the risk bounds. We illustrate the broad applicability of our method by adapting it to the estimation of optimal individualized treatment rules.
arXiv Detail & Related papers (2025-08-28T16:03:06Z)
- Instance-Dependent Continuous-Time Reinforcement Learning via Maximum Likelihood Estimation [27.232790785138427]
Continuous-time reinforcement learning (CTRL) provides a natural framework for sequential decision-making in dynamic environments.<n>While has shown growing empirical success, its ability to adapt to varying levels of problem difficulty remains poorly understood.<n>In this work, we investigate the instance-dependent behavior of and introduce a simple, model-based algorithm built on maximum likelihood estimation.
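For a linear diffusion, the Gaussian maximum likelihood estimate of the drift reduces to least squares on Euler-discretized transitions, which is the flavor of model-based estimator the summary alludes to. Everything below (the system, noise level, and inputs) is an illustrative assumption, not the paper's setup.

```python
# MLE of a linear drift dx = (A x + B u) dt + sigma dW from sampled data:
# under Gaussian noise this is ordinary least squares on finite differences.
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.0, 1.0], [-2.0, -0.3]])
B_true = np.array([[0.0], [1.0]])
dt, T = 0.01, 2000

X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))                  # exploratory inputs
for t in range(T):
    drift = A_true @ X[t] + B_true @ U[t]
    X[t + 1] = X[t] + dt * drift + np.sqrt(dt) * 0.1 * rng.normal(size=2)

Z = np.hstack([X[:-1], U])                   # regressors (x_t, u_t)
Y = (X[1:] - X[:-1]) / dt                    # finite-difference drift targets
theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
A_hat, B_hat = theta[:2].T, theta[2:].T
print(np.round(A_hat, 2), np.round(B_hat, 2))
```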
arXiv Detail & Related papers (2025-08-04T06:25:45Z)
- Sample Complexity of Linear Quadratic Regulator Without Initial Stability [11.98212766542468]
Inspired by REINFORCE, we introduce a novel receding-horizon algorithm for the Linear Quadratic Regulator (LQR) problem with unknown parameters. Unlike prior methods, our algorithm avoids reliance on two-point gradient estimates while maintaining the same order of sample complexity.
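The contrast with two-point estimators is easiest to see in code. Below is a generic one-point (single-rollout) zeroth-order gradient estimate for an LQR-type cost; the discrete-time cost surrogate, step size, and smoothing radius are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def lqr_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon quadratic cost of the static policy u = -K x."""
    x, J = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return J

def one_point_grad(K, cost, radius=0.05):
    """Single-evaluation zeroth-order gradient estimate of cost at K."""
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)                   # random direction on the sphere
    return (K.size / radius) * cost(K + radius * U) * U

A = np.array([[1.0, 0.1], [0.0, 1.0]])       # discrete-time double integrator
B = np.array([[0.0], [0.1]])
Q, R, x0 = np.eye(2), np.eye(1), np.array([1.0, 0.0])
K = np.array([[1.0, 1.0]])                   # assumed stabilizing start
g = one_point_grad(K, lambda K_: lqr_cost(K_, A, B, Q, R, x0))
K = K - 1e-4 * g                             # one policy-gradient step
```

A two-point estimator would difference J(K + rU) and J(K - rU), reducing variance but requiring two rollouts; the one-point form above needs only one.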
arXiv Detail & Related papers (2025-02-20T02:44:25Z)
- Risk-Sensitive Stochastic Optimal Control as Rao-Blackwellized Markovian Score Climbing [3.9410617513331863]
Stochastic optimal control of dynamical systems is a crucial challenge in sequential decision-making.
Control-as-inference approaches have had considerable success, providing a viable risk-sensitive framework to address the exploration-exploitation dilemma.
This paper introduces a novel perspective by framing risk-sensitive control as Markovian score climbing under samples drawn from a conditional particle filter.
arXiv Detail & Related papers (2023-12-21T16:34:03Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
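A minimal sketch of the construction, for the mean of a unit-variance Gaussian: keep every candidate whose running likelihood ratio against a plug-in (prequential) predictor stays below 1/alpha. Ville's inequality makes the resulting set anytime-valid; the grid and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, true_mean = 0.05, 0.7
grid = np.linspace(-2.0, 2.0, 401)           # candidate means
log_ratio = np.zeros_like(grid)              # running log LR per candidate
running_mean, n = 0.0, 0

for _ in range(500):
    x = true_mean + rng.normal()
    # Plug-in (prequential) numerator uses only past data; under the true
    # mean the ratio is a martingale, so it rarely exceeds 1/alpha.
    log_ratio += -0.5 * (x - running_mean) ** 2 + 0.5 * (x - grid) ** 2
    n += 1
    running_mean += (x - running_mean) / n   # update the plug-in estimate

conf_set = grid[log_ratio <= np.log(1.0 / alpha)]
print("confidence set ~ [%.2f, %.2f]" % (conf_set.min(), conf_set.max()))
```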
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn decision making from historical data without active exploration.
Due to uncertainty and variability in the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the historical dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
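The KL ball admits a convenient dual, which is what makes such robust Bellman backups tractable: the worst-case expected next-state value over {p : KL(p || p0) <= rho} equals sup over beta > 0 of [-beta log E_{p0} exp(-V/beta) - beta*rho]. The sketch below implements that one-dimensional dual with an illustrative nominal model; it is a generic KL-robust backup, not the paper's full algorithm.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def kl_robust_value(V_next, p0, rho):
    """Worst-case E[V(s')] over {p : KL(p || p0) <= rho}, via the dual."""
    def neg_dual(log_beta):                  # parametrize beta = exp(.) > 0
        beta = np.exp(log_beta)
        return beta * (logsumexp(-V_next / beta + np.log(p0)) + rho)
    res = minimize_scalar(neg_dual, bounds=(-10.0, 10.0), method="bounded")
    return -res.fun

p0 = np.array([0.5, 0.3, 0.2])               # nominal transition probabilities
V_next = np.array([1.0, 0.0, -1.0])          # values at the next states
print(kl_robust_value(V_next, p0, rho=0.1))  # strictly below p0 @ V_next = 0.3
```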
arXiv Detail & Related papers (2022-08-11T11:55:31Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
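A toy sketch of the pessimism principle in a tabular offline setting is shown below: rarely observed state-action pairs receive a count-based lower-confidence-bound penalty, so the learned values are conservative off the data support. The penalty constant and the tiny dataset are illustrative, and the paper's variance-reduction machinery is not reproduced.

```python
import numpy as np
from collections import defaultdict

gamma, c = 0.9, 1.0
Q = defaultdict(float)                       # tabular Q[(s, a)]
N = defaultdict(int)                         # visit counts

def pessimistic_update(s, a, r, s_next, actions):
    N[(s, a)] += 1
    lr = 1.0 / N[(s, a)]                     # simple 1/n step size
    penalty = c / np.sqrt(N[(s, a)])         # LCB-style pessimism term
    target = r + gamma * max(Q[(s_next, b)] for b in actions) - penalty
    Q[(s, a)] += lr * (target - Q[(s, a)])

# Replay a small offline dataset of (s, a, r, s') transitions.
dataset = [(0, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 1, 0.5, 0), (0, 0, 1.0, 1)]
for s, a, r, s2 in dataset:
    pessimistic_update(s, a, r, s2, actions=(0, 1))
print(dict(Q))
```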
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design.
We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
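A minimal sketch of the learn-then-linearize pipeline the summary describes: fit GPs to one-step dynamics data, take a finite-difference Jacobian of the posterior mean at the operating point, and solve a Riccati equation for the resulting linear model. The data-generating system is an illustrative assumption, and the paper's probabilistic stability-margin analysis is not reproduced.

```python
import numpy as np
from scipy.linalg import solve_continuous_are
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

def f(x, u):                                 # pendulum-like true dynamics (assumed)
    return np.array([x[1], -np.sin(x[0]) - 0.2 * x[1] + u])

Z = rng.uniform(-1, 1, size=(200, 3))        # sampled (x1, x2, u) points
Y = np.array([f(z[:2], z[2]) for z in Z])    # xdot targets
gps = [GaussianProcessRegressor(kernel=RBF(), alpha=1e-6).fit(Z, Y[:, i])
       for i in range(2)]

def mean(z):                                 # GP posterior mean of xdot
    return np.array([gp.predict(z.reshape(1, -1))[0] for gp in gps])

eps, J = 1e-4, np.zeros((2, 3))
for j in range(3):                           # finite-difference Jacobian at 0
    e = np.zeros(3)
    e[j] = eps
    J[:, j] = (mean(e) - mean(-e)) / (2 * eps)
A, B = J[:, :2], J[:, 2:]                    # linearized GP model

P = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K = np.linalg.solve(np.eye(1), B.T @ P)      # LQR gain for the learned model
print("gain for linearized GP model:", np.round(K, 2))
```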
arXiv Detail & Related papers (2021-05-17T08:36:18Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
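A toy sketch of the conservative regularizer the summary describes appears below: alongside a standard Bellman term, push Q-values down on model-generated (out-of-support) actions and up on dataset actions. The networks, tensors, and weight `beta` are illustrative; this is not COMBO's full model-based training loop.

```python
import torch
import torch.nn as nn

qnet = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(qnet.parameters(), lr=3e-4)
beta, gamma = 1.0, 0.99

def combo_style_loss(s, a_data, r, s2, a2, a_model):
    q_data = qnet(torch.cat([s, a_data], dim=-1))
    with torch.no_grad():                            # bootstrapped target
        target = r + gamma * qnet(torch.cat([s2, a2], dim=-1))
    bellman = ((q_data - target) ** 2).mean()
    q_model = qnet(torch.cat([s, a_model], dim=-1))  # out-of-support actions
    conservative = q_model.mean() - q_data.mean()    # push OOD down, data up
    return bellman + beta * conservative

n = 32                                               # illustrative random batch
s, s2 = torch.randn(n, 2), torch.randn(n, 2)
a_data, a2, a_model = (torch.randn(n, 1) for _ in range(3))
r = torch.randn(n, 1)
opt.zero_grad()
combo_style_loss(s, a_data, r, s2, a2, a_model).backward()
opt.step()
```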
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity [15.940861063732608]
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL).
We investigate the convergence theory of policy gradient (PG) methods for linear risk-sensitive and robust control.
One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved.
arXiv Detail & Related papers (2021-01-04T16:00:46Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
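A heavily simplified sketch of a min-norm CLF controller posed as a second-order cone program is shown below; the model uncertainty enters the decrease condition through a norm term, which is what pushes the problem from a QP to an SOCP. The dynamics terms (LfV, LgV), uncertainty weight Sigma, and rate lambda are illustrative placeholders, not the GP-CLF-SOCP derivation itself.

```python
import cvxpy as cp
import numpy as np

LfV, LgV = 0.8, np.array([0.5, -0.3])        # drift/input terms of Vdot (assumed)
Sigma = 0.1 * np.eye(2)                      # uncertainty weight (assumed)
lam, V = 1.0, 0.6                            # CLF rate and current value

u = cp.Variable(2)
slack = cp.Variable(nonneg=True)             # keeps the program feasible
# CLF decrease with a norm-bounded uncertainty term: a second-order cone.
decrease = [LfV + LgV @ u + lam * V + cp.norm(Sigma @ u) <= slack]
prob = cp.Problem(cp.Minimize(cp.sum_squares(u) + 100.0 * slack), decrease)
prob.solve()
print("min-norm input:", np.round(u.value, 3))
```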
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.