Related papers: Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

URL: http://arxiv.org/abs/2304.03907v3
Date: Thu, 21 Dec 2023 01:25:28 GMT
Title: Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding
Authors: Tongzheng Ren, Zhaolin Ren, Haitong Ma, Na Li and Bo Dai
Abstract summary: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear systems. We use an infinite-dimensional feature to linearly represent the state-action value function and exploits finite-dimensional truncation approximation for practical implementation.
Score: 22.946517604055735
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems. This method leverages an infinite-dimensional feature to linearly represent the state-action value function and exploits finite-dimensional truncation approximation for practical implementation. To characterize the effectiveness of these finite dimensional approximations, we provide an in-depth theoretical analysis to characterize the approximation error induced by the finite-dimension truncation and statistical error induced by finite-sample approximation in both policy evaluation and policy optimization. Our analysis includes two prominent kernel approximation methods: truncations onto random features and Nystrom features. We also empirically test the algorithm and compare the performance with Koopman-based, iLQR, and energy-based methods on a few benchmark problems.

Related papers

Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems. Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control. We show that the application of this algorithm results in a fixed point of probabilistic policies that converge to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z)
A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning. This paper presents a comprehensive analysis of a broad range of variations of the proximal point method (SPPM)
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
A Structure-Preserving Kernel Method for Learning Hamiltonian Systems [3.594638299627404]
A structure-preserving kernel ridge regression method is presented that allows the recovery of potentially high-dimensional and nonlinear Hamiltonian functions. The paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required. A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters.
arXiv Detail & Related papers (2024-03-15T07:20:21Z)
Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty. We propose minimax optimal and computationally efficient algorithms realizing function approximation. Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z)
Auxiliary Functions as Koopman Observables: Data-Driven Analysis of Dynamical Systems via Polynomial Optimization [0.0]
We present a flexible data-driven method for system analysis that does not require explicit model discovery. The method is rooted in well-established techniques for approxing the Koopman operator from data and is implemented as a semidefinite program that can be solved numerically.
arXiv Detail & Related papers (2023-03-02T18:44:18Z)
Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency [59.48096489854697]
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We prove non-asymptotic upper bounds on the mean-squared error of such procedures. We establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds.
arXiv Detail & Related papers (2022-09-26T23:50:55Z)
Whiplash Gradient Descent Dynamics [2.0508733018954843]
We introduce the symplectic convergence analysis for the Whiplash system for convex functions. We study the algorithm's performance for various costs and provide a practical methodology for analyzing convergence rates.
arXiv Detail & Related papers (2022-03-04T05:47:26Z)
Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic. We show that under certain assumptions, the learner finds a near-optimal policy in $O(poly(d))$ exploration rounds. We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based. The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM) We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $ell_infty$error under a generative model. We establish both and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.