Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding
- URL: http://arxiv.org/abs/2304.03907v4
- Date: Fri, 15 Nov 2024 20:51:08 GMT
- Title: Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding
- Authors: Zhaolin Ren, Tongzheng Ren, Haitong Ma, Na Li, Bo Dai
- Abstract summary: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems.
The method uses an infinite-dimensional feature to linearly represent the state-action value function and exploits a finite-dimensional truncation approximation for practical implementation.
- Score: 21.38845517949153
- License:
- Abstract: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems. This method leverages an infinite-dimensional feature to linearly represent the state-action value function and exploits finite-dimensional truncation approximation for practical implementation. To characterize the effectiveness of these finite dimensional approximations, we provide an in-depth theoretical analysis to characterize the approximation error induced by the finite-dimension truncation and statistical error induced by finite-sample approximation in both policy evaluation and policy optimization. Our analysis includes two prominent kernel approximation methods: truncations onto random features and Nystrom features. We also empirically test the algorithm and compare the performance with Koopman-based, iLQR, and energy-based methods on a few benchmark problems.
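As a rough illustration of the finite-dimensional truncation idea mentioned in the abstract, the sketch below approximates a Gaussian kernel with random Fourier features. It is not the SDEC algorithm itself: the function names (`random_fourier_features`, `gaussian_kernel`), the bandwidth `sigma`, the feature count `m`, and the test data are all assumptions chosen for the example.

```python
import numpy as np


def random_fourier_features(x, omega, phase):
    """Map inputs x of shape (n, d) to random Fourier features of shape (n, m)."""
    m = omega.shape[1]
    return np.sqrt(2.0 / m) * np.cos(x @ omega + phase)


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) for all pairs."""
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-sq_dist / (2.0 * sigma**2))


rng = np.random.default_rng(0)
d, m, sigma = 3, 4000, 1.0  # hypothetical dimensions and bandwidth

# Frequencies are drawn from the kernel's spectral density, N(0, sigma^-2 I);
# phases are uniform on [0, 2*pi).
omega = rng.normal(scale=1.0 / sigma, size=(d, m))
phase = rng.uniform(0.0, 2.0 * np.pi, size=m)

x = rng.normal(size=(50, d))
phi = random_fourier_features(x, omega, phase)

# Inner products of the finite features approximate the kernel values.
k_approx = phi @ phi.T
k_exact = gaussian_kernel(x, x, sigma)
max_abs_error = np.max(np.abs(k_approx - k_exact))
print(f"max abs kernel error with m={m}: {max_abs_error:.4f}")
```

Increasing `m` shrinks the approximation error at roughly a `1/sqrt(m)` rate, which is the kind of trade-off between feature dimension and accuracy that the paper's truncation analysis addresses.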
Related papers
- WENDy for Nonlinear-in-Parameters ODEs [1.9573380763700712]
The Weak-form Estimation of Non-linear Dynamics (WENDy) is extended to accommodate systems of ordinary differential equations that are nonlinear-in-parameters.
We present results on a suite of benchmark systems to demonstrate the practical benefits of our approach.
arXiv Detail & Related papers (2025-02-13T01:40:21Z)
- Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model.
We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z)
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- A Structure-Preserving Kernel Method for Learning Hamiltonian Systems [3.594638299627404]
A structure-preserving kernel ridge regression method is presented that allows the recovery of potentially high-dimensional and nonlinear Hamiltonian functions.
The paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required.
A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters.
arXiv Detail & Related papers (2024-03-15T07:20:21Z)
- Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms realizing function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z)
- Auxiliary Functions as Koopman Observables: Data-Driven Analysis of Dynamical Systems via Polynomial Optimization [0.0]
We present a flexible data-driven method for system analysis that does not require explicit model discovery.
The method is rooted in well-established techniques for approximating the Koopman operator from data and is implemented as a semidefinite program that can be solved numerically.
arXiv Detail & Related papers (2023-03-02T18:44:18Z)
- Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency [59.48096489854697]
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.
We prove non-asymptotic upper bounds on the mean-squared error of such procedures.
We establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds.
arXiv Detail & Related papers (2022-09-26T23:50:55Z)
- Whiplash Gradient Descent Dynamics [2.0508733018954843]
We introduce the symplectic convergence analysis for the Whiplash system for convex functions.
We study the algorithm's performance for various costs and provide a practical methodology for analyzing convergence rates.
arXiv Detail & Related papers (2022-03-04T05:47:26Z)
- Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\mathrm{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $\ell_\infty$-error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)