Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic
Embedding
- URL: http://arxiv.org/abs/2304.03907v3
- Date: Thu, 21 Dec 2023 01:25:28 GMT
- Title: Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic
Embedding
- Authors: Tongzheng Ren, Zhaolin Ren, Haitong Ma, Na Li and Bo Dai
- Abstract summary: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear systems.
We use an infinite-dimensional feature to linearly represent the state-action value function and exploit a finite-dimensional truncation approximation for practical implementation.
- Score: 22.946517604055735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC),
to optimal control for nonlinear stochastic systems. This method leverages an
infinite-dimensional feature to linearly represent the state-action value
function and exploits finite-dimensional truncation approximation for practical
implementation. To characterize the effectiveness of these finite dimensional
approximations, we provide an in-depth theoretical analysis to characterize the
approximation error induced by the finite-dimension truncation and statistical
error induced by finite-sample approximation in both policy evaluation and
policy optimization. Our analysis includes two prominent kernel approximation
methods: truncations onto random features and Nyström features. We also
empirically test the algorithm and compare the performance with Koopman-based,
iLQR, and energy-based methods on a few benchmark problems.
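To make the idea of a finite-dimensional truncation onto random features concrete, the following is a minimal sketch of generic random Fourier features for a Gaussian kernel. It is not the SDEC algorithm from the paper; the kernel bandwidth `sigma`, the feature count, the test points, and all function names are assumptions chosen for illustration. It only shows that an inner product of finitely many random features approximates a kernel that would otherwise need an infinite-dimensional feature map.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_random_features(dim, num_features, sigma=1.0, seed=0):
    """Build a finite-dimensional feature map phi with phi(x) . phi(y) ~ k(x, y).

    Frequencies are sampled from N(0, sigma^-2 I) and phases from U[0, 2 pi],
    the standard random Fourier feature construction for the Gaussian kernel.
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return phi


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-D points; more features give a tighter approximation.
phi = make_random_features(dim=2, num_features=4000, sigma=1.0, seed=0)
x, y = [0.3, -0.5], [0.1, 0.4]
exact = gaussian_kernel(x, y)
approx = dot(phi(x), phi(y))
print(f"exact kernel: {exact:.4f}, random-feature estimate: {approx:.4f}")
```

The approximation error of such a map shrinks roughly like one over the square root of the number of features, which is the kind of truncation error the abstract says the paper analyzes alongside the statistical error from finite samples.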
Related papers
- Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that the application of this algorithm results in a fixed point of probabilistic policies that converge to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z)
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- A Structure-Preserving Kernel Method for Learning Hamiltonian Systems [3.594638299627404]
A structure-preserving kernel ridge regression method is presented that allows the recovery of potentially high-dimensional and nonlinear Hamiltonian functions.
The paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required.
A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters.
arXiv Detail & Related papers (2024-03-15T07:20:21Z)
- Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms realizing function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z)
- Auxiliary Functions as Koopman Observables: Data-Driven Analysis of Dynamical Systems via Polynomial Optimization [0.0]
We present a flexible data-driven method for system analysis that does not require explicit model discovery.
The method is rooted in well-established techniques for approximating the Koopman operator from data and is implemented as a semidefinite program that can be solved numerically.
arXiv Detail & Related papers (2023-03-02T18:44:18Z)
- Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency [59.48096489854697]
The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures.
We prove non-asymptotic upper bounds on the mean-squared error of such procedures.
We establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds.
arXiv Detail & Related papers (2022-09-26T23:50:55Z)
- Whiplash Gradient Descent Dynamics [2.0508733018954843]
We introduce the symplectic convergence analysis for the Whiplash system for convex functions.
We study the algorithm's performance for various costs and provide a practical methodology for analyzing convergence rates.
arXiv Detail & Related papers (2022-03-04T05:47:26Z)
- Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(\mathrm{poly}(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms prior heuristics inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)