Operator Splitting Value Iteration
- URL: http://arxiv.org/abs/2211.13937v1
- Date: Fri, 25 Nov 2022 07:34:26 GMT
- Title: Operator Splitting Value Iteration
- Authors: Amin Rakhsha, Andrew Wang, Mohammad Ghavamzadeh, Amir-massoud
Farahmand
- Abstract summary: We introduce Operator Splitting Value Iteration (OS-VI) for both Policy Evaluation and Control problems.
OS-VI achieves a much faster convergence rate when the model is accurate enough.
Unlike the traditional Dyna architecture, OS-Dyna still converges to the correct value function in the presence of model approximation error.
- Score: 27.505231431328255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce new planning and reinforcement learning algorithms for
discounted MDPs that utilize an approximate model of the environment to
accelerate the convergence of the value function. Inspired by the splitting
approach in numerical linear algebra, we introduce Operator Splitting Value
Iteration (OS-VI) for both Policy Evaluation and Control problems. OS-VI
achieves a much faster convergence rate when the model is accurate enough. We
also introduce a sample-based version of the algorithm called OS-Dyna. Unlike
the traditional Dyna architecture, OS-Dyna still converges to the correct value
function in the presence of model approximation error.
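For illustration, here is a minimal tabular sketch of the policy-evaluation variant described above: each iteration solves the Bellman equation exactly under the approximate model, with the reward corrected by the mismatch between the true and approximate dynamics. The setup (explicit transition matrices P and P_hat) and all names are ours, not code from the paper.

```python
import numpy as np

def os_vi_policy_evaluation(r, P, P_hat, gamma, num_iters=50):
    """OS-VI for policy evaluation (sketch).

    Each iteration solves the Bellman equation exactly under the
    approximate model P_hat, with the reward corrected by the mismatch
    between the true and approximate dynamics:

        V_{k+1} = (I - gamma * P_hat)^{-1} (r + gamma * (P - P_hat) V_k)

    With an exact model (P_hat == P) it converges in a single step; in
    general the error contracts at a rate on the order of
    gamma * ||P - P_hat|| / (1 - gamma) rather than gamma.
    """
    n = len(r)
    A = np.eye(n) - gamma * P_hat            # solved exactly via the model
    V = np.zeros(n)
    for _ in range(num_iters):
        corrected_r = r + gamma * (P - P_hat) @ V
        V = np.linalg.solve(A, corrected_r)  # exact solve w.r.t. P_hat
    return V

# Toy 3-state Markov reward process with a slightly perturbed model.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=3)                   # true dynamics
P_hat = 0.99 * P + 0.01 * rng.dirichlet(np.ones(3), size=3)
r = rng.standard_normal(3)
V = os_vi_policy_evaluation(r, P, P_hat, gamma=0.9)
V_exact = np.linalg.solve(np.eye(3) - 0.9 * P, r)
print(np.max(np.abs(V - V_exact)))  # ~0 after a few dozen iterations
```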
Related papers
- Fusing Dictionary Learning and Support Vector Machines for Unsupervised Anomaly Detection [1.5999407512883508]
We introduce a new anomaly detection model that unifies the OC-SVM and DL residual functions into a single composite objective.
We extend both objectives to the more general setting that allows the use of kernel functions.
arXiv Detail & Related papers (2024-04-05T12:41:53Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Adaptive operator learning for infinite-dimensional Bayesian inverse problems [7.716833952167609]
We develop an adaptive operator learning framework that can reduce modeling error gradually by forcing the surrogate to be accurate in local areas.
We present a rigorous convergence guarantee in the linear case using the UKI framework.
The numerical results show that our method can significantly reduce computational costs while maintaining inversion accuracy.
arXiv Detail & Related papers (2023-10-27T01:50:33Z)
- Model Predictive Control with Self-supervised Representation Learning [13.225264876433528]
We propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation.
Our proposed addition of another loss term leads to improved performance on both state- and image-based tasks.
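A schematic of the proposed addition, assuming a generic TD-MPC-style objective; the term names and coefficients below are illustrative, not the paper's.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def tdmpc_loss_with_reconstruction(z_pred, z_target, r_pred, r_target,
                                   q_pred, q_target, obs, obs_recon,
                                   c1=1.0, c2=1.0, c3=1.0, c_rec=1.0):
    """Schematic TD-MPC-style objective with an added reconstruction term.

    The first three terms (latent consistency, reward, value) follow the
    usual TD-MPC recipe; the c_rec term is the paper's addition: a decoder
    must reconstruct the original observation from the latent state.
    """
    return (c1 * mse(z_pred, z_target)      # latent consistency
            + c2 * mse(r_pred, r_target)    # reward prediction
            + c3 * mse(q_pred, q_target)    # TD/value target
            + c_rec * mse(obs_recon, obs))  # added reconstruction loss

rng = np.random.default_rng(0)
z = rng.standard_normal(16)
obs = rng.standard_normal(64)
print(tdmpc_loss_with_reconstruction(z, z, 0.5, 0.4, 1.2, 1.0, obs, obs * 0.9))
```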
arXiv Detail & Related papers (2023-04-14T16:02:04Z)
- A DeepONet multi-fidelity approach for residual learning in reduced order modeling [0.0]
We introduce a novel approach to enhance the precision of reduced order models by exploiting a multi-fidelity perspective and DeepONets.
We propose to couple the model reduction with residual learning, such that the reduction error can be learned by a neural network and inferred for new predictions.
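A toy sketch of the residual-learning idea, with ridge regression on random Fourier features standing in for the DeepONet, and hypothetical high- and low-fidelity solvers.

```python
import numpy as np

rng = np.random.default_rng(1)

def high_fidelity(mu):   # hypothetical expensive solver
    return np.sin(mu) + 0.1 * np.sin(5 * mu)

def reduced_order(mu):   # hypothetical cheap ROM, misses fine scales
    return np.sin(mu)

# Learn the residual u_hf - u_rom on training parameters.
mu_train = rng.uniform(0, 2 * np.pi, size=200)
residual = high_fidelity(mu_train) - reduced_order(mu_train)

# Random Fourier features + ridge regression (DeepONet stand-in).
W = 3.0 * rng.standard_normal((1, 64))
b = rng.uniform(0, 2 * np.pi, 64)
feats = lambda mu: np.cos(mu[:, None] @ W + b)
Phi = feats(mu_train)
coef = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(64), Phi.T @ residual)

# Correct new ROM predictions with the inferred residual.
mu_test = rng.uniform(0, 2 * np.pi, size=50)
corrected = reduced_order(mu_test) + feats(mu_test) @ coef
print(np.max(np.abs(reduced_order(mu_test) - high_fidelity(mu_test))))  # plain ROM error
print(np.max(np.abs(corrected - high_fidelity(mu_test))))               # noticeably smaller
```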
arXiv Detail & Related papers (2023-02-24T15:15:07Z)
- Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm performs distributed Bayesian filtering for finite-state hidden Markov models.
It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
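For reference, a minimal single-agent forward filter for a finite-state HMM; the distributed algorithm would additionally fuse beliefs across a network of agents, which this sketch omits.

```python
import numpy as np

def hmm_forward_filter(prior, trans, lik):
    """Standard Bayesian filter for a finite-state HMM.

    trans[i, j] is P(x_t = j | x_{t-1} = i) and lik[t, j] is
    p(y_t | x_t = j); returns the filtering posteriors for each step.
    """
    T, n = lik.shape
    post = np.zeros((T, n))
    belief = prior
    for t in range(T):
        belief = (belief @ trans) * lik[t]  # predict, then correct
        belief /= belief.sum()              # normalize
        post[t] = belief
    return post

# Two-state example: sticky chain, noisy binary observations.
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
emit = np.array([[0.8, 0.2], [0.3, 0.7]])   # emit[state, symbol]
obs = [0, 0, 1, 1, 0]
lik = emit[:, obs].T                        # lik[t, state] = p(y_t | state)
print(hmm_forward_filter(np.array([0.5, 0.5]), trans, lik))
```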
arXiv Detail & Related papers (2022-12-05T19:40:17Z)
- A Stochastic Bundle Method for Interpolating Networks [18.313879914379008]
We propose a novel method for training deep neural networks that are capable of driving the empirical loss to zero.
At each iteration, our method constructs a maximum of linear approximations of the objective, known as the bundle.
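A toy deterministic cutting-plane sketch of the bundle idea in one dimension, omitting the stochastic and proximal ingredients of the paper's method.

```python
import numpy as np

def bundle_minimize(f, grad, x0, lo=-5.0, hi=5.0, iters=20):
    """Toy 1-D bundle method (sketch, not the paper's exact algorithm).

    The model at step k is the pointwise maximum of the linear
    approximations ("cuts") collected so far:
        m_k(x) = max_i [ f(x_i) + f'(x_i) * (x - x_i) ],
    a lower bound on a convex f. Each iterate minimizes m_k over
    [lo, hi] (here by dense evaluation; real bundle methods add a
    proximal term and solve a small QP instead).
    """
    cuts = []                       # list of (f(x_i), f'(x_i), x_i)
    x = x0
    grid = np.linspace(lo, hi, 2001)
    for _ in range(iters):
        cuts.append((f(x), grad(x), x))
        model = np.max([fx + g * (grid - xi) for fx, g, xi in cuts], axis=0)
        x = grid[np.argmin(model)]  # minimize the bundle model
    return x

# Example: minimize a smooth convex function.
f = lambda x: (x - 1.0) ** 2 + 0.1 * x ** 4
g = lambda x: 2 * (x - 1.0) + 0.4 * x ** 3
print(bundle_minimize(f, g, x0=4.0))  # close to the minimizer near x ~ 0.87
```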
arXiv Detail & Related papers (2022-01-29T23:02:30Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), in which we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivatives with finite differences.
Our approximation involves only function evaluations, which can be executed in parallel, and no gradient computations.
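The core trick can be sketched directly: central finite differences recover first- and second-order directional derivatives from function evaluations alone (notation ours).

```python
import numpy as np

def directional_derivatives(f, x, v, eps=1e-4):
    """First- and second-order directional derivatives of f at x along v,
    approximated with central finite differences:
        <grad f(x), v>   ~ (f(x + eps v) - f(x - eps v)) / (2 eps)
        v^T Hess f(x) v  ~ (f(x + eps v) - 2 f(x) + f(x - eps v)) / eps^2
    The three evaluations are independent and can run in parallel.
    """
    fp, f0, fm = f(x + eps * v), f(x), f(x - eps * v)
    first = (fp - fm) / (2 * eps)
    second = (fp - 2 * f0 + fm) / eps ** 2
    return first, second

# Sanity check on f(x) = ||x||^2 / 2: grad f = x, Hessian = I.
x = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.1, -0.2])
f = lambda x: 0.5 * np.dot(x, x)
d1, d2 = directional_derivatives(f, x, v)
print(d1, np.dot(x, v))  # should match
print(d2, np.dot(v, v))  # should match
```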
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study large-scale distributed stochastic AUC maximization, where the predictive model is a deep neural network.
Our algorithm requires far fewer communication rounds while retaining its theoretical guarantees.
Our experiments on several datasets demonstrate the effectiveness of the method and confirm the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.