Efficient Global Planning in Large MDPs via Stochastic Primal-Dual
Optimization
- URL: http://arxiv.org/abs/2210.12057v1
- Date: Fri, 21 Oct 2022 15:49:20 GMT
- Title: Efficient Global Planning in Large MDPs via Stochastic Primal-Dual
Optimization
- Authors: Gergely Neu, Nneka Okolo
- Abstract summary: We show that our method outputs a near-optimal policy after a number of queries to the generative model.
Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector.
- Score: 12.411844611718958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new stochastic primal-dual optimization algorithm for planning
in a large discounted Markov decision process with a generative model and
linear function approximation. Assuming that the feature map approximately
satisfies standard realizability and Bellman-closedness conditions and also
that the feature vectors of all state-action pairs are representable as convex
combinations of a small core set of state-action pairs, we show that our method
outputs a near-optimal policy after a polynomial number of queries to the
generative model. Our method is computationally efficient and comes with the
major advantage that it outputs a single softmax policy that is compactly
represented by a low-dimensional parameter vector, and does not need to execute
computationally expensive local planning subroutines in runtime.
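To make these ingredients concrete, below is a minimal Python sketch of a stochastic primal-dual planning loop of this general flavor: a linearly parameterized Q-function, Bellman errors evaluated only on a core set of state-action pairs via single generative-model queries, an exponentiated-gradient (mirror-descent) update for a distribution over the core set, and a final softmax policy represented by one low-dimensional parameter vector. The toy MDP, feature map, core set, step sizes, projection radius, and update rules are illustrative assumptions, not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy MDP, feature map and core set (illustrative stand-ins, not from the paper) ---
nS, nA, d, gamma = 20, 4, 6, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # transition kernel: P[s, a] is a dist over s'
r = rng.uniform(size=(nS, nA))                   # rewards in [0, 1]
Phi = rng.normal(size=(nS, nA, d)) / np.sqrt(d)  # linear feature map phi(s, a) in R^d
nu0 = np.ones(nS) / nS                           # initial-state distribution
core = [(rng.integers(nS), rng.integers(nA)) for _ in range(30)]  # core state-action pairs

def generative_model(s, a):
    """One query to the simulator: sample s' ~ P(. | s, a)."""
    return rng.choice(nS, p=P[s, a])

def q(theta, s, a):                 # linearly parameterized Q-function
    return Phi[s, a] @ theta

def greedy(theta, s):               # V_theta(s) = max_a Q_theta(s, a), and its feature vector
    a_star = int(np.argmax(Phi[s] @ theta))
    return q(theta, s, a_star), Phi[s, a_star]

# --- Stochastic primal-dual iteration on an LP-style saddle point ---
theta = np.zeros(d)                      # dual variable: Q-function parameters
lam = np.ones(len(core)) / len(core)     # primal variable: distribution over the core set
theta_avg = np.zeros(d)
eta_theta, eta_lam, radius, T = 0.05, 0.1, 10.0, 2000

for _ in range(T):
    # Stochastic gradient in theta: one generative-model query per core pair.
    s0 = rng.choice(nS, p=nu0)
    _, phi0 = greedy(theta, s0)
    grad = (1.0 - gamma) * phi0
    bellman_err = np.zeros(len(core))
    for i, (s, a) in enumerate(core):
        s_next = generative_model(s, a)
        v_next, phi_next = greedy(theta, s_next)
        bellman_err[i] = r[s, a] + gamma * v_next - q(theta, s, a)
        grad += lam[i] * (gamma * phi_next - Phi[s, a])
    theta -= eta_theta * grad                       # gradient step for the minimizing player
    norm = np.linalg.norm(theta)
    if norm > radius:                               # keep parameters in a bounded set
        theta *= radius / norm
    lam *= np.exp(eta_lam * bellman_err)            # exponentiated-gradient (mirror) ascent step
    lam /= lam.sum()
    theta_avg += theta / T

# --- Output: a single softmax policy compactly represented by the d-dimensional theta_avg ---
def softmax_policy(s, beta=10.0):
    logits = beta * (Phi[s] @ theta_avg)
    p = np.exp(logits - logits.max())
    return p / p.sum()

print("policy at state 0:", np.round(softmax_policy(0), 3))
```

Note the property emphasized in the abstract: the returned policy is fully described by the d-dimensional vector theta_avg, so acting at runtime only requires computing features and a softmax, with no local planning subroutine.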
Related papers
- Global Optimization of Gaussian Process Acquisition Functions Using a Piecewise-Linear Kernel Approximation [2.3342885570554652]
We introduce a piecewise-linear approximation for Gaussian process kernels and a corresponding MIQP representation for acquisition functions.
We empirically demonstrate the framework on synthetic functions, constrained benchmarks, and hyperparameter tuning tasks.
arXiv Detail & Related papers (2024-10-22T10:56:52Z) - Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Self-concordant Smoothing for Large-Scale Convex Composite Optimization [0.0]
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth.
We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm.
arXiv Detail & Related papers (2023-09-04T19:47:04Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close-to-optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z) - Efficient semidefinite-programming-based inference for binary and
multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently with the same solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Efficient Planning in Large MDPs with Weak Linear Function Approximation [4.56877715768796]
Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP.
We consider the planning problem in MDPs using linear value function approximation with only weak requirements.
arXiv Detail & Related papers (2020-07-13T04:40:41Z) - Kernel Taylor-Based Value Function Approximation for Continuous-State
Markov Decision Processes [5.894659354028797]
We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs).
We have validated the proposed method through extensive simulations in both simplified and realistic planning scenarios.
arXiv Detail & Related papers (2020-06-03T01:48:43Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces computation time by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)