Low-rank State-action Value-function Approximation
- URL: http://arxiv.org/abs/2104.08805v1
- Date: Sun, 18 Apr 2021 10:31:39 GMT
- Title: Low-rank State-action Value-function Approximation
- Authors: Sergio Rozada, Victor Tenorio, and Antonio G. Marques
- Abstract summary: Several high-dimensional state problems can be well-approximated by an intrinsic low-rank structure.
This paper proposes different algorithms to estimate a low-rank factorization of the $Q(s, a)$ matrix.
- Score: 11.026561518386025
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Value functions are central to Dynamic Programming and Reinforcement Learning
but their exact estimation suffers from the curse of dimensionality,
challenging the development of practical value-function (VF) estimation
algorithms. Several approaches have been proposed to overcome this issue, from
non-parametric schemes that aggregate states or actions to parametric
approximations of state and action VFs via, e.g., linear estimators or deep
neural networks. Relevantly, several high-dimensional state problems can be
well-approximated by an intrinsic low-rank structure. Motivated by this and
leveraging results from low-rank optimization, this paper proposes different
stochastic algorithms to estimate a low-rank factorization of the $Q(s, a)$
matrix. This is a non-parametric alternative to VF approximation that
dramatically reduces the computational and sample complexities relative to
classical $Q$-learning methods that estimate $Q(s,a)$ separately for each
state-action pair.
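To make the low-rank idea concrete, below is a minimal, hedged sketch (not the authors' exact algorithm) of how a rank-$k$ factorization $Q \approx L R^\top$ could be updated online with a stochastic gradient step on the temporal-difference error; all dimensions and function names are illustrative.

```python
import numpy as np

def lowrank_q_update(L, R, s, a, r, s_next, gamma=0.99, lr=0.01):
    """One stochastic update of the factors of Q ~ L @ R.T for a transition (s, a, r, s')."""
    l_s, r_a = L[s].copy(), R[a].copy()
    q_sa = l_s @ r_a                      # current estimate of Q(s, a)
    q_next = np.max(L[s_next] @ R.T)      # greedy bootstrap: max_a' Q(s', a')
    td_error = r + gamma * q_next - q_sa
    # Stochastic gradient step on 0.5 * td_error^2 with respect to each factor row.
    L[s] += lr * td_error * r_a
    R[a] += lr * td_error * l_s
    return td_error

# Hypothetical dimensions: 100 states, 10 actions, rank 5.
rng = np.random.default_rng(0)
L = 0.1 * rng.standard_normal((100, 5))
R = 0.1 * rng.standard_normal((10, 5))
lowrank_q_update(L, R, s=3, a=2, r=1.0, s_next=7)
```

Storing the two factors requires $(|S|+|A|)k$ parameters rather than $|S||A|$ table entries, which is the source of the reduced computational and sample complexity relative to tabular $Q$-learning.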
Related papers
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
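A hedged sketch of this action-subsampling idea (not the paper's StochDQN/StochDDQN implementations): restrict the maximization in the bootstrap target, and in action selection, to a random subset of roughly $\mathcal{O}(\log(n))$ actions. Names and sizes below are illustrative.

```python
import numpy as np

def stochastic_max_q(q_row, rng, subset_size):
    """Approximate max_a Q(s, a) using a random subset of actions."""
    subset = rng.choice(len(q_row), size=subset_size, replace=False)
    best = subset[np.argmax(q_row[subset])]
    return best, q_row[best]

def stoch_q_update(Q, s, a, r, s_next, rng, gamma=0.99, lr=0.1):
    k = max(1, int(np.ceil(np.log(Q.shape[1]))))   # roughly O(log n) actions
    _, q_next = stochastic_max_q(Q[s_next], rng, k)
    Q[s, a] += lr * (r + gamma * q_next - Q[s, a])

# Hypothetical problem: 50 states, 1000 actions.
rng = np.random.default_rng(0)
Q = np.zeros((50, 1000))
stoch_q_update(Q, s=0, a=5, r=1.0, s_next=1, rng=rng)
```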
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
- Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
Our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
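For context, the Deep Equilibrium view treats the iterative solver as a map whose fixed point defines the output. The sketch below shows only that generic fixed-point computation, with a toy contraction standing in for the learned, regularized solver; all names here are illustrative.

```python
import numpy as np

def fixed_point_solve(f, x, z0, tol=1e-6, max_iter=500):
    """Iterate z <- f(z, x) until the update is smaller than tol."""
    z = z0
    for _ in range(max_iter):
        z_new = f(z, x)
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

# Toy contraction: f(z, x) = 0.5 * z + x has the fixed point z* = 2x.
z_star = fixed_point_solve(lambda z, x: 0.5 * z + x, x=np.array([1.0]), z0=np.zeros(1))
```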
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- A framework for bilevel optimization that enables stochastic and global variance reduction algorithms [17.12280360174073]
Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption.
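For reference, the sketch below shows the classical single-level SAGA update that SABA builds on (the bilevel adaptation in the paper also tracks the inner solution and a linear system, which is omitted here); the function names and the toy problem are illustrative.

```python
import numpy as np

def saga(grad_i, x0, n, lr=0.02, n_steps=5000, seed=0):
    """Minimize (1/n) * sum_i f_i(x), given per-sample gradients grad_i(x, i)."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    memory = np.stack([grad_i(x, i) for i in range(n)])   # last gradient seen for each sample
    avg = memory.mean(axis=0)
    for _ in range(n_steps):
        i = rng.integers(n)
        g = grad_i(x, i)
        x = x - lr * (g - memory[i] + avg)                 # variance-reduced direction
        avg = avg + (g - memory[i]) / n
        memory[i] = g
    return x

# Toy least-squares example: f_i(x) = 0.5 * (a_i @ x - b_i)^2.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_hat = saga(lambda x, i: A[i] * (A[i] @ x - b[i]), np.zeros(5), n=20)
```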
arXiv Detail & Related papers (2022-01-31T18:17:25Z)
- Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning [11.317136648551536]
Value-function approximation is a central problem in Reinforcement Learning (RL).
This paper puts forth a parsimonious non-parametric approach, where we use low-rank algorithms to estimate the VF matrix in an online and model-free fashion.
As VFs tend to be multi-dimensional, we propose replacing the classical VF matrix representation with a tensor representation and, then, use the PARAFAC decomposition to design an online model-free tensor low-rank algorithm.
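A minimal sketch of that tensor idea (illustrative, not the paper's exact PARAFAC algorithm): a Q tensor over two state dimensions and one action dimension is stored through CP factors A, B, C, and each transition updates only the relevant factor rows via the TD error. All sizes below are hypothetical.

```python
import numpy as np

def cp_q_value(A, B, C, s1, s2, a):
    """Q(s1, s2, a) = sum_r A[s1, r] * B[s2, r] * C[a, r]."""
    return np.sum(A[s1] * B[s2] * C[a])

def cp_q_update(A, B, C, s, a, r, s_next, gamma=0.99, lr=0.01):
    """Stochastic TD update of the CP factor rows touched by one transition."""
    s1, s2 = s
    q_sa = cp_q_value(A, B, C, s1, s2, a)
    q_next = max(cp_q_value(A, B, C, s_next[0], s_next[1], a2) for a2 in range(C.shape[0]))
    td = r + gamma * q_next - q_sa
    a_row, b_row, c_row = A[s1].copy(), B[s2].copy(), C[a].copy()
    A[s1] += lr * td * b_row * c_row
    B[s2] += lr * td * a_row * c_row
    C[a] += lr * td * a_row * b_row
    return td

# Hypothetical sizes: two state dimensions of 20, 5 actions, CP rank 3.
rng = np.random.default_rng(0)
A = 0.1 * rng.standard_normal((20, 3))
B = 0.1 * rng.standard_normal((20, 3))
C = 0.1 * rng.standard_normal((5, 3))
cp_q_update(A, B, C, s=(1, 2), a=0, r=1.0, s_next=(3, 4))
```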
arXiv Detail & Related papers (2022-01-21T00:13:54Z)
- Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation in Contaminated Gaussian Models [1.609950046042424]
Consider the problem of simultaneous estimation of location and variance matrix under Huber's contaminated Gaussian model.
First, we study minimum $f$-divergence estimation at the population level, corresponding to a generative adversarial method with a nonparametric discriminator.
We develop tractable adversarial algorithms with simple spline discriminators, which can be implemented via nested optimization.
The proposed methods are shown to achieve minimax optimal rates or near-optimal rates depending on the $f$-divergence and the penalty used.
arXiv Detail & Related papers (2021-12-24T02:46:51Z)
- Selective Multiple Power Iteration: from Tensor PCA to gradient-based exploration of landscapes [7.648784748888189]
Selective Multiple Power Iteration (SMPI) is a new algorithm addressing the important problem of recovering the planted spike in Tensor PCA.
We show that these unexpectedly strong performances are due to a powerful mechanism in which the noise plays a key role in signal recovery.
These results may have a strong impact on both practical and theoretical applications.
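To show the basic building block (not SMPI's full multi-start selection mechanism), a single tensor power iteration for an order-3 spiked tensor looks like the sketch below; the signal strength and sizes are hypothetical.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Basic power step for an order-3 spiked tensor: x <- T(., x, x), normalized."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(T.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        x = np.einsum('ijk,j,k->i', T, x, x)   # contract T along its last two modes
        x /= np.linalg.norm(x)
    return x

# Toy spiked tensor T = beta * (v outer v outer v) + noise, with hypothetical beta = 3.
rng = np.random.default_rng(0)
n, beta = 30, 3.0
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
T = beta * np.einsum('i,j,k->ijk', v, v, v) + rng.standard_normal((n, n, n)) / np.sqrt(n)
v_hat = tensor_power_iteration(T)
```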
arXiv Detail & Related papers (2021-12-23T01:46:41Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of the area under the precision-recall curve (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- On the Effectiveness of Richardson Extrapolation in Machine Learning [0.0]
Richardson extrapolation is a technique from numerical analysis that can improve the approximation error of an estimation method.
We show that Richardson extrapolation techniques come with no significant loss in performance and sometimes bring strong gains.
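As a concrete, generic illustration of the technique (a finite-difference derivative rather than the ML estimators studied in the paper), Richardson extrapolation combines two estimates computed at step sizes $h$ and $h/2$ so that the leading error term cancels:

```python
import numpy as np

def forward_diff(f, x, h):
    """First-order approximation of f'(x) with O(h) error."""
    return (f(x + h) - f(x)) / h

def richardson(f, x, h):
    """Richardson-extrapolated estimate 2*D(h/2) - D(h), which cancels the O(h) term."""
    return 2 * forward_diff(f, x, h / 2) - forward_diff(f, x, h)

x, h = 1.0, 1e-2
print(abs(forward_diff(np.exp, x, h) - np.exp(x)))   # roughly 1e-2 error
print(abs(richardson(np.exp, x, h) - np.exp(x)))     # error smaller by orders of magnitude
```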
arXiv Detail & Related papers (2020-02-07T15:18:04Z)