Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation
- URL: http://arxiv.org/abs/2006.06135v1
- Date: Thu, 11 Jun 2020 00:55:35 GMT
- Title: Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation
- Authors: Devavrat Shah, Dogyoon Song, Zhi Xu, Yuzhe Yang
- Abstract summary: We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces.
We develop a simple, iterative learning algorithm that finds $epsilon$-Schmidt $Q$-function with sample complexity of $widetildeO(frac1epsilonmax(d1), d_2)+2)$ when the optimal $Q$-function has low rank $r$ and the factor $$ is below a certain threshold.
- Score: 30.137884459159107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the question of learning $Q$-function in a sample efficient
manner for reinforcement learning with continuous state and action spaces under
a generative model. If $Q$-function is Lipschitz continuous, then the minimal
sample complexity for estimating $\epsilon$-optimal $Q$-function is known to
scale as ${\Omega}(\frac{1}{\epsilon^{d_1+d_2 +2}})$ per classical
non-parametric learning theory, where $d_1$ and $d_2$ denote the dimensions of
the state and action spaces respectively. The $Q$-function, when viewed as a
kernel, induces a Hilbert-Schmidt operator and hence possesses square-summable
spectrum. This motivates us to consider a parametric class of $Q$-functions
parameterized by its "rank" $r$, which contains all Lipschitz $Q$-functions as
$r \to \infty$. As our key contribution, we develop a simple, iterative
learning algorithm that finds $\epsilon$-optimal $Q$-function with sample
complexity of $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the
optimal $Q$-function has low rank $r$ and the discounting factor $\gamma$ is
below a certain threshold. Thus, this provides an exponential improvement in
sample complexity. To enable our result, we develop a novel Matrix Estimation
algorithm that faithfully estimates an unknown low-rank matrix in the
$\ell_\infty$ sense even in the presence of arbitrary bounded noise, which
might be of interest in its own right. Empirical results on several stochastic
control tasks confirm the efficacy of our "low-rank" algorithms.
Related papers
- Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $sigma(mathbfwast cdot mathbfx)$, where $sigma: mathbbR to mathbbR$ is a known link function and $mathbfwast$ is a hidden unit vector.
We show that a proper learner attains $L2$-error of $O(mathrmOPT)+epsilon$, where $
arXiv Detail & Related papers (2024-11-08T17:10:38Z) - Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
We study the computational complexity of learning a target function $f_*:mathbbRdtomathbbR$ with additive structure.
We prove that a large subset of $f_*$ can be efficiently learned by gradient training of a two-layer neural network.
arXiv Detail & Related papers (2024-06-17T17:59:17Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $Omega(d1/2/(maxp, epsilon)2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement
Learning with Latent Low-Rank Structure [9.759209713196718]
We consider a class of MDPs for which the associated optimal $Q*$ function is low rank, where the latent features are unknown.
We show that under stronger low rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of $tildeOleft((|S|+|A|)mathrmpoly(d,H)/epsilon2right)$ for a rank
arXiv Detail & Related papers (2022-06-07T20:39:51Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Improved Sample Complexity for Incremental Autonomous Exploration in
MDPs [132.88757893161699]
We learn the set of $epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $epsilon/c_min$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - A Randomized Algorithm to Reduce the Support of Discrete Measures [79.55586575988292]
Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms.
We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling"
We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $Ngg n$ regime.
arXiv Detail & Related papers (2020-06-02T16:38:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.