An Empirical Comparison of Off-policy Prediction Learning Algorithms in
the Four Rooms Environment
- URL: http://arxiv.org/abs/2109.05110v1
- Date: Fri, 10 Sep 2021 21:15:41 GMT
- Title: An Empirical Comparison of Off-policy Prediction Learning Algorithms in
the Four Rooms Environment
- Authors: Sina Ghiassian and Richard S. Sutton
- Abstract summary: We empirically compare 11 off-policy prediction learning algorithms with linear function approximation on two small tasks.
The algorithms' performance is highly affected by the variance induced by the importance sampling ratios.
Emphatic TD($\lambda$) tends to have lower error than other algorithms, but might learn more slowly in some cases.
- Score: 9.207173776826403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many off-policy prediction learning algorithms have been proposed in the past
decade, but it remains unclear which algorithms learn faster than others. We
empirically compare 11 off-policy prediction learning algorithms with linear
function approximation on two small tasks: the Rooms task, and the High
Variance Rooms task. The tasks are designed such that learning fast in them is
challenging. In the Rooms task, the product of importance sampling ratios can
be as large as $2^{14}$ and can sometimes be two. To control the high variance
caused by the product of the importance sampling ratios, step size should be
set small, which in turn slows down learning. The High Variance Rooms task is
more extreme in that the product of the ratios can become as large as
$2^{14}\times 25$. This paper builds upon the empirical study of off-policy
prediction learning algorithms by Ghiassian and Sutton (2021). We consider the
same set of algorithms as theirs and employ the same experimental methodology.
The algorithms considered are: Off-policy TD($\lambda$), five Gradient-TD
algorithms, two Emphatic-TD algorithms, Tree Backup($\lambda$),
Vtrace($\lambda$), and ABTD($\zeta$). We found that the algorithms' performance
is highly affected by the variance induced by the importance sampling ratios.
The data shows that Tree Backup($\lambda$), Vtrace($\lambda$), and
ABTD($\zeta$) are not affected by the high variance as much as other algorithms
but they restrict the effective bootstrapping parameter in a way that is too
limiting for tasks where high variance is not present. We observed that
Emphatic TD($\lambda$) tends to have lower asymptotic error than other
algorithms, but might learn more slowly in some cases. We suggest algorithms
for practitioners based on their problem of interest, and suggest approaches
that can be applied to specific algorithms that might result in substantially
improved algorithms.
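The abstract's central observation is that the variance of off-policy updates is driven by the product of per-step importance sampling ratios. As a rough illustration of where that product enters, below is a minimal sketch of Off-policy TD($\lambda$) with linear function approximation (not the authors' implementation); `transitions`, `feature_fn`, `target_policy`, and `behavior_policy` are hypothetical stand-ins for the task's data stream, feature vectors, and policies.

```python
import numpy as np

def off_policy_td_lambda(transitions, feature_fn, target_policy, behavior_policy,
                         n_features, alpha=0.001, gamma=0.99, lam=0.9):
    """Minimal sketch of Off-policy TD(lambda) with linear function approximation."""
    w = np.zeros(n_features)   # weights of the linear value estimate v(s) ~ w . x(s)
    z = np.zeros(n_features)   # eligibility trace
    for (s, a, r, s_next, done) in transitions:
        x = feature_fn(s)
        x_next = feature_fn(s_next)
        # Per-step importance sampling ratio pi(a|s) / b(a|s); products of these
        # ratios over a trajectory can reach 2^14 in the Rooms task, which is
        # what forces the step size alpha to be kept small.
        rho = target_policy(a, s) / behavior_policy(a, s)
        delta = r + (0.0 if done else gamma * np.dot(w, x_next)) - np.dot(w, x)
        z = rho * (gamma * lam * z + x)   # ratio enters the trace multiplicatively
        w = w + alpha * delta * z         # semi-gradient TD update
        if done:
            z = np.zeros(n_features)      # reset trace at episode boundaries
    return w
```

Because the ratio multiplies into the eligibility trace at every step, a run of ratios greater than one compounds (up to $2^{14}$ in the Rooms task and $2^{14}\times 25$ in the High Variance Rooms task), and the step size must be set small enough to keep the updates stable, which is the slow-learning trade-off the paper studies.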
Related papers
- The Limits of Assumption-free Tests for Algorithm Performance [6.7171902258864655]
How well does an algorithm perform at a given modeling task, and which algorithm performs best?
We make a distinction between two questions: how good is an algorithm $A$ at the problem of learning from a training set of size $n$, versus, how good is a particular fitted model produced by running $A$ on a particular training data set of size $n$?
arXiv Detail & Related papers (2024-02-12T03:19:30Z) - Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Momentum-Based Policy Gradient with Second-Order Information [40.51117836892182]
We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into gradient descent.
Unlike most previous work, our proposed algorithm does not require importance sampling which can compromise the advantage of variance reduction process.
Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice.
arXiv Detail & Related papers (2022-05-17T11:56:50Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - An Empirical Comparison of Off-policy Prediction Learning Algorithms on
the Collision Task [9.207173776826403]
Off-policy prediction -- learning the value function for one policy from data generated while following another policy -- is one of the most challenging subproblems in reinforcement learning.
This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation.
arXiv Detail & Related papers (2021-06-02T03:45:43Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point using the optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Learning Sparse Classifiers: Continuous and Mixed Integer Optimization
Perspectives [10.291482850329892]
Mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems.
We propose two classes of scalable algorithms: an exact algorithm that can handle $p \approx 50{,}000$ features in a few minutes, and approximate algorithms that can address instances with $p \approx 10^6$.
In addition, we present new estimation error bounds for $\ell_0$-regularized estimators.
arXiv Detail & Related papers (2020-01-17T18:47:02Z)