PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning
Method
- URL: http://arxiv.org/abs/2110.06906v1
- Date: Wed, 13 Oct 2021 17:40:12 GMT
- Title: PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning
Method
- Authors: Ziwei Guan, Tengyu Xu, Yingbin Liang
- Abstract summary: We propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts and updates the follow-on trace only for a finite period.
We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to polynomial.
- Score: 49.93717224277131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a
successful method for off-policy value function evaluation with function
approximation. Although ETD has been shown to converge asymptotically to a
desirable value function, it is well known that ETD often suffers from large
variance, so its sample complexity can grow exponentially with the number of
iterations. In this work, we propose a new ETD method, called PER-ETD (i.e.,
PEriodically Restarted-ETD), which restarts and updates the follow-on trace
only for a finite period within each iteration of the evaluation parameter.
Further, PER-ETD increases the restart period logarithmically with the number
of iterations, which attains the best trade-off between variance and bias and
keeps both vanishing sublinearly. We show that PER-ETD converges to the same
desirable fixed point as ETD, but improves the exponential sample complexity
of ETD to polynomial. Our experiments validate the superior performance of
PER-ETD and its advantage over ETD.
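The source of ETD's exponential sample complexity is the follow-on trace recursion F_t = 1 + gamma * rho_{t-1} * F_{t-1}: products of importance ratios can make the second moment of F_t grow geometrically with t. PER-ETD sidesteps this by restarting the trace and rolling it forward for only b_t steps before each parameter update, with b_t growing logarithmically in the iteration count. Below is a minimal sketch of this scheme for ETD(0) with linear function approximation; the toy MDP, the per-state importance ratios, and all hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy off-policy evaluation setup (all quantities are illustrative).
n_states, n_feat = 10, 4
gamma, alpha = 0.95, 0.05
phi = rng.normal(size=(n_states, n_feat))            # fixed random features
P = rng.dirichlet(np.ones(n_states), size=n_states)  # behavior transition kernel
rho = rng.uniform(0.5, 2.0, size=n_states)           # assumed per-state importance ratios
reward = rng.normal(size=n_states)

theta = np.zeros(n_feat)
s = 0
for t in range(1, 5001):
    # PER-ETD: restart the follow-on trace, then roll it forward for only
    # b_t steps, where b_t grows logarithmically with the iteration count.
    b_t = max(1, int(np.log(t + 1)))
    F = 1.0                                          # restarted follow-on trace
    for _ in range(b_t):
        s_next = rng.choice(n_states, p=P[s])
        F = gamma * rho[s] * F + 1.0                 # follow-on update F <- gamma*rho*F + 1
        s = s_next
    # One emphatic TD(0) update; the truncated trace F plays the role of the
    # emphasis M_t. (Vanilla ETD never resets F, so its variance compounds
    # over the entire trajectory.)
    s_next = rng.choice(n_states, p=P[s])
    delta = reward[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * rho[s] * F * delta * phi[s]
    s = s_next
```

A longer restart period shrinks the bias of the truncated emphasis toward that of full ETD, while a shorter one caps the variance; growing b_t like log t balances the two so that both vanish sublinearly, which is exactly the trade-off the abstract describes.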
Related papers
- Alpha Entropy Search for New Information-based Bayesian Optimization [9.749560288448116]
We introduce a novel information-based class of acquisition functions called Alpha Entropy Search (AES).
AES is based on the alpha-divergence, which generalizes the Kullback-Leibler (KL) divergence.
arXiv Detail & Related papers (2024-11-25T17:19:30Z) - Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition
into Value and Fourier Domains [12.963163500336066]
A limited class of decomposable PDEs has sparse features in the value domain.
We propose Reel, which accelerates the learning of PDEs via random projection.
We provide empirical evidence that our proposed Reel can lead to faster learning of PDE models.
arXiv Detail & Related papers (2023-09-13T22:48:30Z) - Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models
Using Pairwise-Distance Estimators [21.098866735156207]
Pairwise-distance estimators (PaiDEs) establish bounds on entropy.
Unlike sample-based Monte Carlo estimators, PaiDEs estimate epistemic uncertainty up to 100 times faster.
We compare our approach to existing active learning methods and find that it outperforms them on high-dimensional regression tasks.
arXiv Detail & Related papers (2023-08-25T17:13:42Z) - FaDIn: Fast Discretized Inference for Hawkes Processes with General
Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point process inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach yields a better estimation of pattern latency than the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most existing methods can only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Low-rank Optimal Transport: Approximation, Statistics and Debiasing [51.50788603386766]
The low-rank optimal transport (LOT) approach was advocated in Scetbon et al. (2021).
LOT is seen as a legitimate contender to entropic regularization when compared on properties of interest.
We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
arXiv Detail & Related papers (2022-05-24T20:51:37Z) - APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al., 2020) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z) - Simple and optimal methods for stochastic variational inequalities, II:
Markovian noise and policy evaluation in reinforcement learning [9.359939442911127]
This paper focuses on stochastic variational inequalities (VIs) under Markovian noise.
A prominent application of our algorithmic developments is the policy evaluation problem in reinforcement learning.
arXiv Detail & Related papers (2020-11-15T04:05:22Z) - Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence
Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
arXiv Detail & Related papers (2020-10-26T01:33:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.