UGAE: A Novel Approach to Non-exponential Discounting
- URL: http://arxiv.org/abs/2302.05740v1
- Date: Sat, 11 Feb 2023 16:41:05 GMT
- Title: UGAE: A Novel Approach to Non-exponential Discounting
- Authors: Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani
- Abstract summary: Non-exponential discounting methods that align with human behavior are often desirable for creating human-like agents.
We propose Universal Generalized Advantage Estimation (UGAE) which allows for the computation of GAE advantage values with arbitrary discounting.
We show experimentally that agents with non-exponential discounting trained via UGAE outperform variants trained with Monte Carlo advantage estimation.
- Score: 9.358303424584902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The discounting mechanism in Reinforcement Learning determines the relative
importance of future and present rewards. While exponential discounting is
widely used in practice, non-exponential discounting methods that align with
human behavior are often desirable for creating human-like agents. However,
non-exponential discounting methods cannot be directly applied in modern
on-policy actor-critic algorithms. To address this issue, we propose Universal
Generalized Advantage Estimation (UGAE), which allows for the computation of
GAE advantage values with arbitrary discounting. Additionally, we introduce
Beta-weighted discounting, a continuous interpolation between exponential and
hyperbolic discounting, to increase flexibility in choosing a discounting
method. To showcase the utility of UGAE, we provide an analysis of the
properties of various discounting methods. Our experiments on standard RL
benchmarks show that agents with non-exponential discounting trained via UGAE,
in particular with Beta-weighted discounting, outperform variants trained with
Monte Carlo advantage estimation. UGAE is simple and easily integrated into any
advantage-based algorithm as a replacement for the standard recursive GAE.
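As context for the abstract's comparison, here is a minimal sketch of the Monte Carlo advantage baseline under an arbitrary discount vector, with exponential and hyperbolic discounting as example schemes. This is not the paper's UGAE formulation (which is not reproduced here); the function names and the hyperbolic parameter `k` are illustrative assumptions.

```python
import numpy as np

def discount_weights(kind, horizon, gamma=0.99, k=1.0):
    # Per-step discount weights Gamma(l) for two standard schemes.
    # Exponential: gamma ** l;  hyperbolic: 1 / (1 + k * l).
    l = np.arange(horizon)
    if kind == "exponential":
        return gamma ** l
    if kind == "hyperbolic":
        return 1.0 / (1.0 + k * l)
    raise ValueError(f"unknown discounting scheme: {kind}")

def mc_advantages(rewards, values, weights):
    # Monte Carlo advantage under an arbitrary discount vector:
    #   A_t = sum_l Gamma(l) * r_{t+l}  -  V(s_t)
    # Unlike recursive GAE, this works for any weight sequence,
    # at the cost of high variance (the baseline UGAE improves on).
    T = len(rewards)
    adv = np.empty(T)
    for t in range(T):
        adv[t] = np.dot(weights[: T - t], rewards[t:]) - values[t]
    return adv
```

With `kind="exponential"` this recovers the usual geometric weights `gamma ** l`; swapping in `kind="hyperbolic"` changes only the weight vector, which is the flexibility arbitrary discounting provides.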
Related papers
- Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models
Using Pairwise-Distance Estimators [21.098866735156207]
Pairwise-distance estimators (PaiDEs) establish bounds on entropy.
Unlike sample-based Monte Carlo estimators, PaiDEs exhibit a remarkable capability to estimate epistemic uncertainty at speeds up to 100 times faster.
We compare our approach to existing active learning methods and find that it outperforms them on high-dimensional regression tasks.
arXiv Detail & Related papers (2023-08-25T17:13:42Z) - Mimicking Better by Matching the Approximate Action Distribution [48.81067017094468]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z) - The FAIRy Tale of Genetic Algorithms [1.0957528713294875]
We have extended the Findable, Accessible, Interoperable and Reusable (FAIR) data principles to enable the reproducibility and reusability of Genetic Algorithms.
We have presented an overview of methodological developments and variants of GA that make it challenging to reproduce or even find the right source.
This work can be extended to numerous machine learning algorithms/methods.
arXiv Detail & Related papers (2023-04-29T11:36:09Z) - LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning [30.4251858001151]
Recent methods show that a squared norm regularization on the implicit reward function is effective, but they do not provide a theoretical analysis of the resulting properties of the algorithms.
We show that our method, Least Squares Inverse Q-Learning, outperforms state-of-the-art algorithms, particularly in environments with absorbing states.
arXiv Detail & Related papers (2023-03-01T15:46:12Z) - Toward Learning Robust and Invariant Representations with Alignment
Regularization and Data Augmentation [76.85274970052762]
This paper is motivated by the proliferation of alignment regularization options.
We evaluate the performances of several popular design choices along the dimensions of robustness and invariance.
We also formally analyze the behavior of alignment regularization to complement our empirical study under assumptions we consider realistic.
arXiv Detail & Related papers (2022-06-04T04:29:19Z) - High-Dimensional Bayesian Optimisation with Variational Autoencoders and
Deep Metric Learning [119.91679702854499]
We introduce a method based on deep metric learning to perform Bayesian optimisation over high-dimensional, structured input spaces.
We achieve such an inductive bias using just 1% of the available labelled data.
As an empirical contribution, we present state-of-the-art results on real-world high-dimensional black-box optimisation problems.
arXiv Detail & Related papers (2021-06-07T13:35:47Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Discount Factor as a Regularizer in Reinforcement Learning [23.56942940879309]
It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime.
We show an explicit equivalence between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss.
Motivated by the equivalence, we empirically study this technique compared to standard $L_2$ regularization.
arXiv Detail & Related papers (2020-07-04T08:10:09Z) - Sparse Gaussian Processes Revisited: Bayesian Approaches to
Inducing-Variable Approximations [27.43948386608]
Variational inference techniques based on inducing variables provide an elegant framework for scalable estimation in Gaussian process (GP) models.
In this work we challenge the common wisdom that optimizing the inducing inputs in the variational framework yields optimal performance.
arXiv Detail & Related papers (2020-03-06T08:53:18Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.