UGAE: A Novel Approach to Non-exponential Discounting
- URL: http://arxiv.org/abs/2302.05740v1
- Date: Sat, 11 Feb 2023 16:41:05 GMT
- Title: UGAE: A Novel Approach to Non-exponential Discounting
- Authors: Ariel Kwiatkowski, Vicky Kalogeiton, Julien Pettré, Marie-Paule Cani
- Abstract summary: Non-exponential discounting methods that align with human behavior are often desirable for creating human-like agents.
We propose Universal Generalized Advantage Estimation (UGAE) which allows for the computation of GAE advantage values with arbitrary discounting.
We show experimentally that agents with non-exponential discounting trained via UGAE outperform variants trained with Monte Carlo advantage estimation.
- Score: 9.358303424584902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The discounting mechanism in Reinforcement Learning determines the relative
importance of future and present rewards. While exponential discounting is
widely used in practice, non-exponential discounting methods that align with
human behavior are often desirable for creating human-like agents. However,
non-exponential discounting methods cannot be directly applied in modern
on-policy actor-critic algorithms. To address this issue, we propose Universal
Generalized Advantage Estimation (UGAE), which allows for the computation of
GAE advantage values with arbitrary discounting. Additionally, we introduce
Beta-weighted discounting, a continuous interpolation between exponential and
hyperbolic discounting, to increase flexibility in choosing a discounting
method. To showcase the utility of UGAE, we provide an analysis of the
properties of various discounting methods. Our experiments on standard RL
benchmarks show that agents with non-exponential discounting trained via UGAE,
in particular with Beta-weighted discounting, outperform variants trained with
Monte Carlo advantage estimation. UGAE is simple and easily integrated into any
advantage-based algorithm as a replacement for the standard recursive GAE.
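As context for the abstract's comparison, here is a minimal sketch of the Monte Carlo advantage baseline under an arbitrary discount vector, with exponential and hyperbolic discounting as example schemes. This is not the paper's UGAE formulation (which is not reproduced here); the function names and the hyperbolic parameter `k` are illustrative assumptions.

```python
import numpy as np

def discount_weights(kind, horizon, gamma=0.99, k=1.0):
    # Per-step discount weights Gamma(l) for two standard schemes.
    # Exponential: gamma ** l;  hyperbolic: 1 / (1 + k * l).
    l = np.arange(horizon)
    if kind == "exponential":
        return gamma ** l
    if kind == "hyperbolic":
        return 1.0 / (1.0 + k * l)
    raise ValueError(f"unknown discounting scheme: {kind}")

def mc_advantages(rewards, values, weights):
    # Monte Carlo advantage under an arbitrary discount vector:
    #   A_t = sum_l Gamma(l) * r_{t+l}  -  V(s_t)
    # Unlike recursive GAE, this works for any weight sequence,
    # at the cost of high variance (the baseline UGAE improves on).
    T = len(rewards)
    adv = np.empty(T)
    for t in range(T):
        adv[t] = np.dot(weights[: T - t], rewards[t:]) - values[t]
    return adv
```

With `kind="exponential"` this recovers the usual geometric weights `gamma ** l`; swapping in `kind="hyperbolic"` changes only the weight vector, which is the flexibility arbitrary discounting provides.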
Related papers
- Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models
Using Pairwise-Distance Estimators [21.098866735156207]
Pairwise-distance estimators (PaiDEs) establish bounds on entropy.
Unlike sample-based Monte Carlo estimators, PaiDEs exhibit a remarkable capability to estimate epistemic uncertainty at speeds up to 100 times faster.
We compare our approach to existing active learning methods and find that it outperforms them on high-dimensional regression tasks.
arXiv Detail & Related papers (2023-08-25T17:13:42Z) - Mimicking Better by Matching the Approximate Action Distribution [48.81067017094468]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z) - The FAIRy Tale of Genetic Algorithms [1.0957528713294875]
We have extended the Findable, Accessible, Interoperable and Reusable (FAIR) data principles to enable the reproducibility and reusability of Genetic Algorithms.
We have presented an overview of methodological developments and variants of GA that make it challenging to reproduce or even find the right source.
This work can be extended to numerous machine learning algorithms/methods.
arXiv Detail & Related papers (2023-04-29T11:36:09Z) - LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning [30.4251858001151]
Recent methods show that a squared norm regularization on the implicit reward function is effective, but they do not provide a theoretical analysis of the resulting properties of the algorithms.
We show that our method, Least Squares Inverse Q-Learning, outperforms state-of-the-art algorithms, particularly in environments with absorbing states.
arXiv Detail & Related papers (2023-03-01T15:46:12Z) - Toward Learning Robust and Invariant Representations with Alignment
Regularization and Data Augmentation [76.85274970052762]
This paper is motivated by the proliferation of alignment regularization options.
We evaluate the performances of several popular design choices along the dimensions of robustness and invariance.
We also formally analyze the behavior of alignment regularization to complement our empirical study under assumptions we consider realistic.
arXiv Detail & Related papers (2022-06-04T04:29:19Z) - High-Dimensional Bayesian Optimisation with Variational Autoencoders and
Deep Metric Learning [119.91679702854499]
We introduce a method based on deep metric learning to perform Bayesian optimisation over high-dimensional, structured input spaces.
We achieve such an inductive bias using just 1% of the available labelled data.
As an empirical contribution, we present state-of-the-art results on real-world high-dimensional black-box optimisation problems.
arXiv Detail & Related papers (2021-06-07T13:35:47Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Discount Factor as a Regularizer in Reinforcement Learning [23.56942940879309]
It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime.
We show an explicit equivalence between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss.
Motivated by the equivalence, we empirically study this technique compared to standard $L_2$ regularization.
arXiv Detail & Related papers (2020-07-04T08:10:09Z) - Sparse Gaussian Processes Revisited: Bayesian Approaches to
Inducing-Variable Approximations [27.43948386608]
Variational inference techniques based on inducing variables provide an elegant framework for scalable estimation in Gaussian process (GP) models.
In this work we challenge the common wisdom that optimizing the inducing inputs in the variational framework yields optimal performance.
arXiv Detail & Related papers (2020-03-06T08:53:18Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.