Direct Advantage Estimation
- URL: http://arxiv.org/abs/2109.06093v1
- Date: Mon, 13 Sep 2021 16:09:31 GMT
- Title: Direct Advantage Estimation
- Authors: Hsiao-Ru Pan, Nico Gürtler, Alexander Neitz, Bernhard Schölkopf
- Abstract summary: We show that the expected return may depend on the policy in an undesirable way which could slow down learning.
We propose the Direct Advantage Estimation (DAE), a novel method that can model the advantage function and estimate it directly from data.
If desired, value functions can also be seamlessly integrated into DAE and be updated in a similar way to Temporal Difference Learning.
- Score: 63.52264764099532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Credit assignment is one of the central problems in reinforcement learning.
The predominant approach is to assign credit based on the expected return.
However, we show that the expected return may depend on the policy in an
undesirable way which could slow down learning. Instead, we borrow ideas from
the causality literature and show that the advantage function can be
interpreted as causal effects, which share similar properties with causal
representations. Based on this insight, we propose the Direct Advantage
Estimation (DAE), a novel method that can model the advantage function and
estimate it directly from data without requiring the (action-)value function.
If desired, value functions can also be seamlessly integrated into DAE and be
updated in a similar way to Temporal Difference Learning. The proposed method
is easy to implement and can be readily adopted by modern actor-critic methods.
We test DAE empirically on the Atari domain and show that it can achieve
competitive results with the state-of-the-art method for advantage estimation.
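For concreteness, here is a minimal sketch of the kind of estimator the abstract describes: an advantage head that is centered so its policy-weighted mean is zero (a property every true advantage function satisfies), trained by regressing a sampled discounted return onto the sum of per-step advantages plus bootstrapped values, with the value head updated TD-style through the bootstrap term. The names (DAEHead, dae_loss) and the exact loss are illustrative assumptions, not the paper's precise objective.

```python
# Minimal sketch (not the paper's exact algorithm): an advantage head centered under
# the current policy, plus an optional value head used for a TD-style bootstrap.
import torch
import torch.nn as nn


class DAEHead(nn.Module):  # illustrative name
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.adv = nn.Linear(hidden, n_actions)  # one advantage estimate per action
        self.val = nn.Linear(hidden, 1)          # optional value head

    def forward(self, obs: torch.Tensor, policy_probs: torch.Tensor):
        h = self.body(obs)
        raw = self.adv(h)
        # Center so that sum_a pi(a|s) * A(s, a) = 0, a property of any true advantage function.
        adv = raw - (policy_probs * raw).sum(dim=-1, keepdim=True)
        return adv, self.val(h).squeeze(-1)


def dae_loss(model, obs, actions, rewards, policy_probs, gamma=0.99):
    """Regress the observed discounted return onto V(s_0) + sum_t gamma^t A(s_t, a_t)
    plus a bootstrapped gamma^T V(s_T) term (the value head is updated TD-style)."""
    T = rewards.shape[0]                                  # obs / policy_probs cover T+1 steps
    disc = gamma ** torch.arange(T, dtype=torch.float32)
    ret = (disc * rewards).sum()

    adv, val = model(obs, policy_probs)                   # adv: [T+1, n_actions], val: [T+1]
    adv_taken = adv[:-1].gather(1, actions.unsqueeze(1)).squeeze(1)
    pred = val[0] + (disc * adv_taken).sum() + (gamma ** T) * val[-1].detach()
    return (ret - pred) ** 2
```

In an actor-critic agent, the centered advantage estimates could then replace a separately computed advantage in the policy-gradient step, which is one way the claimed compatibility with modern actor-critic methods could be realized.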
Related papers
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- Mirror Descent Actor Critic via Bounded Advantage Learning [0.0]
Mirror Descent Value Iteration (MDVI) uses both Kullback-Leibler divergence and entropy as regularizers in its value and policy updates.
We propose Mirror Descent Actor Critic (MDAC) as an actor-critic style instantiation of MDVI for continuous action domains.
arXiv Detail & Related papers (2025-02-06T08:14:03Z)
- Skill or Luck? Return Decomposition via Advantage Functions [15.967056781224102]
Learning from off-policy data is essential for sample-efficient reinforcement learning.
We show that the advantage function can be understood as the causal effect of an action on the return.
This decomposition enables us to naturally extend Direct Advantage Estimation to off-policy settings (a worked form of the underlying identity is sketched after this list).
arXiv Detail & Related papers (2024-02-20T10:09:00Z)
- Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings [6.082810456767599]
Machine learning methods often assume that input features are available at no cost.
In domains like healthcare, where acquiring features could be expensive or harmful, it is necessary to balance a feature's acquisition cost against its predictive value.
We present the problem of active feature acquisition performance evaluation (AFAPE).
arXiv Detail & Related papers (2023-12-03T23:08:29Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions [39.17511693008055]
Estimating value functions is a core component of reinforcement learning algorithms.
We focus on bootstrapping targets used when estimating value functions.
We propose a new backup target, the $\eta$-return mixture.
arXiv Detail & Related papers (2022-01-05T21:54:55Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
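The "Skill or Luck?" entry above, like the abstract of this paper, reads the advantage function as the causal effect of an action on the return. A minimal worked form of the identity that makes such a decomposition possible is sketched below; it uses only standard definitions, and the cited papers' exact formulations may differ in detail.

```latex
% Telescoping identity: assuming V^\pi(s_T) = 0 at termination (or taking T -> infinity),
% the discounted return decomposes exactly into the initial value plus per-step TD errors.
\begin{align*}
G &= \sum_{t=0}^{T-1} \gamma^t r_t
   = V^\pi(s_0) + \sum_{t=0}^{T-1} \gamma^t \bigl[ r_t + \gamma V^\pi(s_{t+1}) - V^\pi(s_t) \bigr] \\
% Each TD error splits into the advantage ("skill": the causal effect of choosing a_t) ...
A^\pi(s_t, a_t) &= \mathbb{E}\bigl[ r_t + \gamma V^\pi(s_{t+1}) \mid s_t, a_t \bigr] - V^\pi(s_t) \\
% ... plus a zero-mean residual ("luck": randomness of the reward and transition).
\epsilon_t &= r_t + \gamma V^\pi(s_{t+1}) - \mathbb{E}\bigl[ r_t + \gamma V^\pi(s_{t+1}) \mid s_t, a_t \bigr]
\end{align*}
```

Summing gives $G = V^\pi(s_0) + \sum_t \gamma^t A^\pi(s_t, a_t) + \sum_t \gamma^t \epsilon_t$, where each $\epsilon_t$ has zero mean given $(s_t, a_t)$; fitting the advantages directly from observed returns, as DAE does, roughly amounts to fitting the first two pieces while the luck terms average out.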