Direct Advantage Estimation
- URL: http://arxiv.org/abs/2109.06093v1
- Date: Mon, 13 Sep 2021 16:09:31 GMT
- Title: Direct Advantage Estimation
- Authors: Hsiao-Ru Pan, Nico Gürtler, Alexander Neitz, Bernhard Schölkopf
- Abstract summary: We show that the expected return may depend on the policy in an undesirable way which could slow down learning.
We propose the Direct Advantage Estimation (DAE), a novel method that can model the advantage function and estimate it directly from data.
If desired, value functions can also be seamlessly integrated into DAE and be updated in a similar way to Temporal Difference Learning.
- Score: 63.52264764099532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Credit assignment is one of the central problems in reinforcement learning.
The predominant approach is to assign credit based on the expected return.
However, we show that the expected return may depend on the policy in an
undesirable way which could slow down learning. Instead, we borrow ideas from
the causality literature and show that the advantage function can be
interpreted as causal effects, which share similar properties with causal
representations. Based on this insight, we propose the Direct Advantage
Estimation (DAE), a novel method that can model the advantage function and
estimate it directly from data without requiring the (action-)value function.
If desired, value functions can also be seamlessly integrated into DAE and be
updated in a similar way to Temporal Difference Learning. The proposed method
is easy to implement and can be readily adopted by modern actor-critic methods.
We test DAE empirically on the Atari domain and show that it can achieve
competitive results with the state-of-the-art method for advantage estimation.
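For concreteness, here is a minimal sketch of the kind of estimator the abstract describes: an advantage head that is centered so its policy-weighted mean is zero (a property every true advantage function satisfies), trained by regressing a sampled discounted return onto the sum of per-step advantages plus bootstrapped values, with the value head updated TD-style through the bootstrap term. The names (DAEHead, dae_loss) and the exact loss are illustrative assumptions, not the paper's precise objective.

```python
# Minimal sketch (not the paper's exact algorithm): an advantage head centered under
# the current policy, plus an optional value head used for a TD-style bootstrap.
import torch
import torch.nn as nn


class DAEHead(nn.Module):  # illustrative name
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.adv = nn.Linear(hidden, n_actions)  # one advantage estimate per action
        self.val = nn.Linear(hidden, 1)          # optional value head

    def forward(self, obs: torch.Tensor, policy_probs: torch.Tensor):
        h = self.body(obs)
        raw = self.adv(h)
        # Center so that sum_a pi(a|s) * A(s, a) = 0, a property of any true advantage function.
        adv = raw - (policy_probs * raw).sum(dim=-1, keepdim=True)
        return adv, self.val(h).squeeze(-1)


def dae_loss(model, obs, actions, rewards, policy_probs, gamma=0.99):
    """Regress the observed discounted return onto V(s_0) + sum_t gamma^t A(s_t, a_t)
    plus a bootstrapped gamma^T V(s_T) term (the value head is updated TD-style)."""
    T = rewards.shape[0]                                  # obs / policy_probs cover T+1 steps
    disc = gamma ** torch.arange(T, dtype=torch.float32)
    ret = (disc * rewards).sum()

    adv, val = model(obs, policy_probs)                   # adv: [T+1, n_actions], val: [T+1]
    adv_taken = adv[:-1].gather(1, actions.unsqueeze(1)).squeeze(1)
    pred = val[0] + (disc * adv_taken).sum() + (gamma ** T) * val[-1].detach()
    return (ret - pred) ** 2
```

In an actor-critic agent, the centered advantage estimates could then replace a separately computed advantage in the policy-gradient step, which is one way the claimed compatibility with modern actor-critic methods could be realized.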
Related papers
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- Mirror Descent Actor Critic via Bounded Advantage Learning [0.0]
Mirror Descent Value Iteration (MDVI) uses both Kullback-Leibler divergence and entropy as regularizers in its value and policy updates.
We propose Mirror Descent Actor Critic (MDAC) as an actor-critic style instantiation of MDVI for continuous action domains.
arXiv Detail & Related papers (2025-02-06T08:14:03Z)
- Skill or Luck? Return Decomposition via Advantage Functions [15.967056781224102]
Learning from off-policy data is essential for sample-efficient reinforcement learning.
We show that the advantage function can be understood as the causal effect of an action on the return.
This decomposition enables us to naturally extend Direct Advantage Estimation to off-policy settings (a worked form of the underlying identity is sketched after this list).
arXiv Detail & Related papers (2024-02-20T10:09:00Z)
- Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings [6.082810456767599]
Machine learning methods often assume that input features are available at no cost.
In domains like healthcare, where acquiring features could be expensive or harmful, it is necessary to balance a feature's acquisition cost against its predictive value.
We present the problem of active feature acquisition performance evaluation (AFAPE).
arXiv Detail & Related papers (2023-12-03T23:08:29Z)
- Online non-parametric likelihood-ratio estimation by Pearson-divergence functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions [39.17511693008055]
Estimating value functions is a core component of reinforcement learning algorithms.
We focus on bootstrapping targets used when estimating value functions.
We propose a new backup target, the $\eta$-return mixture.
arXiv Detail & Related papers (2022-01-05T21:54:55Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
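The "Skill or Luck?" entry above, like the abstract of this paper, reads the advantage function as the causal effect of an action on the return. A minimal worked form of the identity that makes such a decomposition possible is sketched below; it uses only standard definitions, and the cited papers' exact formulations may differ in detail.

```latex
% Telescoping identity: assuming V^\pi(s_T) = 0 at termination (or taking T -> infinity),
% the discounted return decomposes exactly into the initial value plus per-step TD errors.
\begin{align*}
G &= \sum_{t=0}^{T-1} \gamma^t r_t
   = V^\pi(s_0) + \sum_{t=0}^{T-1} \gamma^t \bigl[ r_t + \gamma V^\pi(s_{t+1}) - V^\pi(s_t) \bigr] \\
% Each TD error splits into the advantage ("skill": the causal effect of choosing a_t) ...
A^\pi(s_t, a_t) &= \mathbb{E}\bigl[ r_t + \gamma V^\pi(s_{t+1}) \mid s_t, a_t \bigr] - V^\pi(s_t) \\
% ... plus a zero-mean residual ("luck": randomness of the reward and transition).
\epsilon_t &= r_t + \gamma V^\pi(s_{t+1}) - \mathbb{E}\bigl[ r_t + \gamma V^\pi(s_{t+1}) \mid s_t, a_t \bigr]
\end{align*}
```

Summing gives $G = V^\pi(s_0) + \sum_t \gamma^t A^\pi(s_t, a_t) + \sum_t \gamma^t \epsilon_t$, where each $\epsilon_t$ has zero mean given $(s_t, a_t)$; fitting the advantages directly from observed returns, as DAE does, roughly amounts to fitting the first two pieces while the luck terms average out.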