Which Experiences Are Influential for Your Agent? Policy Iteration with
Turn-over Dropout
- URL: http://arxiv.org/abs/2301.11168v2
- Date: Mon, 22 May 2023 12:39:55 GMT
- Title: Which Experiences Are Influential for Your Agent? Policy Iteration with
Turn-over Dropout
- Authors: Takuya Hiraoka, Takashi Onishi, Yoshimasa Tsuruoka
- Abstract summary: We present PI+ToD as a policy iteration that efficiently estimates the influence of experiences by utilizing turn-over dropout.
We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
- Score: 15.856188608650228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL) with experience replay, experiences stored in
a replay buffer influence the RL agent's performance. Information about the
influence is valuable for various purposes, including experience cleansing and
analysis. One method for estimating the influence of individual experiences is
agent comparison, but it is prohibitively expensive when there is a large
number of experiences. In this paper, we present PI+ToD, a policy iteration
method that efficiently estimates the influence of experiences by utilizing
turn-over dropout. We demonstrate the efficiency of PI+ToD with experiments in
MuJoCo environments.
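The abstract does not spell out the mechanism, so below is a minimal sketch of how turn-over dropout can be used to estimate experience influence, assuming a single-hidden-layer Q-network trained with TD targets in PyTorch. Every name (QNet, experience_mask, td_update, influence) and every hyperparameter is an illustrative assumption, not the authors' implementation: each experience is assigned a fixed dropout mask, that mask is applied whenever the experience is used for training, and influence is estimated by comparing the sub-network selected by the experience's mask with the flipped-mask sub-network that was never updated by it.

```python
# Minimal sketch of experience-influence estimation with turn-over dropout.
# Assumes a single-hidden-layer Q-network; all names and hyperparameters are
# illustrative, not the authors' implementation.
import torch
import torch.nn as nn

HIDDEN = 256
DROP_P = 0.5  # fraction of hidden units dropped per experience

def experience_mask(exp_id: int, hidden: int = HIDDEN, p: float = DROP_P) -> torch.Tensor:
    """Fixed binary mask deterministically derived from the experience id."""
    g = torch.Generator().manual_seed(exp_id)
    return (torch.rand(hidden, generator=g) > p).float()

class QNet(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim + act_dim, HIDDEN)
        self.fc2 = nn.Linear(HIDDEN, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.fc1(torch.cat([obs, act], dim=-1)))
        # The per-experience mask decides which hidden units this experience
        # may update during training and, later, which sub-network "contains" it.
        return self.fc2(h * mask / (1.0 - DROP_P))

def td_update(q: QNet, q_target: QNet, batch, optimizer, gamma: float = 0.99) -> None:
    """One TD update; each sampled experience trains only its own sub-network."""
    obs, act, rew, next_obs, next_act, exp_ids = batch
    masks = torch.stack([experience_mask(int(i)) for i in exp_ids])
    with torch.no_grad():
        target = rew + gamma * q_target(next_obs, next_act, masks).squeeze(-1)
    loss = ((q(obs, act, masks).squeeze(-1) - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def influence(q: QNet, eval_batch, exp_id: int) -> float:
    """Estimated influence of one experience on an evaluation loss (here a
    squared error against given targets, e.g. return estimates): loss of the
    flipped-mask sub-network (never trained on the experience) minus loss of
    the masked sub-network (trained on it)."""
    obs, act, target = eval_batch
    m = experience_mask(exp_id)
    with torch.no_grad():
        with_exp = ((q(obs, act, m).squeeze(-1) - target) ** 2).mean()
        without_exp = ((q(obs, act, 1.0 - m).squeeze(-1) - target) ** 2).mean()
    return (without_exp - with_exp).item()
```

Because all sub-networks live inside one agent, influence estimates for every stored experience can be read off after a single training run, which is what makes this kind of estimate cheaper than retraining-based agent comparison.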
Related papers
- Effect of Requirements Analyst Experience on Elicitation Effectiveness: A Family of Empirical Studies [40.186975773919706]
The purpose of this study was to determine whether experience influences requirements analyst performance.
In unfamiliar domains, interview, requirements, development, and professional experience does not influence analyst effectiveness.
Interview experience has a strong positive effect, whereas professional experience has a moderate negative effect.
arXiv Detail & Related papers (2024-08-22T16:48:04Z) - Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences [15.81191445609191]
In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance.
One method for estimating the influence of experiences is the leave-one-out (LOO) method (a minimal LOO sketch appears after this list).
We present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences.
arXiv Detail & Related papers (2024-05-23T14:35:56Z) - Fair Effect Attribution in Parallel Online Experiments [57.13281584606437]
A/B tests serve the purpose of reliably identifying the effect of changes introduced in online services.
It is common for online platforms to run a large number of simultaneous experiments by splitting incoming user traffic randomly.
Despite perfect randomization between groups, simultaneous experiments can interact with each other and negatively impact average population outcomes.
arXiv Detail & Related papers (2022-10-15T17:15:51Z) - Basis for Intentions: Efficient Inverse Reinforcement Learning using
Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - Causal Influence Detection for Improving Efficiency in Reinforcement
Learning [11.371889042789219]
We introduce a measure of situation-dependent causal influence based on conditional mutual information.
We show that it can reliably detect states of influence.
All modified algorithms show strong increases in data efficiency on robotic manipulation tasks.
arXiv Detail & Related papers (2021-06-07T09:21:56Z) - Revisiting Prioritized Experience Replay: A Value Perspective [21.958500332929898]
We argue that experience replay enables off-policy reinforcement learning agents to utilize past experiences to maximize the cumulative reward.
Our framework links two important quantities in RL: $|\text{TD}|$ and the value of experience.
We empirically show that the bounds hold in practice, and experience replay using the upper bound as priority improves maximum-entropy RL in Atari games.
arXiv Detail & Related papers (2021-02-05T16:09:07Z) - Learning to Sample with Local and Global Contexts in Experience Replay
Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z) - Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z) - Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We empirically apply the proposed approach to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
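For contrast with PI+ToD, here is a minimal sketch of the naive leave-one-out (LOO) baseline referenced in the list above: retrain the agent once per held-out experience and compare evaluation performance. train_agent and evaluate are hypothetical placeholders rather than a real API; the point is the O(|buffer|) retraining cost that turn-over dropout avoids.

```python
# Naive leave-one-out (LOO) influence estimate: retrain once per experience.
# `train_agent` and `evaluate` are hypothetical placeholders, not a real API.
from typing import Any, Callable, List, Sequence

def loo_influence(buffer: Sequence[Any],
                  train_agent: Callable[[Sequence[Any]], Any],
                  evaluate: Callable[[Any], float]) -> List[float]:
    """Influence of experience i = performance with the full buffer
    minus performance after retraining without experience i."""
    baseline = evaluate(train_agent(list(buffer)))
    influences = []
    for i in range(len(buffer)):
        held_out = [e for j, e in enumerate(buffer) if j != i]
        influences.append(baseline - evaluate(train_agent(held_out)))
    return influences
```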