Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
- URL: http://arxiv.org/abs/2405.14629v1
- Date: Thu, 23 May 2024 14:35:56 GMT
- Title: Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
- Authors: Takuya Hiraoka, Guanquan Wang, Takashi Onishi, Yoshimasa Tsuruoka
- Abstract summary: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance.
One method for estimating the influence of experiences is the leave-one-out (LOO) method.
We present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences.
- Score: 15.81191445609191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence of these experiences is valuable for various purposes, such as identifying experiences that negatively influence poorly performing RL agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. However, this method is usually computationally prohibitive. In this paper, we present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences. We evaluate how accurately PIToD estimates the influence of experiences and its efficiency compared to LOO. We then apply PIToD to amend poorly performing RL agents, i.e., we use PIToD to estimate negatively influential experiences for the RL agents and to delete the influence of these experiences. We show that RL agents' performance is significantly improved via amendments with PIToD.
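As a rough illustration of the mask-based idea behind turn-over dropout (a toy mean-estimation stand-in, not the paper's actual networks or training procedure), each experience updates only the sub-model selected by its random binary mask; the flipped mask then selects a sub-model that has never trained on that experience, so comparing the two estimates the experience's influence without any retraining:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): each "unit" keeps a running mean of the
# data points assigned to it. A data point updates only the units where
# its turn-over dropout mask is True, so the flipped mask selects a
# sub-model that has never seen that point.
data = np.array([1.0, 1.1, 0.9, 5.0, 1.05])  # index 3 is an outlier
num_units = 64
masks = rng.random((len(data), num_units)) < 0.5

unit_sums = np.zeros(num_units)
unit_counts = np.zeros(num_units)
for x, m in zip(data, masks):
    unit_sums[m] += x
    unit_counts[m] += 1
unit_means = unit_sums / np.maximum(unit_counts, 1)

target = 1.0  # ground truth the sub-models are evaluated against

def loss(mask):
    # Evaluate the sub-model made of the selected units.
    return (unit_means[mask].mean() - target) ** 2

# Influence of experience i: loss without it (flipped mask) minus loss
# with it. Negative influence means the experience increased the loss,
# i.e. it was harmful -- the outlier at index 3 should stand out.
influence = np.array([loss(~m) - loss(m) for m in masks])
print(influence.round(3))
```

The point of the sketch is the cost structure: one training pass with per-experience masks replaces the per-experience retraining that leave-one-out would require.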
Related papers
- Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks.
This guidance is executed not by a manually designed curriculum, but by a critic network that decides at each timestep whether or not to use the proposed actions.
We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z) - Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the deep neural networks (DNNs) themselves rather than interpreting the RL agents.
We propose to consider rewards, the essential objective of RL agents, as the basis for interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z) - Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout [15.856188608650228]
We present PI+ToD, a policy iteration method that efficiently estimates the influence of experiences by utilizing turn-over dropout.
We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
arXiv Detail & Related papers (2023-01-26T15:13:04Z) - Fair Effect Attribution in Parallel Online Experiments [57.13281584606437]
A/B tests serve the purpose of reliably identifying the effect of changes introduced in online services.
It is common for online platforms to run a large number of simultaneous experiments by splitting incoming user traffic randomly.
Despite a perfect randomization between different groups, simultaneous experiments can interact with each other and create a negative impact on average population outcomes.
arXiv Detail & Related papers (2022-10-15T17:15:51Z) - Experiential Explanations for Reinforcement Learning [15.80179578318569]
Reinforcement Learning systems can be complex and non-interpretable.
We propose a technique, Experiential Explanations, to generate counterfactual explanations.
arXiv Detail & Related papers (2022-10-10T14:27:53Z) - Look Back When Surprised: Stabilizing Reverse Experience Replay for Neural Approximation [7.6146285961466]
We consider the recently developed and theoretically rigorous reverse experience replay (RER) method.
We show via experiments that this has a better performance than techniques like prioritized experience replay (PER) on various tasks.
arXiv Detail & Related papers (2022-06-07T10:42:02Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - Causal Influence Detection for Improving Efficiency in Reinforcement Learning [11.371889042789219]
We introduce a measure of situation-dependent causal influence based on conditional mutual information.
We show that it can reliably detect states of influence.
All modified algorithms show strong increases in data efficiency on robotic manipulation tasks.
arXiv Detail & Related papers (2021-06-07T09:21:56Z) - Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z) - Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z) - Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
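For contrast, the leave-one-out (LOO) baseline discussed in the abstract above must retrain once per held-out experience, which is why it is computationally prohibitive. A toy sketch of that cost structure (a stand-in `train` function, not an actual RL training loop):

```python
import numpy as np

def train(dataset):
    """Stand-in trainer: here just the mean of the data. In a real RL
    setting this would be a full training run, which is the whole cost."""
    return np.mean(dataset) if len(dataset) else 0.0

def loo_influence(data, target):
    """Leave-one-out: retrain once per held-out experience -- O(N) full
    training runs, versus a single masked run for turn-over dropout."""
    base_loss = (train(data) - target) ** 2
    influences = []
    for i in range(len(data)):
        held_out = np.delete(data, i)
        loo_loss = (train(held_out) - target) ** 2
        # Negative influence: removing the experience reduced the loss,
        # so the experience was harmful while present.
        influences.append(loo_loss - base_loss)
    return np.array(influences)

data = np.array([1.0, 1.1, 0.9, 5.0, 1.05])  # index 3 is an outlier
influences = loo_influence(data, target=1.0)
print(influences.round(3))
```

Both LOO and the mask-based estimator flag the same harmful experience in this toy; the difference is that LOO pays one full retraining per experience to do so.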
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.