Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
- URL: http://arxiv.org/abs/2405.14629v2
- Date: Fri, 04 Oct 2024 12:47:03 GMT
- Title: Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
- Authors: Takuya Hiraoka, Guanquan Wang, Takashi Onishi, Yoshimasa Tsuruoka
- Abstract summary: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance.
One method for estimating the influence of experiences is the leave-one-out (LOO) method.
We present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences.
- Score: 15.81191445609191
- Abstract: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about how these experiences influence the agent's performance is valuable for various purposes, such as identifying experiences that negatively influence underperforming agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. However, this method is usually computationally prohibitive. In this paper, we present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences. We evaluate how accurately PIToD estimates the influence of experiences and its efficiency compared to LOO. We then apply PIToD to amend underperforming RL agents, i.e., we use PIToD to estimate negatively influential experiences for the RL agents and to delete the influence of these experiences. We show that RL agents' performance is significantly improved via amendments with PIToD.
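To make concrete why the leave-one-out (LOO) baseline in the abstract is computationally prohibitive, the minimal sketch below estimates LOO influence on a toy regression problem: every data point's influence costs one full retraining. This is only an illustration, not the paper's method; the regression setup and the `fit` and `neg_val_loss` helpers are hypothetical stand-ins for "training an agent" and "measuring its performance". Roughly, PIToD avoids this retraining loop by training a single model with per-experience dropout masks, so that for each experience a complementary sub-network was never updated on it and can stand in for the agent retrained without it.

```python
# Minimal LOO influence sketch on a toy supervised problem (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise, with one corrupted point standing in for a
# "negatively influential experience".
X_train = rng.uniform(-1.0, 1.0, size=(50, 1))
y_train = 2.0 * X_train[:, 0] + 0.1 * rng.normal(size=50)
y_train[7] += 5.0  # inject an outlier so its influence stands out

X_val = rng.uniform(-1.0, 1.0, size=(200, 1))
y_val = 2.0 * X_val[:, 0] + 0.1 * rng.normal(size=200)


def fit(X, y):
    """Least-squares fit with a bias term (stand-in for 'training an agent')."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def neg_val_loss(w):
    """Negative held-out MSE (stand-in for 'agent performance'; higher is better)."""
    A = np.hstack([X_val, np.ones((len(X_val), 1))])
    return -float(np.mean((A @ w - y_val) ** 2))


perf_with_all = neg_val_loss(fit(X_train, y_train))

# LOO influence of point i: perf(trained with i) - perf(trained without i).
# Harmful points get negative influence. Note the cost: one retraining per point.
influences = []
for i in range(len(X_train)):
    keep = np.arange(len(X_train)) != i
    perf_without_i = neg_val_loss(fit(X_train[keep], y_train[keep]))
    influences.append(perf_with_all - perf_without_i)

print("most negatively influential point:", int(np.argmin(influences)))  # expected: 7
```

For an RL agent with a replay buffer of millions of experiences, the retraining loop above is exactly what becomes infeasible, which is the efficiency gap PIToD targets.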
Related papers
- Iterative Experience Refinement of Software-Developing Agents [81.09737243969758]
Large language models (LLMs) can leverage past experiences to reduce errors and enhance efficiency.
This paper introduces the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution.
arXiv Detail & Related papers (2024-05-07T11:33:49Z)
- Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that reuses primitive behaviours previously learned to solve simple tasks.
This guidance is provided not through a manually designed curriculum, but by a critic network that decides at each timestep whether or not to use the proposed actions.
We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z)
- Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout [15.856188608650228]
We present PI+ToD, a policy iteration method that efficiently estimates the influence of experiences by utilizing turn-over dropout.
We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
arXiv Detail & Related papers (2023-01-26T15:13:04Z)
- Fair Effect Attribution in Parallel Online Experiments [57.13281584606437]
A/B tests serve the purpose of reliably identifying the effect of changes introduced in online services.
It is common for online platforms to run a large number of simultaneous experiments by splitting incoming user traffic randomly.
Despite perfect randomization between groups, simultaneous experiments can interact with each other and negatively impact average population outcomes.
arXiv Detail & Related papers (2022-10-15T17:15:51Z)
- Experiential Explanations for Reinforcement Learning [15.80179578318569]
Reinforcement Learning systems can be complex and non-interpretable.
We propose a technique, Experiential Explanations, to generate counterfactual explanations.
arXiv Detail & Related papers (2022-10-10T14:27:53Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- On the impact of MDP design for Reinforcement Learning agents in Resource Management [0.8223798883838329]
We compare and contrast four different MDP variations, discussing their computational requirements and impacts on agent performance.
We conclude by showing that, when using Multi-Layer Perceptrons as the approximation function, a compact state representation allows agents to be transferred between environments.
arXiv Detail & Related papers (2021-09-07T17:13:11Z)
- Causal Influence Detection for Improving Efficiency in Reinforcement Learning [11.371889042789219]
We introduce a measure of situation-dependent causal influence based on conditional mutual information.
We show that it can reliably detect states of influence.
All modified algorithms show strong increases in data efficiency on robotic manipulation tasks.
arXiv Detail & Related papers (2021-06-07T09:21:56Z)
- Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z)
- Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important for obtaining high-quality influence estimates (a standard formulation is sketched after this list).
arXiv Detail & Related papers (2020-06-25T18:25:59Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
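As context for the "Influence Functions in Deep Learning Are Fragile" entry above: the standard influence-function approximation (Koh & Liang, 2017) estimates the effect of a training point z on the loss at a test point z_test without retraining, and the "Hessian regularization" mentioned in that summary corresponds to the damping term λI below. This is the generic formulation, not necessarily the exact variant studied in that paper:

```latex
\mathcal{I}(z, z_{\mathrm{test}})
  = -\,\nabla_\theta L(z_{\mathrm{test}}, \hat{\theta})^{\top}
     \bigl(H_{\hat{\theta}} + \lambda I\bigr)^{-1}
     \nabla_\theta L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta}).
```

Larger λ yields a better-conditioned inverse and more stable estimates at the cost of additional bias, which is consistent with the summary's note that Hessian regularization matters for estimate quality.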
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.