Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
- URL: http://arxiv.org/abs/2405.14629v1
- Date: Thu, 23 May 2024 14:35:56 GMT
- Title: Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
- Authors: Takuya Hiraoka, Guanquan Wang, Takashi Onishi, Yoshimasa Tsuruoka
- Abstract summary: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance.
One method for estimating the influence of experiences is the leave-one-out (LOO) method.
We present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences.
- Score: 15.81191445609191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence of these experiences is valuable for various purposes, such as identifying experiences that negatively influence poorly performing RL agents. One method for estimating the influence of experiences is the leave-one-out (LOO) method. However, this method is usually computationally prohibitive. In this paper, we present Policy Iteration with Turn-over Dropout (PIToD), which efficiently estimates the influence of experiences. We evaluate how accurately PIToD estimates the influence of experiences and its efficiency compared to LOO. We then apply PIToD to amend poorly performing RL agents, i.e., we use PIToD to estimate negatively influential experiences for the RL agents and to delete the influence of these experiences. We show that RL agents' performance is significantly improved via amendments with PIToD.
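As a rough illustration of the mask-based idea behind turn-over dropout (a toy mean-estimation stand-in, not the paper's actual networks or training procedure), each experience updates only the sub-model selected by its random binary mask; the flipped mask then selects a sub-model that has never trained on that experience, so comparing the two estimates the experience's influence without any retraining:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): each "unit" keeps a running mean of the
# data points assigned to it. A data point updates only the units where
# its turn-over dropout mask is True, so the flipped mask selects a
# sub-model that has never seen that point.
data = np.array([1.0, 1.1, 0.9, 5.0, 1.05])  # index 3 is an outlier
num_units = 64
masks = rng.random((len(data), num_units)) < 0.5

unit_sums = np.zeros(num_units)
unit_counts = np.zeros(num_units)
for x, m in zip(data, masks):
    unit_sums[m] += x
    unit_counts[m] += 1
unit_means = unit_sums / np.maximum(unit_counts, 1)

target = 1.0  # ground truth the sub-models are evaluated against

def loss(mask):
    # Evaluate the sub-model made of the selected units.
    return (unit_means[mask].mean() - target) ** 2

# Influence of experience i: loss without it (flipped mask) minus loss
# with it. Negative influence means the experience increased the loss,
# i.e. it was harmful -- the outlier at index 3 should stand out.
influence = np.array([loss(~m) - loss(m) for m in masks])
print(influence.round(3))
```

The point of the sketch is the cost structure: one training pass with per-experience masks replaces the per-experience retraining that leave-one-out would require.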
Related papers
- Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks.
This guidance is executed not by a manually designed curriculum, but by a critic network that decides at each timestep whether or not to use the proposed actions.
We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z) - Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the deep neural networks (DNNs) themselves rather than interpreting the RL agents.
We propose to consider rewards, the essential objective of RL agents, as the basis for interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z) - Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout [15.856188608650228]
We present PI+ToD, a policy iteration method that efficiently estimates the influence of experiences by utilizing turn-over dropout.
We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
arXiv Detail & Related papers (2023-01-26T15:13:04Z) - Fair Effect Attribution in Parallel Online Experiments [57.13281584606437]
A/B tests serve the purpose of reliably identifying the effect of changes introduced in online services.
It is common for online platforms to run a large number of simultaneous experiments by splitting incoming user traffic randomly.
Despite a perfect randomization between different groups, simultaneous experiments can interact with each other and create a negative impact on average population outcomes.
arXiv Detail & Related papers (2022-10-15T17:15:51Z) - Experiential Explanations for Reinforcement Learning [15.80179578318569]
Reinforcement Learning systems can be complex and non-interpretable.
We propose a technique, Experiential Explanations, to generate counterfactual explanations.
arXiv Detail & Related papers (2022-10-10T14:27:53Z) - Look Back When Surprised: Stabilizing Reverse Experience Replay for Neural Approximation [7.6146285961466]
We consider the recently developed and theoretically rigorous reverse experience replay (RER) method.
We show via experiments that this has a better performance than techniques like prioritized experience replay (PER) on various tasks.
arXiv Detail & Related papers (2022-06-07T10:42:02Z) - Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z) - Causal Influence Detection for Improving Efficiency in Reinforcement Learning [11.371889042789219]
We introduce a measure of situation-dependent causal influence based on conditional mutual information.
We show that it can reliably detect states of influence.
All modified algorithms show strong increases in data efficiency on robotic manipulation tasks.
arXiv Detail & Related papers (2021-06-07T09:21:56Z) - Revisiting Fundamentals of Experience Replay [91.24213515992595]
We present a systematic and extensive analysis of experience replay in Q-learning methods.
We focus on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected.
arXiv Detail & Related papers (2020-07-13T21:22:17Z) - Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z) - Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
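For contrast, the leave-one-out (LOO) baseline discussed in the abstract above must retrain once per held-out experience, which is why it is computationally prohibitive. A toy sketch of that cost structure (a stand-in `train` function, not an actual RL training loop):

```python
import numpy as np

def train(dataset):
    """Stand-in trainer: here just the mean of the data. In a real RL
    setting this would be a full training run, which is the whole cost."""
    return np.mean(dataset) if len(dataset) else 0.0

def loo_influence(data, target):
    """Leave-one-out: retrain once per held-out experience -- O(N) full
    training runs, versus a single masked run for turn-over dropout."""
    base_loss = (train(data) - target) ** 2
    influences = []
    for i in range(len(data)):
        held_out = np.delete(data, i)
        loo_loss = (train(held_out) - target) ** 2
        # Negative influence: removing the experience reduced the loss,
        # so the experience was harmful while present.
        influences.append(loo_loss - base_loss)
    return np.array(influences)

data = np.array([1.0, 1.1, 0.9, 5.0, 1.05])  # index 3 is an outlier
influences = loo_influence(data, target=1.0)
print(influences.round(3))
```

Both LOO and the mask-based estimator flag the same harmful experience in this toy; the difference is that LOO pays one full retraining per experience to do so.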
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.