Simultaneously Updating All Persistence Values in Reinforcement Learning
- URL: http://arxiv.org/abs/2211.11620v1
- Date: Mon, 21 Nov 2022 16:22:57 GMT
- Title: Simultaneously Updating All Persistence Values in Reinforcement Learning
- Authors: Luca Sabbioni, Luca Al Daire, Lorenzo Bisi, Alberto Maria Metelli and Marcello Restelli
- Abstract summary: In reinforcement learning, the performance of learning agents is sensitive to the choice of time discretization.
In this work, we derive a novel All-Persistence Bellman Operator, which allows an effective use of both the low-persistence experience and the high-persistence experience.
- Score: 40.10326490326968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reinforcement learning, the performance of learning agents is highly
sensitive to the choice of time discretization. Agents acting at high
frequencies have the best control opportunities, along with some drawbacks,
such as possible inefficient exploration and vanishing of the action
advantages. The repetition of actions, i.e., action persistence, helps, as it
allows the agent to visit wider regions of the state space and
improve the estimation of the action effects. In this work, we derive a novel
All-Persistence Bellman Operator, which allows an effective use of both the
low-persistence experience, by decomposition into sub-transitions, and the
high-persistence experience, thanks to the introduction of a suitable bootstrap
procedure. In this way, we employ transitions collected at any time scale to
simultaneously update the action values of the considered persistence set. We
prove the contraction property of the All-Persistence Bellman Operator and,
based on it, we extend classic Q-learning and DQN. After providing a study on
the effects of persistence, we experimentally evaluate our approach in both
tabular contexts and more challenging frameworks, including some Atari games.
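As a rough illustration of this idea, here is a minimal tabular sketch of how a single primitive transition could update a whole table of persistence-indexed action values: the persistence-1 entry receives the usual Q-learning target, while higher persistences bootstrap on the value of continuing to persist the same action. The function name, array layout, and hyperparameters below are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def all_persistence_update(Q, s, a, r, s_next, gamma=0.99, alpha=0.1):
    """Sketch: update the values of every persistence k = 1..K_max
    from one primitive transition (s, a, r, s_next).

    Q has shape (n_states, n_actions, K_max); Q[s, a, k-1] estimates the
    return of executing action a persisted for k steps from state s.
    """
    K_max = Q.shape[2]
    # Greedy value of the next state over both actions and persistences.
    v_next = Q[s_next].max()
    # Persistence 1: standard Q-learning target.
    Q[s, a, 0] += alpha * (r + gamma * v_next - Q[s, a, 0])
    # Persistence k > 1: bootstrap on "keep persisting a for k-1 more steps".
    for k in range(2, K_max + 1):
        target = r + gamma * Q[s_next, a, k - 2]
        Q[s, a, k - 1] += alpha * (target - Q[s, a, k - 1])
    return Q

# Hypothetical usage: 5 states, 3 actions, persistences 1..4.
Q = np.zeros((5, 3, 4))
Q = all_persistence_update(Q, s=0, a=1, r=1.0, s_next=2)
```

The point of the sketch is the effect the abstract describes: every stored transition, whatever time scale it was collected at, contributes to all persistence values at once.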
Related papers
- State-Novelty Guided Action Persistence in Deep Reinforcement Learning [7.05832012052375]
We propose a novel method to dynamically adjust the action persistence based on the current exploration status of the state space.
Our method can be seamlessly integrated into various basic exploration strategies to incorporate temporal persistence.
arXiv Detail & Related papers (2024-09-09T08:34:22Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Concurrent Credit Assignment for Data-efficient Reinforcement Learning [0.0]
The capability to widely sample the state and action spaces is a key ingredient toward building effective reinforcement learning algorithms.
The occupancy model is updated frequently as exploration progresses.
It is shown to provide a significant increase in sampling efficacy, reflected in reduced training time and higher returns.
arXiv Detail & Related papers (2022-05-24T12:11:34Z)
- Learning Dense Reward with Temporal Variant Self-Supervision [5.131840233837565]
Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Previous efforts have shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
arXiv Detail & Related papers (2022-05-20T20:30:57Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
To leverage unlabeled samples for reward learning, we infer their pseudo-labels based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Learning Routines for Effective Off-Policy Reinforcement Learning [0.0]
We propose a novel framework for reinforcement learning that effectively lifts such constraints.
Within our framework, agents learn effective behavior over a routine space.
We show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode.
arXiv Detail & Related papers (2021-06-05T18:41:57Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
- Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning [40.94323379769606]
We introduce the notion of action persistence, i.e., the repetition of an action for a fixed number of decision steps.
We present a novel algorithm, Persistent Fitted Q-Iteration (PFQI), which extends FQI with the goal of learning the optimal value function at a given persistence (a minimal persistence wrapper is sketched after this list).
arXiv Detail & Related papers (2020-02-17T08:38:51Z)
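To make the notion of action persistence concrete, below is a hypothetical environment wrapper (assuming a Gymnasium-style step() returning obs, reward, terminated, truncated, info) that repeats each chosen action for k primitive steps and accumulates the discounted reward; this is one common way to obtain the k-persistent decision process that an algorithm such as PFQI operates on.

```python
class PersistedEnv:
    """Hypothetical wrapper: every chosen action is persisted for k primitive steps."""

    def __init__(self, env, k, gamma=0.99):
        self.env = env      # underlying environment with a Gymnasium-style API
        self.k = k          # persistence: number of repetitions per action
        self.gamma = gamma  # discount used to accumulate intermediate rewards

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward, discount = 0.0, 1.0
        for _ in range(self.k):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += discount * reward
            discount *= self.gamma
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info
```

From the agent's point of view, the wrapped environment simply runs at a coarser time scale, which is the trade-off between control opportunities and exploration discussed in the abstract above.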