An Empirical Study of Deep Reinforcement Learning in Continuing Tasks
- URL: http://arxiv.org/abs/2501.06937v1
- Date: Sun, 12 Jan 2025 21:24:27 GMT
- Title: An Empirical Study of Deep Reinforcement Learning in Continuing Tasks
- Authors: Yi Wan, Dmytro Korenkevych, Zheqing Zhu
- Abstract summary: In reinforcement learning (RL), continuing tasks refer to tasks where the agent-environment interaction is ongoing and cannot be broken down into episodes.
We study several well-known deep RL algorithms using a suite of continuing task testbeds based on Mujoco and Atari environments.
We also investigate the effectiveness of a method for improving temporal-difference-based RL algorithms in continuing tasks by centering rewards.
- Score: 5.741277679263804
- Abstract: In reinforcement learning (RL), continuing tasks refer to tasks where the agent-environment interaction is ongoing and cannot be broken down into episodes. These tasks are suitable when environment resets are unavailable, agent-controlled, or predefined, but where all rewards, including those accrued beyond resets, are critical. These scenarios frequently occur in real-world applications and cannot be modeled by episodic tasks. While modern deep RL algorithms have been extensively studied and well understood in episodic tasks, their behavior in continuing tasks remains underexplored. To address this gap, we provide an empirical study of several well-known deep RL algorithms using a suite of continuing task testbeds based on Mujoco and Atari environments, highlighting several key insights concerning continuing tasks. Using these testbeds, we also investigate the effectiveness of a method for improving temporal-difference-based RL algorithms in continuing tasks by centering rewards, as introduced by Naik et al. (2024). While their work primarily focused on this method in conjunction with Q-learning, our results extend their findings by demonstrating that this method is effective across a broader range of algorithms, scales to larger tasks, and outperforms two other reward-centering approaches.
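As a rough illustration of the reward-centering idea, here is a minimal sketch that subtracts a running estimate of the average reward inside a tabular TD(0) update. The step sizes `alpha` and `eta` are illustrative assumptions, and driving the average-reward estimate by the TD error is one common variant; this is not the authors' implementation.
```python
import numpy as np

# Minimal sketch of reward centering in a tabular TD(0) update.
# alpha, eta, gamma are illustrative, not values from the paper.
class CenteredTD:
    def __init__(self, n_states, alpha=0.1, eta=0.01, gamma=0.99):
        self.v = np.zeros(n_states)   # state-value estimates
        self.r_bar = 0.0              # running estimate of the average reward
        self.alpha, self.eta, self.gamma = alpha, eta, gamma

    def update(self, s, r, s_next):
        # Center the reward by subtracting the running average before
        # forming the TD error, so value estimates are not dominated by
        # a constant reward offset in a never-ending task.
        delta = (r - self.r_bar) + self.gamma * self.v[s_next] - self.v[s]
        self.r_bar += self.eta * delta   # one variant: drive r_bar by the TD error
        self.v[s] += self.alpha * delta
```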
Related papers
- Continuous Control with Coarse-to-fine Reinforcement Learning [15.585706638252441]
We present a framework that trains RL agents to zoom into a continuous action space in a coarse-to-fine manner.
We introduce a concrete, value-based algorithm within the framework called Coarse-to-fine Q-Network (CQN).
CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
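A minimal sketch of the coarse-to-fine idea for a single one-dimensional action, assuming a hypothetical learned critic `q_fn(level, centers)`; the bin and level counts are illustrative, not CQN's actual configuration.
```python
import numpy as np

# Sketch: split the action interval into bins, pick the bin with the
# highest Q-value, then zoom into that bin and repeat at finer resolution.
def coarse_to_fine_action(q_fn, low=-1.0, high=1.0, bins=5, levels=3):
    for level in range(levels):
        centers = np.linspace(low, high, bins)       # candidate actions at this resolution
        best = int(np.argmax(q_fn(level, centers)))  # greedy bin w.r.t. Q-values
        width = (high - low) / bins
        low, high = centers[best] - width / 2, centers[best] + width / 2
    return (low + high) / 2                          # final fine-grained action
```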
arXiv Detail & Related papers (2024-07-10T16:04:08Z)
- Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? [1.9116784879310031]
In deep Reinforcement Learning (RL), value functions are approximated using deep neural networks and trained via mean squared error regression objectives.
Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective.
Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup.
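One common way to make the replacement concrete is a two-hot projection of the scalar TD target onto a fixed grid of value atoms, trained with cross-entropy in place of MSE. The atom grid below is an illustrative assumption, and this is one instantiation of the idea rather than necessarily the paper's exact variant.
```python
import torch
import torch.nn.functional as F

ATOMS = torch.linspace(-10.0, 10.0, 51)  # assumed support of the discretized value distribution

def two_hot(target):
    """Project scalar targets onto the two nearest atoms (linear interpolation)."""
    target = target.clamp(ATOMS[0].item(), ATOMS[-1].item())
    idx = torch.searchsorted(ATOMS, target).clamp(1, len(ATOMS) - 1)
    lo, hi = ATOMS[idx - 1], ATOMS[idx]
    w_hi = (target - lo) / (hi - lo)
    probs = torch.zeros(*target.shape, len(ATOMS))
    probs.scatter_(-1, (idx - 1).unsqueeze(-1), (1 - w_hi).unsqueeze(-1))
    probs.scatter_(-1, idx.unsqueeze(-1), w_hi.unsqueeze(-1))
    return probs

def classification_value_loss(logits, td_target):
    # Cross-entropy against the two-hot target, replacing
    # F.mse_loss(predicted_value, td_target).
    return F.cross_entropy(logits, two_hot(td_target))
```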
arXiv Detail & Related papers (2024-06-10T14:25:11Z)
- Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z)
- Exploiting Estimation Bias in Clipped Double Q-Learning for Continuous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without degrading performance or adding computational complexity.
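For concreteness, the fragment below marks where such a mechanism would plug into a TD3-style clipped double-Q target; the boolean `prefer_optimism` is a hypothetical placeholder for the paper's dynamic bias-selection rule.
```python
import torch

def double_q_target(reward, gamma, q1_next, q2_next, prefer_optimism):
    # Standard clipped double Q-learning takes the minimum of the two target
    # critics (pessimistic, counteracting overestimation). A bias-exploiting
    # mechanism switches to the maximum when optimism is judged advantageous.
    q_next = torch.max(q1_next, q2_next) if prefer_optimism else torch.min(q1_next, q2_next)
    return reward + gamma * q_next
```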
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
- Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
- Off-Policy Reinforcement Learning with Delayed Rewards [16.914712720033524]
In many real-world tasks, instant rewards are not readily accessible or defined immediately after the agent performs actions.
In this work, we first formally define the environment with delayed rewards and discuss the challenges raised due to the non-Markovian nature of such environments.
We introduce a general off-policy RL framework with a new Q-function formulation that can handle the delayed rewards with theoretical convergence guarantees.
arXiv Detail & Related papers (2021-06-22T15:19:48Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- Towards Continual Reinforcement Learning: A Review and Perspectives [69.48324517535549]
We aim to provide a literature review of different formulations and approaches to continual reinforcement learning (RL).
While still in its early days, the study of continual RL promises to develop better incremental reinforcement learners.
Potential applications include healthcare, education, logistics, and robotics.
arXiv Detail & Related papers (2020-12-25T02:35:27Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
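A minimal sketch of the two ingredients, assuming a discrete action space and illustrative constants; the sigmoid-based down-weighting follows the high-level description above.
```python
import torch

def weighted_bellman_loss(q_pred, q_targets, temperature=10.0):
    # (a) Re-weight the Bellman error by ensemble disagreement: targets the
    # Q-ensemble is uncertain about contribute less. q_targets: [ensemble, batch].
    std = q_targets.std(dim=0)
    w = torch.sigmoid(-std * temperature) + 0.5   # weights in (0.5, 1.0]
    return (w * (q_pred - q_targets.mean(dim=0)) ** 2).mean()

def ucb_action(q_ensemble, lam=1.0):
    # (b) Select the action with the highest upper confidence bound over the
    # ensemble's Q-values. q_ensemble: [ensemble, num_actions].
    return torch.argmax(q_ensemble.mean(dim=0) + lam * q_ensemble.std(dim=0))
```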
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Learning Context-aware Task Reasoning for Efficient Meta-reinforcement Learning [29.125234093368732]
We propose a novel meta-RL strategy to achieve human-level efficiency in learning novel tasks.
We decompose the meta-RL problem into three sub-tasks: task-exploration, task-inference, and task-fulfillment.
Our algorithm effectively performs exploration for task inference, improves sample efficiency during both training and testing, and mitigates the meta-overfitting problem.
arXiv Detail & Related papers (2020-03-03T07:38:53Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method.
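The flavor of such a sampling variant can be sketched as both players taking noisy (Langevin-style) gradient steps on a shared min-max objective; the step size `eta` and inverse temperature `beta` below are illustrative assumptions, not the paper's exact algorithm.
```python
import torch

def langevin_two_player_step(theta, phi, f, eta=1e-3, beta=1e4):
    # theta, phi: parameter tensors with requires_grad=True; f(theta, phi) is
    # the min-max objective (protagonist minimizes, adversary maximizes).
    g_theta, g_phi = torch.autograd.grad(f(theta, phi), [theta, phi])
    scale = (2 * eta / beta) ** 0.5   # SGLD noise scale
    with torch.no_grad():
        theta -= eta * g_theta - scale * torch.randn_like(theta)
        phi += eta * g_phi + scale * torch.randn_like(phi)
    return theta, phi
```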
arXiv Detail & Related papers (2020-02-14T14:59:14Z)