Rethinking the Foundations for Continual Reinforcement Learning
- URL: http://arxiv.org/abs/2504.08161v3
- Date: Tue, 15 Jul 2025 03:27:33 GMT
- Title: Rethinking the Foundations for Continual Reinforcement Learning
- Authors: Esraa Elelimy, David Szepesvari, Martha White, Michael Bowling
- Abstract summary: We first examine whether the foundations of traditional reinforcement learning are suitable for the continual reinforcement learning paradigm. We identify four key pillars of the traditional reinforcement learning foundations that are antithetical to the goals of continual learning.
- Score: 25.069601930142305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the traditional view of reinforcement learning, the agent's goal is to find an optimal policy that maximizes its expected sum of rewards. Once the agent finds this policy, the learning ends. This view contrasts with \emph{continual reinforcement learning}, where learning does not end, and agents are expected to continually learn and adapt indefinitely. Despite the clear distinction between these two paradigms of learning, much of the progress in continual reinforcement learning has been shaped by foundations rooted in the traditional view of reinforcement learning. In this paper, we first examine whether the foundations of traditional reinforcement learning are suitable for the continual reinforcement learning paradigm. We identify four key pillars of the traditional reinforcement learning foundations that are antithetical to the goals of continual learning: the Markov decision process formalism, the focus on atemporal artifacts, the expected sum of rewards as an evaluation metric, and episodic benchmark environments that embrace the other three foundations. We then propose a new formalism that sheds the first and the third foundations and replaces them with the history process as a mathematical formalism and a new definition of deviation regret, adapted for continual learning, as an evaluation metric. Finally, we discuss possible approaches to shed the other two foundations.
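As a point of reference, the evaluation pillar the paper sheds is the expected sum of rewards, $J(\pi)$ below; a regret-style criterion instead compares the reward an agent accumulates against a comparator class over the same interaction. The generic form below is only an illustration; the paper's definition of deviation regret is not reproduced here.
$$ J(\pi) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T} r_{t}\Big], \qquad \mathrm{Regret}_{T} \;=\; \sup_{\pi' \in \Pi}\, \mathbb{E}\Big[\sum_{t=0}^{T} r_{t}^{\pi'}\Big] \;-\; \mathbb{E}\Big[\sum_{t=0}^{T} r_{t}\Big]. $$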
Related papers
- The Future of Continual Learning in the Era of Foundation Models: Three Key Directions [3.805777835466912]
We argue that continual learning remains essential for three key reasons. We argue it is continual compositionality that will mark the rebirth of continual learning. The future of AI will not be defined by a single static model but by an ecosystem of continually evolving and interacting models.
arXiv Detail & Related papers (2025-06-03T19:06:41Z) - Language Guided Concept Bottleneck Models for Interpretable Continual Learning [62.09201360376577]
Continual learning aims to enable learning systems to acquire new knowledge constantly without forgetting previously learned information. Most existing CL methods focus primarily on preserving learned knowledge to improve model performance. We introduce a novel framework that integrates language-guided Concept Bottleneck Models to address both challenges.
arXiv Detail & Related papers (2025-03-30T02:41:55Z) - Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own [59.11934130045106]
We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.
Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions.
Our method achieves remarkable performance in various manipulation tasks, both on real robots and in simulation.
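A minimal toy sketch of the idea summarized above, assuming invented stand-ins for the foundation models (the functions `success_reward` and `policy_prior` below are hypothetical and do not reflect the published FAC interfaces): the critic learns from an automatic reward produced by a success-reward model, while the actor is pulled toward a policy prior's suggestion.
```python
import random

# Hypothetical foundation-model stubs (invented for illustration only).
def success_reward(obs, action):
    """Success-reward model: 1 when the action roughly cancels the state."""
    return 1.0 if abs(obs + action) < 0.2 else 0.0

def policy_prior(obs):
    """Policy foundation model: a coarse suggested action."""
    return -obs

# Tiny linear actor and critic; the reward is generated automatically by the
# success-reward model, and the actor is regularised toward the policy prior.
actor_w, critic_w = 0.0, 0.0
alpha, beta = 0.05, 0.5

for _ in range(2000):
    obs = random.uniform(-1.0, 1.0)
    action = actor_w * obs + random.gauss(0.0, 0.1)   # exploratory action
    reward = success_reward(obs, action)              # automatic reward signal
    td_error = reward - critic_w * obs                # 1-step value error (no bootstrapping, for brevity)
    critic_w += alpha * td_error * obs
    prior_gap = policy_prior(obs) - action            # guidance from the policy prior
    actor_w += alpha * beta * prior_gap * obs

print(f"learned actor gain ~ {actor_w:.2f} (the prior suggests -1.0)")
```
In this toy the critic and actor are decoupled for brevity; the point is only that both the reward and the behavioural guidance come from pretrained priors rather than hand-engineering.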
arXiv Detail & Related papers (2023-10-04T07:56:42Z) - Adapting Double Q-Learning for Continuous Reinforcement Learning [0.65268245109828]
We present a novel approach to bias correction, similar in spirit to Double Q-Learning.
Our approach shows promising near-SOTA results on a small set of MuJoCo environments.
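For context, the tabular Double Q-Learning update that the above approach takes its cue from keeps two estimators and lets one select the action while the other evaluates it (standard formulation, with the roles of $Q_A$ and $Q_B$ swapped on alternate updates; the paper's continuous-control variant is not reproduced here):
$$ Q_A(s,a) \;\leftarrow\; Q_A(s,a) + \alpha\Big[r + \gamma\, Q_B\big(s',\, \arg\max_{a'} Q_A(s',a')\big) - Q_A(s,a)\Big]. $$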
arXiv Detail & Related papers (2023-09-25T19:09:54Z) - A Definition of Continual Reinforcement Learning [69.56273766737527]
In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward.
Continual reinforcement learning refers to the setting in which the best agents never stop learning.
We formalize the notion of agents that "never stop learning" through a new mathematical language for analyzing and cataloging agents.
arXiv Detail & Related papers (2023-07-20T17:28:01Z) - Causal Reinforcement Learning: A Survey [57.368108154871]
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty.
One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world.
Causality offers a notable advantage as it can formalize knowledge in a systematic manner.
arXiv Detail & Related papers (2023-07-04T03:00:43Z) - Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z) - Hardness in Markov Decision Processes: Theory and Practice [0.0]
First, we present a systematic survey of the theory of hardness, which identifies promising research directions.
Second, we introduce Colosseum, a pioneering package that enables empirical hardness analysis.
Third, we present an empirical analysis that provides new insights into computable measures.
arXiv Detail & Related papers (2022-10-24T09:51:31Z) - Susceptibility of Continual Learning Against Adversarial Attacks [1.3749490831384268]
We investigate the susceptibility of continually learned tasks, including current and previously acquired tasks, to adversarial attacks.
Such susceptibility or vulnerability of learned tasks to adversarial attacks raises profound concerns regarding data integrity and privacy.
We explore the robustness of three regularization-based methods, three replay-based approaches, and one hybrid technique that combines replay and exemplar approaches.
arXiv Detail & Related papers (2022-07-11T23:45:12Z) - On Credit Assignment in Hierarchical Reinforcement Learning [0.0]
Hierarchical Reinforcement Learning (HRL) has held longstanding promise to advance reinforcement learning.
We show how, for example, a 1-step 'hierarchical backup' can be seen as a conventional multistep backup with $n$ skip connections over time.
We develop a new hierarchical algorithm, Hier$Q_k(\lambda)$, for which we demonstrate that hierarchical credit assignment alone can already boost agent performance.
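For reference, the conventional multistep ($n$-step) backup target mentioned above has the standard form below; the paper reads a 1-step hierarchical backup as this target with skip connections over time (the hierarchical construction itself is not reproduced here):
$$ G_t^{(n)} \;=\; r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} \max_{a} Q(s_{t+n}, a). $$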
arXiv Detail & Related papers (2022-03-07T11:13:09Z) - Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z) - Recent advances in deep learning theory [104.01582662336256]
This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized into six groups, four of which are summarized here: complexity and capacity-based approaches for analysing the generalizability of deep learning; differential equations and their dynamic systems for modelling gradient descent and its variants; the geometrical structures of the loss landscape that drive the trajectories of those dynamic systems; and theoretical foundations of several special structures in network architectures.
arXiv Detail & Related papers (2020-12-20T14:16:41Z) - Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Tracking the Race Between Deep Reinforcement Learning and Imitation Learning -- Extended Version [0.0]
We consider a benchmark planning problem from the reinforcement learning domain, the Racetrack.
We compare the performance of deep supervised learning, in particular imitation learning, to reinforcement learning for the Racetrack model.
arXiv Detail & Related papers (2020-08-03T10:31:44Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
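A hedged sketch of the two SUNRISE ingredients described in the entry above, written for a discrete-action Q-ensemble purely for brevity (SUNRISE itself is formulated for off-policy actor-critic methods, and its exact uncertainty weighting is not reproduced here):
```python
import numpy as np

def sunrise_style_targets_and_action(q_ensemble, next_obs, reward, gamma=0.99,
                                     ucb_lambda=1.0, temperature=10.0):
    """Hedged sketch of SUNRISE-style target weighting and UCB action selection.

    `q_ensemble` is assumed to be a list of callables mapping an observation to a
    vector of Q-values over discrete actions (an assumption made here for brevity).
    """
    q_next = np.stack([q(next_obs) for q in q_ensemble])      # (ensemble, actions)
    mean_q, std_q = q_next.mean(axis=0), q_next.std(axis=0)

    # (a) Ensemble-based weighted Bellman backup: down-weight targets whose
    #     ensemble disagreement (a proxy for uncertainty) is high. The exact
    #     weighting function used by SUNRISE is not reproduced here.
    greedy_next = mean_q.argmax()
    target_weight = 1.0 / (1.0 + temperature * std_q[greedy_next])
    bellman_target = reward + gamma * mean_q[greedy_next]

    # (b) UCB exploration: act greedily with respect to mean + lambda * std.
    ucb_action = int((mean_q + ucb_lambda * std_q).argmax())

    return bellman_target, target_weight, ucb_action

# Tiny usage example with a random two-member ensemble over 3 actions.
rng = np.random.default_rng(0)
ensemble = [lambda obs, w=rng.normal(size=(3, 2)): w @ obs for _ in range(2)]
print(sunrise_style_targets_and_action(ensemble, np.array([0.5, -0.2]), reward=1.0))
```
In SUNRISE each ensemble member keeps its own target and the weight is applied per transition inside the Bellman loss; the single scalar returned here only illustrates the down-weighting of uncertain targets.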
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.