Data-Incremental Continual Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2404.12639v3
- Date: Mon, 16 Dec 2024 15:00:05 GMT
- Title: Data-Incremental Continual Offline Reinforcement Learning
- Authors: Sibo Gai, Donglin Wang
- Abstract summary: We propose a new setting of continual learning: data-incremental continual offline reinforcement learning (DICORL). In DICORL, an agent is asked to learn a sequence of datasets of a single offline reinforcement learning (RL) task continually, instead of learning a sequence of offline RL tasks with respective datasets. Our experiments show that EREIQL relieves active forgetting in DICORL and performs well.
- Score: 25.110235967357248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a new setting of continual learning: data-incremental continual offline reinforcement learning (DICORL), in which an agent must continually learn a sequence of datasets of a single offline reinforcement learning (RL) task, rather than a sequence of offline RL tasks with their respective datasets. We then show that this setting introduces a unique challenge for continual learning: active forgetting, in which the agent actively forgets skills it has already learnt. The main cause of active forgetting is the conservative learning used by offline RL methods to address the overestimation problem. Under conservative learning, an offline RL method suppresses the values of all actions, learnt or not, indiscriminately, unless they appear in the dataset currently being learned. As a result, inferior data may override premium data simply because of the order in which the datasets are learned. To solve this problem, we propose a new algorithm, experience-replay-based ensemble implicit Q-learning (EREIQL), which introduces multiple value networks to keep initial value estimates low and thereby avoids conservative learning, and which uses experience replay to relieve catastrophic forgetting. Our experiments show that EREIQL relieves active forgetting in DICORL and performs well.
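The abstract names two ingredients of EREIQL: an ensemble of value networks that keeps initial value estimates low (so no conservative, value-suppressing objective is needed) and experience replay across datasets. The sketch below is only a minimal PyTorch illustration of that recipe, not the authors' implementation: the network sizes, the number of Q-heads, the expectile, and the replay mixing ratio are assumed hyperparameters, and policy extraction (e.g. advantage-weighted regression as in IQL) is omitted.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class EnsembleIQL:
    """Value learning with an ensemble of Q-networks and IQL-style expectile
    regression; tensors are expected with shape (batch, dim)."""
    def __init__(self, obs_dim, act_dim, n_q=5, expectile=0.7, gamma=0.99):
        self.qs = nn.ModuleList([mlp(obs_dim + act_dim, 1) for _ in range(n_q)])
        self.v = mlp(obs_dim, 1)
        self.expectile, self.gamma = expectile, gamma
        self.q_opt = torch.optim.Adam(self.qs.parameters(), lr=3e-4)
        self.v_opt = torch.optim.Adam(self.v.parameters(), lr=3e-4)

    def update(self, s, a, r, s_next, done):
        sa = torch.cat([s, a], dim=-1)
        # Pessimistic ensemble target: the minimum over several randomly
        # initialized Q-heads keeps early value estimates low without an
        # explicit conservative (value-suppressing) penalty.
        with torch.no_grad():
            q_min = torch.min(torch.stack([q(sa) for q in self.qs]), dim=0).values
        # IQL-style expectile regression of V towards the ensemble Q-value.
        diff = q_min - self.v(s)
        weight = torch.abs(self.expectile - (diff < 0).float())
        v_loss = (weight * diff.pow(2)).mean()
        self.v_opt.zero_grad(); v_loss.backward(); self.v_opt.step()
        # TD regression of every Q-head towards r + gamma * V(s').
        with torch.no_grad():
            target = r + self.gamma * (1.0 - done) * self.v(s_next)
        q_loss = sum(F.mse_loss(q(sa), target) for q in self.qs)
        self.q_opt.zero_grad(); q_loss.backward(); self.q_opt.step()
        return v_loss.item(), q_loss.item()

def mixed_batch(current_data, replay_buffer, batch_size, replay_ratio=0.5):
    # Experience replay across datasets: mix transitions retained from earlier
    # datasets with transitions from the dataset currently being learned.
    n_old = int(batch_size * replay_ratio) if replay_buffer else 0
    return (random.sample(current_data, batch_size - n_old)
            + random.sample(replay_buffer, n_old))
```

In this sketch the pessimism comes from taking the minimum over randomly initialized Q-heads rather than from penalizing out-of-dataset actions, which is the property the abstract credits with avoiding active forgetting.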
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
Unlabeled offline trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE consistently outperforms prior strategies across a suite of 42 long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - A Unified Framework for Continual Learning and Machine Unlearning [9.538733681436836]
Continual learning and machine unlearning are crucial challenges in machine learning, typically addressed separately.
We introduce a novel framework that jointly tackles both tasks by leveraging controlled knowledge distillation.
Our approach enables efficient learning with minimal forgetting and effective targeted unlearning.
arXiv Detail & Related papers (2024-08-21T06:49:59Z) - Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
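The summary above points to Contrastive Predictive Coding as the tool for detecting this hidden, episode-level variation. As an illustration of the generic machinery only (not the paper's architecture or training setup), a standard InfoNCE loss contrasts an encoding of one part of an episode against encodings taken from other episodes in the batch:

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    # Generic InfoNCE / CPC-style loss: each anchor embedding should score its
    # own positive (e.g. an encoding of a later segment of the same episode)
    # higher than the positives belonging to all other episodes in the batch.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)
```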
arXiv Detail & Related papers (2024-05-23T02:41:36Z) - Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z) - Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing this parameter in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z) - Benchmarking Offline Reinforcement Learning on Real-Robot Hardware [35.29390454207064]
Dexterous manipulation in particular remains an open problem in its general form.
We propose a benchmark including a large collection of data for offline learning from a dexterous manipulation platform on two tasks.
We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
arXiv Detail & Related papers (2023-07-28T17:29:49Z) - OER: Offline Experience Replay for Continual Offline Reinforcement Learning [25.985985377992034]
It is desirable for an agent to continually learn new skills from a sequence of pre-collected offline datasets.
In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), where an agent learns a sequence of offline reinforcement learning tasks.
We propose a new model-based experience selection scheme to build the replay buffer, where a transition model is learned to approximate the state distribution.
arXiv Detail & Related papers (2023-05-23T08:16:44Z) - Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between the learning policy and the dataset.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
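The ReD idea summarized above is return-based resampling that leaves the dataset's support unchanged. The following is a minimal sketch of one such reweighting; the softmax-over-normalized-returns scheme is an assumed, illustrative choice rather than the paper's exact formula:

```python
import numpy as np

def return_weighted_indices(episode_returns, num_samples, temperature=1.0):
    # Resample episodes with probability increasing in their return: high-return
    # trajectories are drawn more often, while every episode keeps a nonzero
    # probability, so the support of the dataset distribution stays unchanged.
    r = np.asarray(episode_returns, dtype=np.float64)
    r = (r - r.min()) / (r.max() - r.min() + 1e-8)   # normalize returns to [0, 1]
    probs = np.exp(r / temperature)
    probs /= probs.sum()
    return np.random.choice(len(r), size=num_samples, p=probs, replace=True)
```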
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Relational Experience Replay: Continual Learning by Adaptively Tuning
Task-wise Relationship [54.73817402934303]
We propose Experience Continual Replay (ERR), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity trade-off.
ERR can consistently improve the performance of all baselines and surpass current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z) - RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive.
These experiments also probe the limits of existing RvS methods, which are comparatively weak on random data.
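The RvS finding summarized above is that outcome-conditioned behavior cloning with a small feedforward network is competitive. A minimal sketch of that recipe, with assumed layer sizes and a fixed-variance Gaussian likelihood (which reduces to mean-squared error), might look as follows:

```python
import torch
import torch.nn as nn

class RvSPolicy(nn.Module):
    # Feedforward policy conditioned on an outcome variable (e.g. a goal state
    # or a target return); hidden sizes are illustrative assumptions.
    def __init__(self, obs_dim, cond_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, obs, cond):
        return self.net(torch.cat([obs, cond], dim=-1))

def bc_loss(policy, obs, cond, act):
    # "RL via supervised learning": maximize the likelihood of the dataset
    # actions given (state, outcome); with a fixed-variance Gaussian this is
    # just mean-squared error on the predicted action.
    return ((policy(obs, cond) - act) ** 2).mean()
```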
arXiv Detail & Related papers (2021-12-20T18:55:16Z) - Offline Reinforcement Learning with Value-based Episodic Memory [19.12430651038357]
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems.
We propose Expectile V-Learning (EVL), which smoothly interpolates between the optimal value learning and behavior cloning.
We present a new offline method called Value-based Episodic Memory (VEM).
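Expectile regression is the mechanism that lets EVL interpolate between behavior cloning and optimal value learning: at tau = 0.5 it fits the mean of the targets, and as tau approaches 1 it leans towards the best targets in the data. The toy snippet below demonstrates this on a handful of returns; it is a generic illustration of expectile regression, not the paper's estimator.

```python
import torch

def expectile(values, tau, iters=2000, lr=0.05):
    # Solve argmin_v E[ |tau - 1(x - v < 0)| * (x - v)^2 ] by gradient descent.
    v = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([v], lr=lr)
    for _ in range(iters):
        diff = values - v
        weight = torch.abs(tau - (diff < 0).float())
        loss = (weight * diff.pow(2)).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return v.item()

returns = torch.tensor([0.0, 1.0, 2.0, 10.0])   # returns observed in the data
print(expectile(returns, 0.5))    # ~3.25: the mean, i.e. the behavior value
print(expectile(returns, 0.99))   # ~9.7: approaches the best return in the data
```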
arXiv Detail & Related papers (2021-10-19T08:20:11Z) - Online Continual Learning Via Candidates Voting [7.704949298975352]
We introduce an effective and memory-efficient method for online continual learning under class-incremental setting.
Our proposed method achieves the best results under different benchmark datasets for online continual learning including CIFAR-10, CIFAR-100 and CORE-50.
arXiv Detail & Related papers (2021-10-17T15:45:32Z) - DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning [6.9884912034790405]
We propose a framework, Data CUrriculum for Reinforcement learning (DCUR), which first trains teachers using online deep RL.
Then, students learn by running either offline RL or by using teacher data in combination with a small amount of self-generated data.
arXiv Detail & Related papers (2021-09-15T15:39:46Z) - Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
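The test-then-train protocol described above is simple to state in code. The loop below is a generic sketch; `model.predict` and the `update` hook are hypothetical interfaces used for illustration, not part of the benchmark's API:

```python
def online_continual_eval(stream, model, update):
    # Test-then-train: every incoming batch is first used to measure online
    # accuracy, and only afterwards handed to the learner as training data.
    correct, total = 0, 0
    for x, y in stream:              # stream yields small labelled batches
        pred = model.predict(x)      # evaluate before any update on this batch
        correct += int((pred == y).sum())
        total += len(y)
        update(model, x, y)          # then train on the same batch
    return correct / total
```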
arXiv Detail & Related papers (2021-08-20T06:17:20Z) - Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
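As a rough, feature-space simplification of the representativeness-plus-affinity scoring described above (the actual OCS objective differs; treat this purely as an assumed illustration):

```python
import torch
import torch.nn.functional as F

def select_coreset(candidate_feats, coreset_feats, k, affinity_weight=1.0):
    # Score each candidate by (i) how representative it is of the current batch
    # (similarity to the batch mean feature) and (ii) its affinity to samples
    # already stored for past tasks, then keep the top-k scoring candidates.
    cand = F.normalize(candidate_feats, dim=-1)
    representativeness = cand @ F.normalize(candidate_feats.mean(0), dim=-1)
    affinity = cand @ F.normalize(coreset_feats.mean(0), dim=-1)
    scores = representativeness + affinity_weight * affinity
    return torch.topk(scores, k).indices
```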
arXiv Detail & Related papers (2021-06-02T11:39:25Z) - Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z) - Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.