Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
- URL: http://arxiv.org/abs/2410.15698v1
- Date: Mon, 21 Oct 2024 07:13:45 GMT
- Title: Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
- Authors: Jifeng Hu, Sili Huang, Li Shen, Zhejian Yang, Shengchao Hu, Shisong Tang, Hechang Chen, Yi Chang, Dacheng Tao, Lichao Sun
- Abstract summary: Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based lifelong learning systems.
We propose Vector-Quantized Continual Diffuser, named VQ-CD, to break the barrier of different spaces between various tasks.
- Abstract: Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based lifelong learning systems by modeling the joint distributions of trajectories. However, most research focuses only on limited continual task settings where the tasks share the same observation and action space, which deviates from the realistic demands of training agents in various environments. In view of this, we propose the Vector-Quantized Continual Diffuser, named VQ-CD, to break the barrier of different spaces between various tasks. Specifically, our method contains two complementary components, where quantized space alignment provides a unified basis for selective weight activation. In quantized space alignment, we leverage vector quantization to align the different state and action spaces of various tasks, facilitating continual training in the same space. We then leverage a unified diffusion model, coupled with an inverse dynamics model, to master all tasks by selectively activating different weights according to task-related sparse masks. Finally, we conduct extensive experiments on 15 continual learning (CL) tasks, covering both conventional CL settings (identical state and action spaces) and general CL settings (various state and action spaces). Compared with 16 baselines, our method achieves state-of-the-art (SOTA) performance.
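To make the two components concrete, here is a minimal sketch of the ideas as described in the abstract: padding and vector-quantizing observations from tasks with different spaces into one shared codebook, and gating a shared layer's weights with task-specific sparse masks. The pad-to-maximum alignment, nearest-neighbor codebook, and fixed random binary masks are illustrative assumptions; the paper's VQ-CD derives its task-related masks and attaches an inverse dynamics model to the diffusion model, none of which is reproduced here.

```python
# Hedged sketch of (1) shared-codebook alignment and (2) sparse weight masks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedCodebook(nn.Module):
    """Vector quantizer: snaps inputs to their nearest codebook vectors."""
    def __init__(self, codebook_size=64, dim=32):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(codebook_size, dim))

    def forward(self, x):
        dists = torch.cdist(x, self.codes)          # (batch, codebook_size)
        quantized = self.codes[dists.argmin(dim=1)]
        # Straight-through estimator keeps the encoder differentiable.
        return x + (quantized - x).detach()

class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by a per-task sparse mask."""
    def __init__(self, in_dim, out_dim, n_tasks, sparsity=0.5):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Fixed random binary masks (an assumption; the paper's masks are
        # task-related rather than sampled).
        self.register_buffer(
            "masks", (torch.rand(n_tasks, out_dim, in_dim) > sparsity).float()
        )

    def forward(self, x, task_id):
        weight = self.linear.weight * self.masks[task_id]
        return F.linear(x, weight, self.linear.bias)

# Usage: two tasks with different observation sizes share one model.
vq, layer = SharedCodebook(dim=32), MaskedLinear(32, 8, n_tasks=2)
for task_id, obs_dim in enumerate([11, 17]):
    obs = torch.randn(4, obs_dim)
    z = vq(F.pad(obs, (0, 32 - obs_dim)))     # align spaces via quantization
    print(task_id, layer(z, task_id).shape)   # activate task-specific weights
```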
Related papers
- Continual Learning Should Move Beyond Incremental Classification [51.23416308775444]
Continual learning (CL) is the sub-field of machine learning concerned with accumulating knowledge in dynamic environments.
Here, we argue that maintaining a focus on incremental classification limits both the theoretical development and the practical applicability of CL methods.
We identify three fundamental challenges: (C1) the nature of continuity in learning problems, (C2) the choice of appropriate spaces and metrics for measuring similarity, and (C3) the role of learning objectives beyond classification.
arXiv Detail & Related papers (2025-02-17T15:40:13Z)
- Transition Transfer $Q$-Learning for Composite Markov Decision Processes [6.337133205762491]
We introduce a novel composite MDP framework where high-dimensional transition dynamics are modeled as the sum of a low-rank component, representing shared structure, and a sparse task-specific component.
This relaxes the common assumption of purely low-rank transition models.
We introduce UCB-TQL, designed for transfer RL scenarios where multiple tasks share core linear MDP dynamics but diverge along sparse dimensions.
arXiv Detail & Related papers (2025-02-01T19:22:00Z)
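The composite model above is easy to sketch. A minimal illustration, assuming a small tabular MDP with invented dimensions and sparsity pattern (not the paper's construction): each task's transition matrix is a shared low-rank core plus a sparse task-specific deviation.

```python
# Hedged sketch of a composite transition model: low-rank shared + sparse.
import numpy as np

rng = np.random.default_rng(0)
n_states, rank = 50, 3

# Shared low-rank structure common to all tasks.
U, V = rng.normal(size=(n_states, rank)), rng.normal(size=(n_states, rank))
shared = U @ V.T

def task_transitions(n_sparse=5):
    """One task's transitions: shared low-rank core plus a sparse deviation,
    normalized row-wise into a stochastic matrix via softmax."""
    sparse = np.zeros((n_states, n_states))
    idx = rng.choice(n_states * n_states, size=n_sparse, replace=False)
    sparse.flat[idx] = rng.normal(size=n_sparse)
    logits = shared + sparse
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

P = task_transitions()
print(P.shape, np.allclose(P.sum(axis=1), 1.0))  # (50, 50) True
```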
- Elastic Multi-Gradient Descent for Parallel Continual Learning [28.749215705746135]
We study the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios.
PCL presents challenges due to the training of an unspecified number of tasks with varying learning progress.
We propose a memory editing mechanism, guided by the gradient computed using Elastic Multi-Gradient Descent (EMGD), to balance training between old and new tasks.
arXiv Detail & Related papers (2024-01-02T06:26:25Z)
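For intuition about balancing old and new tasks with gradients, here is a minimal sketch of the classic minimum-norm convex combination of two task gradients from multi-gradient descent; EMGD and the paper's memory editing mechanism are more elaborate than this.

```python
# Hedged sketch: min-norm combination of old-task and new-task gradients.
import numpy as np

def balanced_direction(g_old, g_new):
    """Min-norm point on the segment between the two gradients; to first
    order, descending along it does not increase either task's loss."""
    diff = g_old - g_new
    denom = float(diff @ diff)
    alpha = 0.5 if denom == 0.0 else float(
        np.clip(((g_new - g_old) @ g_new) / denom, 0.0, 1.0)
    )
    return alpha * g_old + (1.0 - alpha) * g_new

# Usage: orthogonal gradients get an even split of the update direction.
g_old, g_new = np.array([1.0, 0.0]), np.array([0.0, 1.0])
d = balanced_direction(g_old, g_new)
print(d, d @ g_old >= 0, d @ g_new >= 0)  # [0.5 0.5] True True
```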
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
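As a rough illustration of action quantization (a generic k-means stand-in, not the paper's adaptive scheme): fit a small codebook over the dataset's continuous actions, then replace each action with its nearest prototype so discrete-action methods apply.

```python
# Hedged sketch: k-means codebook over an offline dataset's actions.
import numpy as np

def fit_action_codebook(actions, k=16, iters=50, seed=0):
    """Plain k-means; the centers become the discrete action set."""
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every action to its nearest prototype ...
        d = np.linalg.norm(actions[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        # ... then move each prototype to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = actions[labels == j].mean(axis=0)
    return centers

def quantize(actions, centers):
    """Map continuous actions to discrete indices into the codebook."""
    return np.linalg.norm(actions[:, None] - centers[None], axis=-1).argmin(axis=1)

# Usage: quantize 2-D continuous actions from an offline dataset.
acts = np.random.default_rng(1).normal(size=(1000, 2))
codebook = fit_action_codebook(acts, k=8)
print(quantize(acts[:5], codebook))
```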
- Building a Subspace of Policies for Scalable Continual Learning [21.03369477853538]
We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.
CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
arXiv Detail & Related papers (2022-11-18T14:59:42Z)
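A hedged sketch of the subspace idea: keep a few anchor weight vectors and realize each task's policy as a convex combination of them, adding anchors as the task sequence grows. The growth rule and shapes here are invented for illustration and are not CSP's actual criteria.

```python
# Hedged sketch: policies as convex combinations of anchor weight vectors.
import numpy as np

class PolicySubspace:
    """Policies live in the convex hull of a growing set of anchor weights."""
    def __init__(self, n_params, seed=0):
        self.n_params = n_params
        self.rng = np.random.default_rng(seed)
        self.anchors = [self.rng.normal(size=n_params)]

    def policy_weights(self, alphas):
        """Convex combination of anchors -> concrete policy parameters."""
        alphas = np.asarray(alphas, dtype=float)
        alphas = alphas / alphas.sum()
        return sum(a * w for a, w in zip(alphas, self.anchors))

    def grow(self):
        """Add an anchor when the current subspace cannot fit a new task."""
        self.anchors.append(self.rng.normal(size=self.n_params))

# Usage: a second task needs a richer subspace, then picks its combination.
sub = PolicySubspace(n_params=10)
sub.grow()
w = sub.policy_weights([0.3, 0.7])
print(w.shape)  # (10,)
```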
- Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation [46.103426976842336]
Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks, starting from easy ones and gradually progressing toward difficult ones.
In this work, we focus on the idea of framing CRL as interpolations between a source (auxiliary) and a target task distribution.
Inspired by the insights from gradual domain adaptation in semi-supervised learning, we create a natural curriculum by breaking down the potentially large task distributional shift in CRL into smaller shifts.
arXiv Detail & Related papers (2022-10-18T22:33:33Z)
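A minimal sketch of the "small shifts" idea, assuming 1-D Gaussian task distributions (my simplification, not the paper's setup): interpolating mean and standard deviation linearly traces the Wasserstein-2 geodesic between source and target, giving a gradual curriculum of intermediate tasks.

```python
# Hedged sketch: break one large distribution shift into small ones.
import numpy as np

def gaussian_curriculum(src, tgt, n_stages):
    """Yield (mean, std) of intermediate task distributions; for 1-D
    Gaussians, linear interpolation of (mean, std) is the OT geodesic."""
    for t in np.linspace(0.0, 1.0, n_stages):
        yield (1 - t) * src[0] + t * tgt[0], (1 - t) * src[1] + t * tgt[1]

# Usage: six stages from an easy source task to a harder target task.
for stage, (mu, sigma) in enumerate(gaussian_curriculum((0.0, 1.0), (5.0, 0.5), 6)):
    print(f"stage {stage}: N({mu:.2f}, {sigma:.2f}^2)")
```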
- Self-Taught Cross-Domain Few-Shot Learning with Weakly Supervised Object Localization and Task-Decomposition [84.24343796075316]
We propose a task-expansion-decomposition framework for Cross-Domain Few-Shot Learning.
The proposed Self-Taught (ST) approach alleviates the problem of non-target guidance by constructing task-oriented metric spaces.
We conduct experiments under the cross-domain setting, including 8 target domains: CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC, and ChestX.
arXiv Detail & Related papers (2021-09-03T04:23:07Z)
- Continual Learning in Low-rank Orthogonal Subspaces [86.36417214618575]
In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the learning experience is finished.
The prior art in CL uses episodic memory, parameter regularization, or network structures to reduce interference among tasks, but in the end, all of these approaches learn different tasks in a joint vector space.
We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference.
arXiv Detail & Related papers (2020-10-22T12:07:43Z)
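The interference argument is easy to demonstrate. A minimal sketch, assuming the orthogonal bases come from a single QR factorization (my construction, not necessarily the paper's procedure): features projected into different tasks' subspaces have zero overlap, so one task's updates cannot disturb another's.

```python
# Hedged sketch: mutually orthogonal low-rank subspaces, one per task.
import numpy as np

def orthogonal_subspaces(dim, n_tasks, rank, seed=0):
    """n_tasks projection bases of shape (dim, rank) with mutually
    orthogonal column spaces (requires n_tasks * rank <= dim)."""
    assert n_tasks * rank <= dim
    q, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(dim, dim)))
    return [q[:, t * rank:(t + 1) * rank] for t in range(n_tasks)]

# Usage: the same feature, routed through two tasks' subspaces, has
# zero cross-task overlap.
subs = orthogonal_subspaces(dim=64, n_tasks=4, rank=8)
f = np.random.default_rng(1).normal(size=64)
z0 = subs[0] @ (subs[0].T @ f)
z1 = subs[1] @ (subs[1].T @ f)
print(abs(z0 @ z1) < 1e-9)  # True: no interference between task subspaces
```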