Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
- URL: http://arxiv.org/abs/2410.15698v1
- Date: Mon, 21 Oct 2024 07:13:45 GMT
- Title: Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces
- Authors: Jifeng Hu, Sili Huang, Li Shen, Zhejian Yang, Shengchao Hu, Shisong Tang, Hechang Chen, Yi Chang, Dacheng Tao, Lichao Sun
- Abstract summary: Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based lifelong learning systems.
We propose the Vector-Quantized Continual Diffuser, named VQ-CD, to break the barrier of mismatched state and action spaces across tasks.
- Score: 52.649077293256795
- License:
- Abstract: Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based lifelong learning systems by modeling the joint distributions of trajectories. However, most research focuses on limited continual task settings where all tasks share the same observation and action space, which deviates from the realistic demands of training agents in varied environments. In view of this, we propose the Vector-Quantized Continual Diffuser, named VQ-CD, to break the barrier of mismatched state and action spaces across tasks. Specifically, our method contains two complementary components, where quantized space alignment provides a unified basis for selective weight activation. In the quantized space alignment, we leverage vector quantization to align the different state and action spaces of the various tasks, enabling continual training in a single shared space. We then leverage a unified diffusion model, paired with an inverse dynamics model, to master all tasks by selectively activating different weights according to task-related sparse masks. Finally, we conduct extensive experiments on 15 continual learning (CL) tasks, covering both conventional CL settings (identical state and action spaces) and general CL settings (differing state and action spaces). Compared with 16 baselines, our method achieves state-of-the-art performance.
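As a minimal sketch (not the authors' implementation; the diffusion model and inverse dynamics model are omitted, and all class and parameter names are illustrative), the two ideas above can be pictured as per-task encoders quantized against one shared codebook, plus a shared layer whose weights are gated by fixed per-task sparse masks:

```python
# Illustrative sketch only; VQ-CD's actual architecture is not reproduced here.
import torch
import torch.nn as nn


class VQAligner(nn.Module):
    """Per-task encoder that maps a task-specific state into a shared latent
    space, then snaps it to the nearest entry of a codebook shared by all
    tasks (straight-through gradient)."""

    def __init__(self, state_dim: int, codebook: nn.Embedding):
        super().__init__()
        self.encoder = nn.Linear(state_dim, codebook.embedding_dim)
        self.codebook = codebook  # shared across all tasks

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        z = self.encoder(state)                          # (B, D)
        dists = torch.cdist(z, self.codebook.weight)     # (B, K)
        z_q = self.codebook(dists.argmin(dim=-1))        # nearest code, (B, D)
        return z + (z_q - z).detach()                    # straight-through estimator


class MaskedLinear(nn.Module):
    """Shared linear layer whose weights are gated by a fixed binary mask per
    task, so each task activates only its own sparse subset of weights."""

    def __init__(self, in_dim: int, out_dim: int, num_tasks: int, sparsity: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        self.register_buffer("masks", (torch.rand(num_tasks, out_dim, in_dim) > sparsity).float())

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return x @ (self.weight * self.masks[task_id]).t()


# Two tasks with different state sizes share one codebook and one masked layer.
codebook = nn.Embedding(num_embeddings=512, embedding_dim=64)
aligner_a, aligner_b = VQAligner(17, codebook), VQAligner(39, codebook)
head = MaskedLinear(in_dim=64, out_dim=6, num_tasks=2)
out_a = head(aligner_a(torch.randn(8, 17)), task_id=0)
out_b = head(aligner_b(torch.randn(8, 39)), task_id=1)
```

Because every task is expressed in the same quantized latent space, the masked layer can be trained sequentially on tasks with different raw state sizes, and each task's fixed mask limits which weights that task may update.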
Related papers
- Get Rid of Task Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework [10.33844348594636]
We argue that it is essential to propose a Continuous Multi-task Spatio-Temporal learning framework (CMuST) to empower collective urban intelligence.
CMuST reforms urban spatio-temporal learning from single-domain to cooperative multi-task learning.
We establish a benchmark of three cities for multi-task spatio-temporal learning, and empirically demonstrate the superiority of CMuST.
arXiv Detail & Related papers (2024-10-14T14:04:36Z)
- Elastic Multi-Gradient Descent for Parallel Continual Learning [28.749215705746135]
We study the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios.
PCL presents challenges due to the training of an unspecified number of tasks with varying learning progress.
We propose a memory editing mechanism, guided by the gradient computed with Elastic Multi-Gradient Descent (EMGD), to balance training between old and new tasks.
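EMGD itself is not detailed in this summary; purely as a hedged illustration of balancing a new-task gradient against a gradient computed on old-task memory, the sketch below uses an A-GEM-style projection, a different but well-known technique (all names are hypothetical):

```python
# Not the paper's EMGD; an A-GEM-style projection used only to illustrate
# balancing an old-task (memory) gradient against a new-task gradient.
import torch

def balance_gradients(g_new: torch.Tensor, g_mem: torch.Tensor) -> torch.Tensor:
    """If the new-task gradient conflicts with the memory gradient
    (negative dot product), project out the conflicting component."""
    dot = torch.dot(g_new, g_mem)
    if dot < 0:
        g_new = g_new - (dot / g_mem.dot(g_mem).clamp(min=1e-12)) * g_mem
    return g_new

g_new = torch.randn(1000)   # flattened gradient on the new task's batch
g_mem = torch.randn(1000)   # flattened gradient on a memory (old-task) batch
update_direction = balance_gradients(g_new, g_mem)
```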
arXiv Detail & Related papers (2024-01-02T06:26:25Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
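The paper's discretization is adaptive and learned; as a rough, hypothetical illustration of plain action quantization, one can instead build a fixed codebook over dataset actions with k-means and map each continuous action to its nearest code:

```python
# Hypothetical illustration of action quantization (k-means codebook), not the
# paper's learned adaptive scheme.
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(dataset_actions: np.ndarray, num_bins: int = 64) -> np.ndarray:
    """Cluster continuous actions from an offline dataset into num_bins codes."""
    return KMeans(n_clusters=num_bins, n_init=10, random_state=0).fit(dataset_actions).cluster_centers_

def quantize(action: np.ndarray, codebook: np.ndarray) -> int:
    """Return the index of the nearest codebook action."""
    return int(np.argmin(np.linalg.norm(codebook - action, axis=1)))

actions = np.random.uniform(-1, 1, size=(10_000, 6))   # e.g. 6-DoF robot actions
codebook = build_action_codebook(actions)
discrete_a = quantize(actions[0], codebook)  # a discrete-action offline RL method can run on these indices
```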
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Building a Subspace of Policies for Scalable Continual Learning [21.03369477853538]
We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks.
CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
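The entry gives only the high-level idea; assuming, for illustration, that the subspace is spanned by a few anchor weight vectors and each task's policy is a convex combination of them, a minimal hypothetical sketch is:

```python
# Hypothetical sketch of a policy subspace: each task's policy weights are a
# convex combination of a small set of anchor weight matrices. Not CSP's code.
import torch
import torch.nn as nn

class SubspacePolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, num_anchors: int = 3):
        super().__init__()
        # Anchor weight matrices spanning the subspace (flattened linear policy).
        self.anchors = nn.Parameter(torch.randn(num_anchors, act_dim, obs_dim) * 0.02)
        # Per-task logits over anchors; softmax gives convex combination weights.
        self.alpha_logits = nn.ParameterDict()

    def add_task(self, task_name: str, num_anchors: int = 3):
        self.alpha_logits[task_name] = nn.Parameter(torch.zeros(num_anchors))

    def forward(self, obs: torch.Tensor, task_name: str) -> torch.Tensor:
        alpha = torch.softmax(self.alpha_logits[task_name], dim=0)   # (A,)
        weight = torch.einsum("a,aio->io", alpha, self.anchors)      # combined policy weights
        return obs @ weight.t()

policy = SubspacePolicy(obs_dim=10, act_dim=4)
policy.add_task("task_0")
action = policy(torch.randn(5, 10), "task_0")
```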
arXiv Detail & Related papers (2022-11-18T14:59:42Z)
- Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation [46.103426976842336]
Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks, starting from easy ones and gradually progressing to difficult ones.
In this work, we focus on framing CRL as a curriculum between a source (auxiliary) and a target task distribution.
Inspired by the insights from gradual domain adaptation in semi-supervised learning, we create a natural curriculum by breaking down the potentially large task distributional shift in CRL into smaller shifts.
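As a hedged illustration of breaking one large distributional shift into smaller ones (the paper's construction uses optimal transport, which is not reproduced here), a curriculum can be pictured as a sequence of intermediate task distributions interpolating from source to target; all names below are hypothetical:

```python
# Hypothetical sketch: a curriculum as intermediate task distributions that
# interpolate source -> target; this is only the high-level idea.
import numpy as np

def sample_task(source_goal: np.ndarray, target_goal: np.ndarray, t: float, noise: float = 0.05) -> np.ndarray:
    """Sample a goal from the intermediate distribution at curriculum stage t in [0, 1]."""
    mean = (1.0 - t) * source_goal + t * target_goal
    return mean + noise * np.random.randn(*mean.shape)

source, target = np.array([0.0, 0.0]), np.array([5.0, 5.0])
curriculum = [sample_task(source, target, t) for t in np.linspace(0.0, 1.0, num=6)]
# Train the agent stage by stage on goals drawn from each intermediate distribution.
```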
arXiv Detail & Related papers (2022-10-18T22:33:33Z)
- On Steering Multi-Annotations per Sample for Multi-Task Learning [79.98259057711044]
The study of multi-task learning has drawn great attention from the community.
Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored.
Previous works attempt to modify the gradients from different tasks, yet these methods rely on subjective assumptions about the relationships between tasks, and the modified gradients may be less accurate.
In this paper, we introduce Stochastic Task Allocation (STA), a mechanism that addresses this issue through a task allocation approach in which each sample is randomly allocated a subset of tasks.
For further progress, we propose Interleaved Stochastic Task Allocation (ISTA) to iteratively allocate all tasks to each sample over consecutive iterations.
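A minimal hypothetical sketch of the stated mechanism, in which each sample in a batch is randomly allocated a subset of task losses (the function name and loss form are assumptions, not the paper's code):

```python
# Hypothetical sketch of per-sample task allocation: each sample contributes
# to a random subset of task losses in a multi-task batch.
import torch

def allocated_multitask_loss(per_task_losses: torch.Tensor, tasks_per_sample: int = 2) -> torch.Tensor:
    """per_task_losses: (batch, num_tasks) unreduced losses. Each sample is
    randomly allocated `tasks_per_sample` tasks; the rest are masked out."""
    batch, num_tasks = per_task_losses.shape
    # For each sample, pick task indices uniformly at random without replacement.
    chosen = torch.rand(batch, num_tasks).argsort(dim=1)[:, :tasks_per_sample]
    mask = torch.zeros(batch, num_tasks).scatter_(1, chosen, 1.0)
    return (per_task_losses * mask).sum() / mask.sum()

losses = torch.rand(32, 5, requires_grad=True)  # e.g. 32 samples, 5 tasks
loss = allocated_multitask_loss(losses)
loss.backward()
```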
arXiv Detail & Related papers (2022-03-06T11:57:18Z)
- Self-Taught Cross-Domain Few-Shot Learning with Weakly Supervised Object Localization and Task-Decomposition [84.24343796075316]
We propose a task-expansion-decomposition framework for Cross-Domain Few-Shot Learning.
The proposed Self-Taught (ST) approach alleviates the problem of non-target guidance by constructing task-oriented metric spaces.
We conduct experiments under the cross-domain setting including 8 target domains: CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC, and ChestX.
arXiv Detail & Related papers (2021-09-03T04:23:07Z)
- Continual Learning in Low-rank Orthogonal Subspaces [86.36417214618575]
In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the learning experience is finished.
The prior art in CL uses episodic memory, parameter regularization or network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space.
We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference.
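As a hedged sketch of the core idea (an illustration, not the paper's algorithm), per-task bases can be taken as disjoint column blocks of one random orthogonal matrix, so features projected into different tasks' subspaces do not interfere:

```python
# Hypothetical illustration: per-task low-rank subspaces with mutually
# orthogonal bases, so representations of different tasks do not overlap.
import torch

def make_task_bases(feature_dim: int, num_tasks: int, rank: int) -> list[torch.Tensor]:
    """Split the columns of a random orthogonal matrix into per-task bases."""
    assert num_tasks * rank <= feature_dim
    q, _ = torch.linalg.qr(torch.randn(feature_dim, feature_dim))
    return [q[:, t * rank:(t + 1) * rank] for t in range(num_tasks)]  # each (feature_dim, rank)

bases = make_task_bases(feature_dim=64, num_tasks=4, rank=8)
h = torch.randn(16, 64)                 # shared backbone features
h_task0 = h @ bases[0] @ bases[0].t()   # project features into task 0's subspace
# Bases are orthogonal across tasks: bases[0].t() @ bases[1] is (numerically) zero.
```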
arXiv Detail & Related papers (2020-10-22T12:07:43Z)