Related papers: Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

URL: http://arxiv.org/abs/2310.02360v2
Date: Thu, 2 May 2024 10:01:56 GMT
Title: Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning
Authors: Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes Andreas Stork,
Abstract summary: We propose prioritized soft Q-decomposition (PSQD) for learning and adapting subtask solutions under lexicographic priorities. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks.
Score: 1.8399318639816038
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition.

Related papers

Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model [6.42114585934114]
Large Language Models (LLMs) possess capabilities that can process diverse language-related tasks. Continual Learning in Large Language Models (LLMs) arises which aims to continually adapt the LLMs to new tasks. This paper proposes Analytic Subspace Routing(ASR) to address these challenges.
arXiv Detail & Related papers (2025-03-17T13:40:46Z)
CODE-CL: COnceptor-Based Gradient Projection for DEep Continual Learning [7.573297026523597]
We introduce COnceptor-based gradient projection for DEep Continual Learning (CODE-CL) CODE-CL encodes directional importance within the input space of past tasks, allowing new knowledge integration in directions modulated by $1-S$. We analyze task overlap using conceptor-based representations to identify highly correlated tasks.
arXiv Detail & Related papers (2024-11-21T22:31:06Z)
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z)
Subspace Adaptation Prior for Few-Shot Learning [5.2997197698288945]
Subspace Adaptation Prior is a novel gradient-based meta-learning algorithm. We show that SAP yields superior or competitive performance in few-shot image classification settings.
arXiv Detail & Related papers (2023-10-13T11:40:18Z)
Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information. We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
Robust Subtask Learning for Compositional Generalization [20.54144051436337]
We focus on the problem of training subtask policies in a way that they can be used to perform any task. We aim to maximize the worst-case performance over all tasks as opposed to the average-case performance.
arXiv Detail & Related papers (2023-02-06T18:19:25Z)
Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712]
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning. We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
arXiv Detail & Related papers (2022-11-20T03:55:09Z)
Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph. Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks. Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
Generalizing to New Tasks via One-Shot Compositional Subgoals [23.15624959305799]
The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research. We introduce CASE which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
arXiv Detail & Related papers (2022-05-16T14:30:11Z)
Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials. We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning. The model reproduces known empirical effects of task interleaving. The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference. Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.