A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration
- URL: http://arxiv.org/abs/2209.04100v1
- Date: Fri, 9 Sep 2022 03:02:49 GMT
- Title: A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration
- Authors: Xianqi Zhang, Xingtao Wang, Xu Liu, Xiaopeng Fan and Debin Zhao
- Abstract summary: In contrast to imitation learning, there is no expert data, only the data collected through environmental exploration.
Since the action sequence that solves a new task may combine trajectory segments from multiple training tasks, the test task and its solving strategy do not exist directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
- Score: 26.17597857264231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We pose a new question: Can agents learn how to combine actions from previous
tasks to complete new tasks, just as humans? In contrast to imitation learning,
there is no expert data, only the data collected through environmental
exploration. Compared with offline reinforcement learning, the data
distribution shift problem is more serious. Since the action sequence that
solves the new task may be a combination of trajectory segments from multiple
training tasks, the test task and its solving strategy do not exist directly in
the training data, which makes the problem more difficult. We propose a
Memory-related Multi-task Method (M3) to address this problem. The method
consists of three stages. First, task-agnostic exploration is carried out to
collect data. Unlike previous methods, we organize the exploration data
into a knowledge graph. We design a model based on the exploration data to
extract action effect features and save them in memory, while an action
predictive model is trained. Second, for a new task, the action effect
features stored in memory are used to generate candidate actions by a feature
decomposition-based approach. Finally, a multi-scale candidate action pool and
the action predictive model are fused to generate a strategy to complete the
task. Experimental results show that our proposed method significantly
outperforms the baseline.
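As a rough illustration, the three stages can be sketched on a toy 1-D environment. All names, data structures, and the greedy fusion step below are hypothetical stand-ins, not the authors' implementation:

```python
# Hypothetical sketch of the three M3 stages on a toy 1-D integer world.
# States are ints; an "action effect feature" is simply the state delta.

def explore(transitions):
    """Stage 1: task-agnostic exploration. Organize collected
    (state, action, next_state) transitions into a simple knowledge
    graph and a memory of action-effect features."""
    graph = {}   # state -> list of (action, next_state) edges
    memory = {}  # action -> effect feature (here: state delta)
    for s, a, s_next in transitions:
        graph.setdefault(s, []).append((a, s_next))
        memory[a] = s_next - s
    return graph, memory

def candidate_actions(memory, desired_effect):
    """Stage 2: rank stored actions by how closely their effect
    matches the effect the new task requires (a crude stand-in for
    the feature decomposition-based candidate generation)."""
    return sorted(memory, key=lambda a: abs(memory[a] - desired_effect))

def plan(memory, start, goal, max_steps=10):
    """Stage 3: greedily fuse candidates into a strategy (standing in
    for the multi-scale pool + action predictive model fusion)."""
    state, strategy = start, []
    for _ in range(max_steps):
        if state == goal:
            break
        best = candidate_actions(memory, goal - state)[0]
        strategy.append(best)
        state += memory[best]
    return strategy, state
```

With exploration data covering actions `+1`, `+2`, and `-1`, the greedy planner stitches together a new action sequence that never appeared as a whole in the training data, which is the paper's central setting.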
Related papers
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- Divide and Conquer: Hybrid Pre-training for Person Search [40.13016375392472]
We propose a hybrid pre-training framework specifically designed for person search using sub-task data only.
Our model achieves significant improvements across diverse protocols, such as person search methods, fine-tuning data, pre-training data, and model backbones.
Our code and pre-trained models are released for plug-and-play usage to the person search community.
arXiv Detail & Related papers (2023-12-13T08:33:50Z)
- Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets [73.2096288987301]
We propose a simple approach that uses a small amount of downstream expert data to selectively query relevant behaviors from an offline, unlabeled dataset.
We observe that our method learns to query only the transitions relevant to the task, filtering out sub-optimal or task-irrelevant data.
Our simple querying approach outperforms more complex goal-conditioned methods by 20% across simulated and real robotic manipulation tasks from images.
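The querying idea can be sketched as a nearest-neighbour filter over the unlabeled dataset. The raw-state "embedding" and all names below are illustrative assumptions; the paper learns an embedding rather than using raw states:

```python
# Illustrative sketch: score each unlabeled transition by its distance
# to the small expert set and keep only the closest fraction.

def distance(u, v):
    """Euclidean distance between two state vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def retrieve(expert_states, unlabeled, keep_fraction=0.5):
    """Return the unlabeled (state, action) transitions whose state is
    closest to any expert state, best matches first."""
    def score(transition):
        state = transition[0]
        return min(distance(state, e) for e in expert_states)
    ranked = sorted(unlabeled, key=score)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]
```

Filtering by similarity to a handful of expert states is the core of the querying step; everything kept is then used as extra training data for imitation.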
arXiv Detail & Related papers (2023-04-18T05:42:53Z)
- TIDo: Source-free Task Incremental Learning in Non-stationary Environments [0.0]
Updating a model-based agent to learn new target tasks requires us to store past training data.
Few-shot task incremental learning methods overcome the limitation of labeled target datasets.
We propose a one-shot task incremental learning approach that can adapt to non-stationary source and target tasks.
arXiv Detail & Related papers (2023-01-28T02:19:45Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as a strong foundation backbone for a wide range of tasks but also be used as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Instance-Level Task Parameters: A Robust Multi-task Weighting Framework [17.639472693362926]
Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks.
We let the training process dictate the optimal weighting of tasks for every instance in the dataset.
We conduct extensive experiments on SURREAL and CityScapes datasets, for human shape and pose estimation, depth estimation and semantic segmentation tasks.
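Per-instance task weighting is often written in an uncertainty-style form, where each task loss is scaled by a learnable factor with a regularizer; whether this paper uses exactly this formulation is an assumption, and the sketch below is only illustrative:

```python
import math

def instance_weighted_loss(task_losses, log_vars):
    """Combine one instance's per-task losses using its own learnable
    log-variance parameters: each loss is scaled by exp(-s) and a +s
    term keeps the weights from collapsing to zero. This is a common
    multi-task weighting form, used here as an illustrative stand-in."""
    return sum(math.exp(-s) * loss + s
               for loss, s in zip(task_losses, log_vars))
```

With all log-variances at zero the result reduces to a plain sum of task losses; during training, each instance would learn its own `log_vars` so that hard or noisy tasks are down-weighted for that instance only.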
arXiv Detail & Related papers (2021-06-11T02:35:42Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning [78.13740204156858]
We show that we can reuse prior data to extend new skills simply through dynamic programming.
We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task.
We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands.
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
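The relabeling step can be sketched as choosing, for each collected trajectory, the candidate task whose reward function scores it highest. The toy reward functions below are assumptions; the paper uses an approximate inverse-RL criterion rather than exact task rewards:

```python
# Minimal sketch of hindsight relabeling: a trajectory collected while
# attempting one task is re-assigned to whichever candidate task it
# actually solves best.

def relabel(trajectory, tasks):
    """tasks: mapping task_name -> reward function over a trajectory.
    Returns the task for which this behavior earns the highest return,
    i.e. the 'right' task to relabel the trajectory with."""
    return max(tasks, key=lambda name: tasks[name](trajectory))
```

A trajectory that earns low reward on its original task is not discarded; it simply becomes a high-reward demonstration for whichever task the relabeling selects.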
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.