Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner
- URL: http://arxiv.org/abs/2409.19949v1
- Date: Mon, 30 Sep 2024 05:05:37 GMT
- Title: Task-agnostic Pre-training and Task-guided Fine-tuning for Versatile Diffusion Planner
- Authors: Chenyou Fan, Chenjia Bai, Zhao Shan, Haoran He, Yang Zhang, Zhen Wang,
- Abstract summary: We propose textbfSODP, a two-stage framework that learns a textbfDiffusion textbfPlanner, which is generalizable for various downstream tasks.
In the pre-training stage, we train a foundation diffusion planner that extracts general planning capabilities by modeling the versatile distribution of multi-task trajectories.
Then for downstream tasks, we adopt RL-based fine-tuning with task-specific rewards to fast refine the diffusion planner.
- Score: 12.360598915420255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have demonstrated their capabilities in modeling trajectories of multi-tasks. However, existing multi-task planners or policies typically rely on task-specific demonstrations via multi-task imitation, or require task-specific reward labels to facilitate policy optimization via Reinforcement Learning (RL). To address these challenges, we aim to develop a versatile diffusion planner that can leverage large-scale inferior data that contains task-agnostic sub-optimal trajectories, with the ability to fast adapt to specific tasks. In this paper, we propose \textbf{SODP}, a two-stage framework that leverages \textbf{S}ub-\textbf{O}ptimal data to learn a \textbf{D}iffusion \textbf{P}lanner, which is generalizable for various downstream tasks. Specifically, in the pre-training stage, we train a foundation diffusion planner that extracts general planning capabilities by modeling the versatile distribution of multi-task trajectories, which can be sub-optimal and has wide data coverage. Then for downstream tasks, we adopt RL-based fine-tuning with task-specific rewards to fast refine the diffusion planner, which aims to generate action sequences with higher task-specific returns. Experimental results from multi-task domains including Meta-World and Adroit demonstrate that SODP outperforms state-of-the-art methods with only a small amount of data for reward-guided fine-tuning.
Related papers
- Generalization of Compositional Tasks with Logical Specification via Implicit Planning [14.46490764849977]
We introduce a new hierarchical RL framework that enhances the efficiency and optimality of task generalization.
At the high level, we present an implicit planner specifically designed for generalizing compositional tasks.
It learns a latent transition model and performs planning in the latent space by using a graph neural network (GNN)
arXiv Detail & Related papers (2024-10-13T00:57:10Z) - Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z) - AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning [22.208506419271043]
This work introduces textbfAnyTaskTune, a novel fine-tuning methodology to elevate model performance on a diverse array of domain-specific tasks.
We conducted experiments across over twenty different sub-tasks derived from the domains of finance, healthcare, law, psychology, consumer services, and human resources.
Our findings demonstrate that models fine-tuned using the textbfTask-Fine-Tune methodology not only achieve superior performance on these specific tasks but also significantly outperform models with higher general capabilities in their respective domains.
arXiv Detail & Related papers (2024-07-09T17:59:56Z) - BoRA: Bayesian Hierarchical Low-Rank Adaption for Multi-task Large Language Models [0.0]
This paper introduces Bayesian Hierarchical Low-Rank Adaption (BoRA), a novel method for finetuning multi-task Large Language Models (LLMs)
BoRA addresses trade-offs by leveraging a Bayesian hierarchical model that allows tasks to share information through global hierarchical priors.
Our experimental results show that BoRA outperforms both individual and unified model approaches, achieving lower perplexity and better generalization across tasks.
arXiv Detail & Related papers (2024-07-08T06:38:50Z) - Distribution Matching for Multi-Task Learning of Classification Tasks: a
Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks with little, or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z) - ULTRA-DP: Unifying Graph Pre-training with Multi-task Graph Dual Prompt [67.8934749027315]
We propose a unified framework for graph hybrid pre-training which injects the task identification and position identification into GNNs.
We also propose a novel pre-training paradigm based on a group of $k$-nearest neighbors.
arXiv Detail & Related papers (2023-10-23T12:11:13Z) - Diffusion Model is an Effective Planner and Data Synthesizer for
Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (textscMTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find textscMTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - POMRL: No-Regret Learning-to-Plan with Increasing Horizons [43.693739167594295]
We study the problem of planning under model uncertainty in an online meta-reinforcement learning setting.
We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss.
arXiv Detail & Related papers (2022-12-30T03:09:45Z) - Generalization with Lossy Affordances: Leveraging Broad Offline Data for
Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z) - Semi-supervised Multi-task Learning for Semantics and Depth [88.77716991603252]
Multi-Task Learning (MTL) aims to enhance the model generalization by sharing representations between related tasks for better performance.
We propose the Semi-supervised Multi-Task Learning (MTL) method to leverage the available supervisory signals from different datasets.
We present a domain-aware discriminator structure with various alignment formulations to mitigate the domain discrepancy issue among datasets.
arXiv Detail & Related papers (2021-10-14T07:43:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.