A System for Morphology-Task Generalization via Unified Representation
and Behavior Distillation
- URL: http://arxiv.org/abs/2211.14296v1
- Date: Fri, 25 Nov 2022 18:52:48 GMT
- Title: A System for Morphology-Task Generalization via Unified Representation
and Behavior Distillation
- Authors: Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo, Shixiang Shane Gu
- Abstract summary: In this work, we explore a method for learning a single policy that controls various forms of agents to solve various tasks by distilling a large amount of proficient behavioral data.
We introduce the morphology-task graph, which treats observations, actions, and goals/tasks in a unified graph representation.
We also develop MxT-Bench for fast large-scale behavior generation, which supports procedural generation of diverse morphology-task combinations.
- Score: 28.041319351752485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of generalist large-scale models in natural language and vision has
made us expect that a massive data-driven approach could achieve broader
generalization in other domains such as continuous control. In this work, we
explore a method for learning a single policy that manipulates various forms of
agents to solve various tasks by distilling a large amount of proficient
behavioral data. In order to align input-output (IO) interface among multiple
tasks and diverse agent morphologies while preserving essential 3D geometric
relations, we introduce the morphology-task graph, which treats observations,
actions, and goals/tasks in a unified graph representation. We also develop
MxT-Bench for fast large-scale behavior generation, which supports procedural
generation of diverse morphology-task combinations with a minimal blueprint and
hardware-accelerated simulator. Through efficient representation and
architecture selection on MxT-Bench, we find that a morphology-task graph
representation coupled with a Transformer architecture improves multi-task
performance compared to other baselines including recent discrete
tokenization, and provides better prior knowledge for zero-shot transfer or
sample efficiency in downstream multi-task imitation learning. Our work
suggests large diverse offline datasets, unified IO representation, and policy
representation and architecture selection through supervised learning form a
promising approach for studying and advancing morphology-task generalization.
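The abstract's central idea is that observations, actions, and goals for any agent morphology can be packed into one graph whose nodes share a fixed feature layout, so a single policy network can consume any morphology-task combination. The following is a minimal sketch of that idea, assuming hypothetical per-node dimensions and helper names; it is illustrative only and not the paper's actual implementation.

```python
# Hypothetical sketch of a "morphology-task graph": each limb becomes a node
# with a fixed-width feature vector [observation | goal], and the kinematic
# tree becomes the edge list. OBS_DIM, GOAL_DIM, and all function names are
# assumptions for illustration, not the paper's API.

OBS_DIM, GOAL_DIM = 4, 3  # assumed per-node feature sizes


def make_node(obs, goal=None):
    """Zero-pad a per-node observation and optional goal into one fixed-size vector."""
    obs = list(obs) + [0.0] * (OBS_DIM - len(obs))
    goal = list(goal or [])
    goal = goal + [0.0] * (GOAL_DIM - len(goal))
    return obs + goal  # unified node feature: [obs | goal]


def morphology_task_graph(limb_obs, edges, goal, goal_limb):
    """Build node features and an edge list for one agent-task instance."""
    nodes = [
        make_node(obs, goal if i == goal_limb else None)
        for i, obs in enumerate(limb_obs)
    ]
    return {"nodes": nodes, "edges": edges}


# Two different morphologies map onto the same IO interface:
ant = morphology_task_graph(
    limb_obs=[[0.1, 0.2], [0.3], [0.5, 0.1, 0.2]],  # torso + two limbs
    edges=[(0, 1), (0, 2)],                          # kinematic tree
    goal=[1.0, 0.0],                                 # e.g. a target position
    goal_limb=0,                                     # goal attached to the torso node
)
claw = morphology_task_graph(
    limb_obs=[[0.4], [0.6, 0.7]],
    edges=[(0, 1)],
    goal=[0.0, 1.0, 0.5],
    goal_limb=1,
)

# Every node, regardless of morphology, has the same feature width, so one
# Transformer/GNN policy can process both graphs with shared parameters.
assert all(len(n) == OBS_DIM + GOAL_DIM for n in ant["nodes"] + claw["nodes"])
```

Because the node set varies per morphology while the per-node layout stays fixed, a set-based architecture such as a Transformer can attend over the nodes of any agent, which is what makes behavior distillation across morphology-task combinations possible.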
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework.
MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution.
Our approach reaches state-of-the-art (SOTA) performance on nine tasks using significantly less data than prior SOTA models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z)
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- Bond Graphs for multi-physics informed Neural Networks for multi-variate time series [6.775534755081169]
Existing methods are not adapted to tasks with complex multi-physical and multi-domain phenomena.
We propose a Neural Bond graph (NBgE) producing multi-physics-informed representations that can be fed into any task-specific model.
arXiv Detail & Related papers (2024-05-22T12:30:25Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning [1.0705399532413615]
Recommendation system algorithm based on multi-task learning (MTL) is the major method for Internet operators to understand users and predict their behaviors.
Traditional models use shared-bottom models and gating experts to realize shared representation learning and information differentiation.
We propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously.
arXiv Detail & Related papers (2023-07-24T04:29:00Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationale of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.