A System for Morphology-Task Generalization via Unified Representation
and Behavior Distillation
- URL: http://arxiv.org/abs/2211.14296v1
- Date: Fri, 25 Nov 2022 18:52:48 GMT
- Title: A System for Morphology-Task Generalization via Unified Representation
and Behavior Distillation
- Authors: Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo, Shixiang Shane Gu
- Abstract summary: In this work, we explore a method for learning a single policy that controls various forms of agents to solve various tasks by distilling a large amount of proficient behavioral data.
We introduce the morphology-task graph, which treats observations, actions, and goals/tasks in a unified graph representation.
We also develop MxT-Bench for fast large-scale behavior generation, which supports procedural generation of diverse morphology-task combinations.
- Score: 28.041319351752485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of generalist large-scale models in natural language and vision has
made us expect that a massive data-driven approach could achieve broader
generalization in other domains such as continuous control. In this work, we
explore a method for learning a single policy that manipulates various forms of
agents to solve various tasks by distilling a large amount of proficient
behavioral data. In order to align input-output (IO) interface among multiple
tasks and diverse agent morphologies while preserving essential 3D geometric
relations, we introduce the morphology-task graph, which treats observations,
actions, and goals/tasks in a unified graph representation. We also develop
MxT-Bench for fast large-scale behavior generation, which supports procedural
generation of diverse morphology-task combinations with a minimal blueprint and
hardware-accelerated simulator. Through efficient representation and
architecture selection on MxT-Bench, we find that a morphology-task graph
representation coupled with a Transformer architecture improves multi-task
performance compared to other baselines including recent discrete
tokenization, and provides better prior knowledge for zero-shot transfer or
sample efficiency in downstream multi-task imitation learning. Our work
suggests large diverse offline datasets, unified IO representation, and policy
representation and architecture selection through supervised learning form a
promising approach for studying and advancing morphology-task generalization.
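The abstract's central idea is that observations, actions, and goals for any agent morphology can be packed into one graph whose nodes share a fixed feature layout, so a single policy network can consume any morphology-task combination. The following is a minimal sketch of that idea, assuming hypothetical per-node dimensions and helper names; it is illustrative only and not the paper's actual implementation.

```python
# Hypothetical sketch of a "morphology-task graph": each limb becomes a node
# with a fixed-width feature vector [observation | goal], and the kinematic
# tree becomes the edge list. OBS_DIM, GOAL_DIM, and all function names are
# assumptions for illustration, not the paper's API.

OBS_DIM, GOAL_DIM = 4, 3  # assumed per-node feature sizes


def make_node(obs, goal=None):
    """Zero-pad a per-node observation and optional goal into one fixed-size vector."""
    obs = list(obs) + [0.0] * (OBS_DIM - len(obs))
    goal = list(goal or [])
    goal = goal + [0.0] * (GOAL_DIM - len(goal))
    return obs + goal  # unified node feature: [obs | goal]


def morphology_task_graph(limb_obs, edges, goal, goal_limb):
    """Build node features and an edge list for one agent-task instance."""
    nodes = [
        make_node(obs, goal if i == goal_limb else None)
        for i, obs in enumerate(limb_obs)
    ]
    return {"nodes": nodes, "edges": edges}


# Two different morphologies map onto the same IO interface:
ant = morphology_task_graph(
    limb_obs=[[0.1, 0.2], [0.3], [0.5, 0.1, 0.2]],  # torso + two limbs
    edges=[(0, 1), (0, 2)],                          # kinematic tree
    goal=[1.0, 0.0],                                 # e.g. a target position
    goal_limb=0,                                     # goal attached to the torso node
)
claw = morphology_task_graph(
    limb_obs=[[0.4], [0.6, 0.7]],
    edges=[(0, 1)],
    goal=[0.0, 1.0, 0.5],
    goal_limb=1,
)

# Every node, regardless of morphology, has the same feature width, so one
# Transformer/GNN policy can process both graphs with shared parameters.
assert all(len(n) == OBS_DIM + GOAL_DIM for n in ant["nodes"] + claw["nodes"])
```

Because the node set varies per morphology while the per-node layout stays fixed, a set-based architecture such as a Transformer can attend over the nodes of any agent, which is what makes behavior distillation across morphology-task combinations possible.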
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct [148.39859547619156]
We propose MMEvol, a novel multimodal instruction data evolution framework.
MMEvol iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution.
Our approach reaches state-of-the-art (SOTA) performance on nine tasks using significantly less data than prior SOTA models.
arXiv Detail & Related papers (2024-09-09T17:44:00Z)
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- Bond Graphs for multi-physics informed Neural Networks for multi-variate time series [6.775534755081169]
Existing methods are not adapted to tasks with complex multi-physical and multi-domain phenomena.
We propose a Neural Bond graph (NBgE) producing multi-physics-informed representations that can be fed into any task-specific model.
arXiv Detail & Related papers (2024-05-22T12:30:25Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning [1.0705399532413615]
Recommendation system algorithm based on multi-task learning (MTL) is the major method for Internet operators to understand users and predict their behaviors.
Traditional models use shared-bottom models and gating experts to realize shared representation learning and information differentiation.
We propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously.
arXiv Detail & Related papers (2023-07-24T04:29:00Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationale of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.