Uni-Perceiver: Pre-training Unified Architecture for Generic Perception
for Zero-shot and Few-shot Tasks
- URL: http://arxiv.org/abs/2112.01522v1
- Date: Thu, 2 Dec 2021 18:59:50 GMT
- Title: Uni-Perceiver: Pre-training Unified Architecture for Generic Perception
for Zero-shot and Few-shot Tasks
- Authors: Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng
Li, Xiaohua Wang, Jifeng Dai
- Abstract summary: We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
- Score: 73.63892022944198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biological intelligence systems of animals perceive the world by integrating
information from different modalities and processing it simultaneously for various
tasks. In contrast, current machine learning research follows a task-specific
paradigm, leading to inefficient collaboration between tasks and high marginal
costs of developing perception models for new tasks. In this paper, we present
a generic perception architecture named Uni-Perceiver, which processes a
variety of modalities and tasks with unified modeling and shared parameters.
Specifically, Uni-Perceiver encodes different task inputs and targets from
arbitrary modalities into a unified representation space with a
modality-agnostic Transformer encoder and lightweight modality-specific
tokenizers. Different perception tasks are modeled with the same formulation:
finding the maximum likelihood target for each input through the similarity of
their representations. The model is pre-trained on several
uni-modal and multi-modal tasks, and evaluated on a variety of downstream
tasks, including novel tasks that did not appear in the pre-training stage.
Results show that our pre-trained model without any tuning can achieve
reasonable performance even on novel tasks. The performance can be improved to
a level close to state-of-the-art methods by conducting prompt tuning on 1% of
downstream task data. Full-data fine-tuning further delivers results on par
with or better than state-of-the-art results. Code shall be released.
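To make the shared-encoder, similarity-based formulation above concrete, the following is a minimal PyTorch-style sketch: lightweight modality-specific tokenizers feed a single modality-agnostic Transformer encoder, and a task is solved by picking the candidate target whose representation is most similar to the input's. All module names, dimensions, and pooling choices here are illustrative assumptions for exposition, not the released Uni-Perceiver implementation.

```python
# Hedged sketch of the unified formulation: shared encoder, similarity scoring.
import torch
import torch.nn as nn


class UnifiedPerceiver(nn.Module):
    def __init__(self, d_model=512, vocab_size=30000, num_layers=6, img_patch_dim=768):
        super().__init__()
        # Lightweight modality-specific tokenizers map raw inputs to token embeddings.
        self.text_tokenizer = nn.Embedding(vocab_size, d_model)
        self.image_tokenizer = nn.Linear(img_patch_dim, d_model)
        # A single Transformer encoder shared across all modalities and tasks.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def encode(self, tokens):
        # tokens: (batch, seq_len, d_model) modality-agnostic token sequence.
        h = self.encoder(tokens)
        return h.mean(dim=1)  # pooled representation in the shared space

    def score(self, input_tokens, candidate_tokens):
        # Cosine similarity between the input and every candidate target.
        x = nn.functional.normalize(self.encode(input_tokens), dim=-1)
        c = nn.functional.normalize(self.encode(candidate_tokens), dim=-1)
        return x @ c.t()  # (num_inputs, num_candidates)


# Example: image classification cast as "find the most similar class-name target".
model = UnifiedPerceiver()
image_tokens = model.image_tokenizer(torch.randn(1, 196, 768))          # one image, 196 patches
class_tokens = model.text_tokenizer(torch.randint(0, 30000, (10, 4)))   # 10 class names, 4 tokens each
prediction = model.score(image_tokens, class_tokens).argmax(dim=-1)
```

Under this framing, different tasks differ only in which inputs and candidate targets are tokenized and scored; the encoder and its parameters stay fixed, which is what makes the zero-shot and prompt-tuning evaluations described in the abstract possible.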
Related papers
- Tint Your Models Task-wise for Improved Multi-task Model Merging [17.496018757317824]
We propose Model Tinting, a test-time approach that introduces a single task-specific layer for each task as trainable adjustments.
Our method jointly trains merging coefficients and task-specific layers, which effectively reduces task conflicts with minimal additional costs.
Our method achieves state-of-the-art performance across both computer vision and natural language processing tasks.
arXiv Detail & Related papers (2024-12-26T07:42:06Z)
- One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning [16.96824902454355]
We propose a unified framework that concurrently handles multiple tasks and modalities.
In this framework, all modalities and tasks are represented as unified tokens and trained using a single, consistent approach.
We present a new benchmark, MMUD, which includes samples annotated with multiple task labels.
We demonstrate the ability to handle multiple tasks simultaneously in a streamlined and efficient manner.
arXiv Detail & Related papers (2024-08-06T07:19:51Z)
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z)
- Modeling Output-Level Task Relatedness in Multi-Task Learning with Feedback Mechanism [7.479892725446205]
Multi-task learning (MTL) is a paradigm that simultaneously learns multiple tasks by sharing information at different levels.
We introduce a posteriori information into the model, considering that different tasks may produce correlated outputs with mutual influences.
We achieve this by incorporating a feedback mechanism into MTL models, where the output of one task serves as a hidden feature for another task.
arXiv Detail & Related papers (2024-04-01T03:27:34Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful with classification tasks that have little or even non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts [75.75548749888029]
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks.
With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
arXiv Detail & Related papers (2023-05-11T17:57:49Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Existing methods typically encode task information only as a simple dataset-name prefix to the encoder input.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations.
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Unifying Flow, Stereo and Depth Estimation [121.54066319299261]
We present a unified formulation and model for three motion and 3D perception tasks.
We formulate all three tasks as a unified dense correspondence matching problem.
Our model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks.
arXiv Detail & Related papers (2022-11-10T18:59:54Z)
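For the correspondence-based unification in the last entry above, the sketch below illustrates one way dense matching can be written: per-pixel features of two views are compared with a dot-product correlation, and a soft-argmax over candidate positions gives the expected match. This is a hypothetical illustration under assumed feature shapes, not that paper's released model; flow, stereo, and depth would differ only in which candidate positions are searched.

```python
# Hedged sketch of dense correspondence matching via feature correlation.
import torch


def dense_correspondence(feat1, feat2):
    """feat1, feat2: (H, W, C) feature maps of the two views (assumed shapes)."""
    h, w, c = feat1.shape
    f1 = feat1.reshape(h * w, c)
    f2 = feat2.reshape(h * w, c)
    # Correlation volume: similarity of every pixel in view 1 to every pixel in view 2.
    corr = (f1 @ f2.t()) / c ** 0.5                      # (H*W, H*W)
    prob = torch.softmax(corr, dim=-1)                   # matching distribution per pixel
    # Candidate pixel coordinates in view 2.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(h * w, 2).float()
    # Soft-argmax: expected matching coordinate for every pixel in view 1.
    matched = prob @ coords                               # (H*W, 2)
    flow = matched - coords                               # displacement field
    return flow.reshape(h, w, 2)


# For stereo pairs, the horizontal component of the displacement is the disparity,
# from which metric depth follows as depth = focal_length * baseline / disparity.
flow = dense_correspondence(torch.randn(24, 32, 64), torch.randn(24, 32, 64))
```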
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.