GPPF: A General Perception Pre-training Framework via Sparsely Activated
Multi-Task Learning
- URL: http://arxiv.org/abs/2208.02148v2
- Date: Thu, 4 Aug 2022 04:39:23 GMT
- Title: GPPF: A General Perception Pre-training Framework via Sparsely Activated
Multi-Task Learning
- Authors: Benyuan Sun, Jin Dai, Zihao Liang, Congying Liu, Yi Yang, Bo Bai
- Abstract summary: We propose GPPF, a General Perception Pre-training Framework, to pre-train a task-level dynamic network.
By inspecting humans' innate ability to learn in complex environments, we identify and transfer three critical elements to deep networks.
We develop a plug-and-play multi-task training algorithm that supports concurrent Single Iteration Multiple Tasks (SIMT) training.
- Score: 23.15735672234869
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pre-training over mixed multi-task, multi-domain, and multi-modal data
remains an open challenge in vision perception pre-training. In this paper, we
propose GPPF, a General Perception Pre-training Framework, which pre-trains a
task-level dynamic network, composed of knowledge "legos" in each layer, on
labeled multi-task and multi-domain datasets. By inspecting humans' innate
ability to learn in complex environments, we identify and transfer three
critical elements to deep networks: (1) simultaneous exposure to diverse
cross-task and cross-domain information in each batch; (2) partitioned
knowledge storage in separate lego units driven by knowledge sharing; and (3)
sparse activation of a subset of lego units for both pre-training and
downstream tasks. Notably, the joint training of disparate vision tasks is
non-trivial due to their differences in input shapes, loss functions, output
formats, data distributions, etc. We therefore develop a plug-and-play
multi-task training algorithm that supports concurrent Single Iteration
Multiple Tasks (SIMT) training. SIMT lays the foundation for pre-training with
large-scale multi-task, multi-domain datasets and proves essential for stable
training in our GPPF experiments. Extensive experiments show that our GPPF-R50
model achieves significant improvements of 2.5-5.8 over a strong baseline on
the 8 pre-training tasks in GPPF-15M and reaches a range of SOTA results on
the 22 downstream tasks with similar computation budgets. We also validate the
generalization ability of GPPF to SOTA vision transformers with consistent
improvements. These results demonstrate the effective knowledge learning,
storing, sharing, and transfer provided by the GPPF framework.
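To make the abstract's "lego" idea concrete, the sketch below shows one possible reading of a task-level dynamic layer: each layer holds several candidate sub-modules ("legos"), and a given task sparsely activates only a subset of them. This is an illustrative assumption, not the authors' implementation; the class name LegoLayer, the fixed task-to-unit routing table, and the averaging of active units are all hypothetical choices for the sketch.

```python
import torch
import torch.nn as nn

class LegoLayer(nn.Module):
    """One layer holding several candidate 'lego' units (illustrative sketch).

    Each task activates only a sparse subset of the units; the active units'
    outputs are averaged. The routing table (task -> unit indices) is fixed
    here for simplicity; in the paper it could be driven by knowledge sharing.
    """

    def __init__(self, dim: int, num_units: int, task_to_units: dict):
        super().__init__()
        self.units = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_units)]
        )
        self.task_to_units = task_to_units  # e.g. {"cls": [0, 2], "det": [1, 2]}

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        active = self.task_to_units[task]            # sparse activation: only these units run
        outs = [self.units[i](x) for i in active]
        return torch.stack(outs, dim=0).mean(dim=0)  # combine the active legos


# Tiny usage example: two tasks share unit 2 but keep private units 0 and 1.
layer = LegoLayer(dim=16, num_units=3, task_to_units={"cls": [0, 2], "det": [1, 2]})
x = torch.randn(4, 16)
print(layer(x, task="cls").shape)  # torch.Size([4, 16])
```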
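The SIMT idea (exposing the model to several tasks and domains within a single optimization step) can likewise be sketched as a simple loop. This is an assumed simplification, not the paper's plug-and-play algorithm: the function name simt_step, the per-task heads/losses dictionaries, and the optional task weights are placeholders for illustration.

```python
import torch

def simt_step(backbone, heads, losses, batches, optimizer, task_weights=None):
    """One Single-Iteration-Multiple-Tasks style update (illustrative sketch).

    backbone     : shared, task-conditioned network, called as backbone(x, task)
    heads/losses : per-task output heads and loss functions, keyed by task name
    batches      : {task_name: (inputs, targets)} drawn from each task's loader
    """
    optimizer.zero_grad()
    total = 0.0
    for task, (x, y) in batches.items():          # every task contributes in this iteration
        feats = backbone(x, task)                 # task-conditioned (sparse) forward pass
        loss = losses[task](heads[task](feats), y)
        w = 1.0 if task_weights is None else task_weights[task]
        total = total + w * loss
    total.backward()                              # gradients from all tasks in one step
    optimizer.step()
    return float(total)
```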
Related papers
- Pilot: Building the Federated Multimodal Instruction Tuning Framework [79.56362403673354]
Our framework integrates two stages of "adapter on adapter" into the connector between the vision encoder and the LLM.
In stage 1, we extract task-specific features and client-specific features from visual information.
In stage 2, we build the cross-task Mixture-of-Adapters (CT-MoA) module to perform cross-task interaction.
arXiv Detail & Related papers (2025-01-23T07:49:24Z)
- OmniVec: Learning robust representations with cross modal sharing [28.023214572340336]
We present an approach to learn multiple tasks, in multiple modalities, with a unified architecture.
The proposed network is composed of task-specific encoders, a common trunk in the middle, and task-specific prediction heads.
We train the network on all major modalities, e.g., visual, audio, text, and 3D, and report results on 22 diverse and challenging public benchmarks.
arXiv Detail & Related papers (2023-11-07T14:00:09Z)
- Unified Open-Vocabulary Dense Visual Prediction [51.03014432235629]
Open-vocabulary (OV) dense visual prediction has attracted increasing research attention.
Most existing approaches are task-specific and tackle each task individually.
We propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks.
arXiv Detail & Related papers (2023-07-17T04:39:18Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Extreme Multi-Domain, Multi-Task Learning With Unified Text-to-Text Transfer Transformers [0.0]
We investigated the behavior of multi-domain, multi-task learning using multi-domain text-to-text transfer transformers (MD-T5).
We carried out experiments using three popular training strategies: BERT-style joint pretraining + successive finetuning, GPT-style joint pretraining + successive finetuning, and GPT-style joint pretraining + joint finetuning.
We show that while negative knowledge transfer and catastrophic forgetting are still considerable challenges for all the models, the GPT-style joint pretraining + joint finetuning strategy showed the most promise in multi-domain, multi-task learning.
arXiv Detail & Related papers (2022-09-21T04:21:27Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Omni-Training for Data-Efficient Deep Learning [80.28715182095975]
Recent advances reveal that a properly pre-trained model is endowed with an important property: transferability.
A tight combination of pre-training and meta-training cannot achieve both kinds of transferability.
This motivates the proposed Omni-Training framework towards data-efficient deep learning.
arXiv Detail & Related papers (2021-10-14T16:30:36Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named the Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- Understanding and Improving Information Transfer in Multi-Task Learning [14.43111978531182]
We study an architecture with a shared module for all tasks and a separate output module for each task.
We show that misalignment between task data can cause negative transfer (or hurt performance) and provide sufficient conditions for positive transfer.
Inspired by the theoretical insights, we show that aligning tasks' embedding layers leads to performance gains for multi-task training and transfer learning.
arXiv Detail & Related papers (2020-05-02T23:43:52Z)
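The architecture studied in the entry above (a shared module for all tasks plus a separate output module per task) is the classic hard-sharing setup, and can be sketched as follows. This is an illustrative assumption, not that paper's code; the class name HardSharingMTL and the layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Shared module for all tasks + one output module per task (illustrative)."""

    def __init__(self, in_dim: int, hidden: int, task_out_dims: dict):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # shared by all tasks
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, out) for task, out in task_out_dims.items()}
        )

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        return self.heads[task](self.shared(x))  # task-specific output module


# Tiny usage example with two hypothetical tasks.
model = HardSharingMTL(in_dim=32, hidden=64, task_out_dims={"seg": 10, "depth": 1})
x = torch.randn(8, 32)
print(model(x, task="seg").shape)  # torch.Size([8, 10])
```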