Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving
- URL: http://arxiv.org/abs/2209.08953v1
- Date: Mon, 19 Sep 2022 12:15:31 GMT
- Title: Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving
- Authors: Xiwen Liang, Yangxin Wu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan
Liang
- Abstract summary: In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
- Score: 103.745551954983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Aiming towards a holistic understanding of multiple downstream tasks
simultaneously, there is a need for extracting features with better
transferability. Though many recent self-supervised pre-training methods have
achieved impressive performance on various vision tasks under the prevailing
pretrain-finetune paradigm, their generalization capacity to multi-task
learning scenarios is yet to be explored. In this paper, we extensively
investigate the transfer performance of various types of self-supervised
methods, e.g., MoCo and SimCLR, on three downstream tasks, including semantic
segmentation, drivable area segmentation, and traffic object detection, on the
large-scale driving dataset BDD100K. We surprisingly find that their
performances are sub-optimal or even lag far behind the single-task baseline,
which may be due to the differences in training objectives and architectural
design inherent in the pretrain-finetune paradigm. To overcome this dilemma as well
as avoid redesigning the resource-intensive pre-training stage, we propose a
simple yet effective pretrain-adapt-finetune paradigm for general multi-task
training, where the off-the-shelf pretrained models can be effectively adapted
without increasing the training overhead. During the adapt stage, we utilize
learnable multi-scale adapters to dynamically adjust the pretrained model
weights supervised by multi-task objectives while leaving the pretrained
knowledge untouched. Furthermore, we regard the vision-language pre-training
model CLIP as a strong complement to the pretrain-adapt-finetune paradigm and
propose a novel adapter named LV-Adapter, which incorporates language priors in
the multi-task model via task-specific prompting and alignment between visual
and textual features.
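The sketch below is a minimal PyTorch illustration of the adapt stage under assumed design choices, not the paper's exact architecture: a frozen, off-the-shelf backbone is augmented with small learnable multi-scale adapters, which become the only parameters updated by the multi-task objectives, so the pretrained knowledge itself stays untouched. The backbone choice (ResNet-50), the bottleneck reduction ratio, and the module names MultiScaleAdapter and AdaptedBackbone are illustrative assumptions.
```python
# Minimal sketch of the adapt stage, assuming a frozen ResNet-50 backbone:
# lightweight multi-scale adapters are the only trainable modules and are
# supervised by the multi-task objectives; the pretrained weights stay frozen.
import torch
import torch.nn as nn
import torchvision


class MultiScaleAdapter(nn.Module):
    """Bottleneck adapter applied to one feature scale (residual form)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 8)
        self.down = nn.Conv2d(channels, hidden, kernel_size=1)
        self.act = nn.GELU()
        self.up = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the pretrained features; the
        # adapter only learns a small multi-task-specific correction.
        return x + self.up(self.act(self.down(x)))


class AdaptedBackbone(nn.Module):
    """Frozen ResNet-50 stages with one learnable adapter per feature scale."""

    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.stages = nn.ModuleList([resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4])
        for p in self.parameters():
            p.requires_grad = False  # pretrained knowledge is left untouched
        # Adapters (trainable) for the four feature scales of ResNet-50.
        self.adapters = nn.ModuleList(
            [MultiScaleAdapter(c) for c in (256, 512, 1024, 2048)]
        )

    def forward(self, x: torch.Tensor) -> list:
        x = self.stem(x)
        feats = []
        for stage, adapter in zip(self.stages, self.adapters):
            x = adapter(stage(x))  # adapted features are fed to the next stage
            feats.append(x)
        return feats  # multi-scale features consumed by the task-specific heads


if __name__ == "__main__":
    model = AdaptedBackbone()
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / {total:,} parameters")
    print([f.shape for f in model(torch.randn(1, 3, 224, 224))])
```
The LV-Adapter's language priors can be pictured along similar lines: task-specific prompts are built from class names, encoded with CLIP's frozen text encoder, and dense visual features are aligned to the resulting text embeddings to yield per-location class logits. The prompt template, the projection layer, and the name LanguagePriorHead below are again hypothetical choices for illustration, not the paper's exact design.
```python
# Hypothetical LV-Adapter-style head: CLIP text embeddings of task-specific
# prompts act as class prototypes, and projected visual features are aligned
# to them via cosine similarity. Requires the OpenAI CLIP package
# (pip install git+https://github.com/openai/CLIP.git).
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip


class LanguagePriorHead(nn.Module):
    def __init__(self, visual_dim: int, class_names: list, task: str = "semantic segmentation"):
        super().__init__()
        clip_model, _ = clip.load("ViT-B/32", device="cpu")  # frozen text encoder
        prompts = clip.tokenize([f"a photo of a {c} for {task}" for c in class_names])
        with torch.no_grad():
            text = clip_model.encode_text(prompts).float()  # (C, D)
        self.register_buffer("text_embed", F.normalize(text, dim=-1))
        self.proj = nn.Conv2d(visual_dim, text.shape[-1], kernel_size=1)  # learnable

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        vis = F.normalize(self.proj(feats), dim=1)                   # (B, D, H, W)
        return torch.einsum("bdhw,cd->bchw", vis, self.text_embed)   # class logits


if __name__ == "__main__":
    head = LanguagePriorHead(2048, ["road", "car", "person"])
    print(head(torch.randn(1, 2048, 7, 7)).shape)  # torch.Size([1, 3, 7, 7])
```
In both sketches only a small set of added parameters is trainable, in line with the paper's goal of adapting off-the-shelf pretrained models without increasing the training overhead.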
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment [0.0]
"Harmonized Transfer Learning and Modality alignment (HarMA)" is a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment.
HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing.
arXiv Detail & Related papers (2024-04-28T17:20:08Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
Transferring the pretrained models to downstream tasks may encounter task discrepancies, since pretraining is formulated as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - SMART: Self-supervised Multi-task pretrAining with contRol Transformers [34.604339091596884]
Self-supervised pretraining has been extensively studied in language and vision domains.
It is difficult to properly design such a pretraining approach for sequential decision-making tasks.
We propose a generic pretraining framework for sequential decision making.
arXiv Detail & Related papers (2023-01-24T05:01:23Z) - Towards All-in-one Pre-training via Maximizing Multi-modal Mutual
Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z) - Adaptive Transfer Learning on Graph Neural Networks [4.233435459239147]
Graph neural networks (GNNs) are widely used to learn a powerful representation of graph-structured data.
Recent work demonstrates that transferring knowledge from self-supervised tasks to downstream tasks could further improve graph representation.
We propose a new transfer learning paradigm on GNNs which could effectively leverage self-supervised tasks as auxiliary tasks to help the target task.
arXiv Detail & Related papers (2021-07-19T11:46:28Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)