X-Learner: Learning Cross Sources and Tasks for Universal Visual
Representation
- URL: http://arxiv.org/abs/2203.08764v1
- Date: Wed, 16 Mar 2022 17:23:26 GMT
- Title: X-Learner: Learning Cross Sources and Tasks for Universal Visual
Representation
- Authors: Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei
Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao
- Abstract summary: We propose a representation learning framework called X-Learner.
X-Learner learns a universal feature across multiple vision tasks supervised by various sources.
X-Learner achieves strong performance on different tasks without extra annotations, modalities, or computational cost.
- Score: 71.51719469058666
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In computer vision, pre-training models based on large-scale supervised
learning have proven effective over the past few years. However, existing
works mostly focus on learning from an individual task with a single data source
(e.g., ImageNet for classification or COCO for detection). This restricted form
limits their generalizability and usability due to the lack of rich semantic
information from various tasks and data sources. Here, we demonstrate that
jointly learning from heterogeneous tasks and multiple data sources contributes
to a universal visual representation, leading to better transfer results on
various downstream tasks. Learning how to bridge the gaps among different
tasks and data sources is therefore key, yet it remains an open question. In
this work, we propose a representation learning framework called X-Learner,
which learns a universal feature across multiple vision tasks supervised by
various sources, through an expansion stage and a squeeze stage: 1) Expansion Stage:
X-Learner learns task-specific features to alleviate task interference and
enriches the representation via a reconciliation layer. 2) Squeeze Stage: X-Learner
condenses the model to a reasonable size and learns a universal and
generalizable representation that transfers to various tasks. Extensive
experiments demonstrate that X-Learner achieves strong performance on different
tasks without extra annotations, modalities or computational cost compared to
existing representation learning methods. Notably, a single X-Learner model
shows remarkable gains of 3.0%, 3.3% and 1.8% over current pretrained models on
12 downstream datasets for classification, object detection and semantic
segmentation, respectively.
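To make the two-stage design concrete, the sketch below illustrates the expansion/squeeze idea under simplifying assumptions (it is not the authors' implementation): a shared backbone, a hypothetical per-task adapter standing in for the reconciliation layer during expansion, and plain feature distillation as the squeeze step.

```python
# Minimal sketch of the expansion/squeeze idea -- not the authors' code.
# Assumptions: a shared backbone, one lightweight per-task adapter (standing in
# for the reconciliation layer) plus a head per task for the expansion stage,
# and simple feature distillation into a compact student for the squeeze stage.
import torch
import torch.nn as nn

class ExpandedModel(nn.Module):
    def __init__(self, feat_dim=256, task_out_dims=(1000, 80, 21)):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
        self.adapters = nn.ModuleList([nn.Linear(feat_dim, feat_dim) for _ in task_out_dims])
        self.heads = nn.ModuleList([nn.Linear(feat_dim, d) for d in task_out_dims])

    def forward(self, x, task_id):
        shared = self.backbone(x)                   # shared feature
        task_feat = self.adapters[task_id](shared)  # task-specific refinement
        return self.heads[task_id](task_feat)

# Expansion stage: each task/source trains its own adapter and head on the shared backbone.
model = ExpandedModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for task_id in range(3):
    x = torch.randn(8, 3, 32, 32)                  # stand-in batch for this source
    y = torch.randint(0, model.heads[task_id].out_features, (8,))
    loss = nn.functional.cross_entropy(model(x, task_id), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Squeeze stage: condense the expanded model into one compact backbone via feature distillation.
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
s_opt = torch.optim.SGD(student.parameters(), lr=0.1)
x = torch.randn(8, 3, 32, 32)
with torch.no_grad():
    target = model.backbone(x)                     # teacher features
loss = nn.functional.mse_loss(student(x), target)
s_opt.zero_grad(); loss.backward(); s_opt.step()
```

In this toy setup, the condensed student backbone is what would then be transferred to downstream classification, detection and segmentation tasks.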
Related papers
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z) - Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose TIVE, a high-value data selection approach that eliminates redundancy within visual instruction data and reduces the training cost.
Using only about 15% of the data, our approach achieves average performance comparable to the full-data fine-tuned model across eight benchmarks.
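As a rough illustration of the value-based selection idea (the scoring function below is a hypothetical stand-in, not the TIVE estimator), one can score every instruction sample and keep only the top fraction:

```python
# Illustrative value-based data selection; the scoring here is a crude
# hypothetical proxy, not the TIVE value estimator.
import random

def value_score(sample):
    # Stand-in for a learned value estimate (real methods use e.g. gradient- or loss-based signals).
    return len(set(sample["instruction"].split()))

def select_high_value(dataset, keep_ratio=0.15):
    # Rank samples by estimated value and keep roughly the top keep_ratio fraction.
    ranked = sorted(dataset, key=value_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

data = [{"instruction": f"describe image {i} using at most {random.randint(3, 9)} words"} for i in range(100)]
subset = select_high_value(data)
print(f"kept {len(subset)} of {len(data)} samples for tuning")
```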
arXiv Detail & Related papers (2024-03-14T16:47:25Z) - Distribution Matching for Multi-Task Learning of Classification Tasks: a
Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can be successful on classification tasks with little or even non-overlapping annotation.
We propose a novel approach in which knowledge exchange between the tasks is enabled via distribution matching.
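One generic way to realize such knowledge exchange (a sketch under the assumption of a shared backbone and two task heads with compatible output spaces, not necessarily the paper's exact loss) is to add a KL term that aligns the heads' predictive distributions on shared inputs:

```python
# Generic sketch of cross-task distribution matching -- not the paper's exact formulation.
# Two task heads share a backbone; a KL term nudges their predictive
# distributions on the same inputs toward each other.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Linear(128, 64)
head_a = nn.Linear(64, 10)   # task A head
head_b = nn.Linear(64, 10)   # task B head (compatible output space is an assumption)

x = torch.randn(16, 128)
feat = torch.relu(backbone(x))
logits_a, logits_b = head_a(feat), head_b(feat)

# Supervised loss for task A only (task B is treated as unlabeled on this batch).
y_a = torch.randint(0, 10, (16,))
sup_loss = F.cross_entropy(logits_a, y_a)

# Distribution matching: KL divergence between the two heads' predictions.
match_loss = F.kl_div(F.log_softmax(logits_a, dim=-1),
                      F.softmax(logits_b, dim=-1), reduction="batchmean")
loss = sup_loss + 0.1 * match_loss
loss.backward()
```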
arXiv Detail & Related papers (2024-01-02T14:18:11Z) - An Efficient General-Purpose Modular Vision Model via Multi-Task
Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z) - Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
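The sampling loop this describes can be sketched roughly as follows; the relevance estimator below is a random placeholder, not the paper's estimator:

```python
# Rough sketch of relevance-weighted source-task sampling (simplified stand-in
# for the paper's method): relevance scores are re-estimated each round and
# turned into a sampling budget over source tasks.
import random

def estimate_relevance(source_task, target_batch):
    # Placeholder relevance measure; real estimators use e.g. representation
    # similarity or improvement on held-out target-task loss.
    return random.uniform(0.1, 1.0)

source_tasks = ["classification", "detection", "segmentation"]
budget_per_round = 100

for round_idx in range(3):
    target_batch = None  # placeholder for a batch of target-task data
    scores = {t: estimate_relevance(t, target_batch) for t in source_tasks}
    total = sum(scores.values())
    # Allocate the sampling budget in proportion to estimated relevance
    # (integer truncation may leave a few samples unassigned in this sketch).
    samples = {t: int(budget_per_round * s / total) for t, s in scores.items()}
    print(f"round {round_idx}: sample counts per source task -> {samples}")
    # ...train the shared representation on the sampled source data here...
```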
arXiv Detail & Related papers (2022-02-02T08:23:24Z) - Factors of Influence for Transfer Learning across Diverse Appearance
Domains and Task Types [50.1843146606122]
A simple form of transfer learning is common in current state-of-the-art computer vision models.
Previous systematic studies of transfer learning have been limited and the circumstances in which it is expected to work are not fully understood.
In this paper we carry out an extensive experimental exploration of transfer learning across vastly different image domains.
arXiv Detail & Related papers (2021-03-24T16:24:20Z) - Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot
Learning [21.89909688056478]
We propose a new two-level joint training scheme that augments the generative network with an inference network during training.
This provides strong cross-modal interaction for effective transfer of knowledge between visual and semantic domains.
We evaluate our approach on four benchmark datasets against several state-of-the-art methods and report its performance.
arXiv Detail & Related papers (2020-07-15T15:34:09Z)