INTERN: A New Learning Paradigm Towards General Vision
- URL: http://arxiv.org/abs/2111.08687v1
- Date: Tue, 16 Nov 2021 18:42:50 GMT
- Title: INTERN: A New Learning Paradigm Towards General Vision
- Authors: Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He,
Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu
Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu
Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan,
Dahua Lin, Xiaogang Wang, Yu Qiao
- Abstract summary: We develop a new learning paradigm named INTERN.
By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability.
In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
- Score: 117.3343347061931
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Enormous waves of technological innovation over the past several years,
marked by advances in AI technologies, are profoundly reshaping industry
and society. However, a key challenge lies ahead: our capability to meet
rapidly growing, scenario-specific demands is severely limited by the cost of
acquiring a commensurate amount of training data. This difficulty stems from a
limitation of the mainstream learning paradigm: for each new scenario, we must
train a new model on a large quantity of well-annotated data, commonly from
scratch. To tackle this fundamental problem, we develop a new
learning paradigm named INTERN. By learning with supervisory signals from
multiple sources in multiple stages, the model being trained will develop
strong generalizability. We evaluate our model on 26 well-known datasets that
cover four categories of tasks in computer vision. In most cases, our models,
adapted with only 10% of the training data in the target domain, outperform the
counterparts trained with the full set of data, often by a significant margin.
This is an important step towards a promising prospect where such a model with
general vision capability can dramatically reduce our reliance on data, thus
expediting the adoption of AI technologies. Furthermore, revolving around our
new paradigm, we also introduce a new data system, a new architecture, and a
new benchmark, which, together, form a general vision ecosystem to support its
future development in an open and inclusive manner.
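The low-data adaptation protocol described above (fine-tuning on only 10% of the target-domain training set) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline; the toy dataset, fraction, and seed are assumptions for demonstration.

```python
import random

def subsample_fraction(dataset, fraction=0.10, seed=0):
    """Draw a reproducible random subset covering `fraction` of a dataset.

    This mimics a low-data adaptation setting: a pre-trained model would be
    fine-tuned on this subset rather than on the full target-domain data.
    """
    rng = random.Random(seed)
    k = max(1, int(len(dataset) * fraction))
    indices = rng.sample(range(len(dataset)), k)
    return [dataset[i] for i in indices]

# Toy "target domain": 1,000 (sample_id, class_label) pairs.
full_train = [(i, i % 4) for i in range(1000)]
low_data_train = subsample_fraction(full_train, fraction=0.10)
print(len(low_data_train))  # 100 samples, i.e. the 10% regime
```

Fixing the seed keeps the subset identical across runs, which matters when comparing a low-data model against a full-data baseline.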
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Big Cooperative Learning [7.958840888809145]
We show that the training of foundation models can be interpreted as a form of big cooperative learning.
We propose BigLearn-GAN, a novel adversarially-trained foundation model with versatile data-sampling capabilities.
arXiv Detail & Related papers (2024-07-31T03:59:14Z)
- A survey of synthetic data augmentation methods in computer vision [0.0]
This paper presents an extensive review of synthetic data augmentation techniques.
We focus on the important data generation and augmentation techniques, general scope of application and specific use-cases.
We provide a summary of common synthetic datasets for training computer vision models.
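As one concrete instance of the augmentation techniques such surveys cover, a horizontal flip cheaply doubles the effective training set. The nested-list image representation below is a simplifying assumption for illustration; real pipelines operate on image tensors.

```python
def hflip(image):
    """Horizontally flip an image given as rows of pixel values."""
    return [list(reversed(row)) for row in image]

# A 2x3 toy "image"; flipping mirrors each row left-to-right.
img = [[1, 2, 3],
       [4, 5, 6]]
augmented_dataset = [img, hflip(img)]  # original plus its mirrored copy
print(hflip(img))  # [[3, 2, 1], [6, 5, 4]]
```

Label-preserving transforms like this one are the simplest form of synthetic data; generative methods extend the same idea by synthesizing entirely new samples.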
arXiv Detail & Related papers (2024-03-15T07:34:08Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities [59.02391344178202]
Vision foundation models (VFMs) serve as potent building blocks for a wide range of AI applications.
The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs.
This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions.
arXiv Detail & Related papers (2024-01-16T01:57:24Z)
- A Survey of Serverless Machine Learning Model Inference [0.0]
Advances in Generative AI, Computer Vision, and Natural Language Processing have led to the increased integration of AI models into various products.
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
arXiv Detail & Related papers (2023-11-22T18:46:05Z)
- A Comprehensive Study on Model Initialization Techniques Ensuring Efficient Federated Learning [0.0]
Federated learning (FL) has emerged as a promising paradigm for training machine learning models in a distributed and privacy-preserving manner.
The choice of model initialization method plays a crucial role in the performance, convergence speed, communication efficiency, and privacy guarantees of federated learning systems.
Our research meticulously compares, categorizes, and delineates the merits and demerits of each technique, examining their applicability across diverse FL scenarios.
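A common baseline among the initialization techniques such studies compare is to have every client start from identical server-broadcast weights, which keeps the averaged update meaningful. The flat weight vectors and uniform averaging below are simplifying assumptions: a FedAvg-style sketch, not the implementation of any specific system.

```python
def broadcast_init(global_weights, num_clients):
    """Give every client an identical, independent copy of the server's weights."""
    return [list(global_weights) for _ in range(num_clients)]

def fedavg(client_weights):
    """Uniformly average client weight vectors into new global weights."""
    n = len(client_weights)
    return [sum(ws[i] for ws in client_weights) / n
            for i in range(len(client_weights[0]))]

global_w = [0.0, 1.0]                  # server's initial model
clients = broadcast_init(global_w, 3)  # all clients start identically
clients[0][0] += 0.5                   # pretend local training nudged
clients[1][0] += 1.0                   # each client's first weight
print(fedavg(clients))  # [0.5, 1.0]
```

Because every client starts from the same point, differences between client weights reflect only local data, which is what makes the plain average a sensible aggregate.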
arXiv Detail & Related papers (2023-10-31T23:26:58Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Neurosymbolic AI for Situated Language Understanding [13.249453757295083]
We argue that computational situated grounding provides a solution to some of these learning challenges.
Our model reincorporates some ideas of classic AI into a framework of neurosymbolic intelligence.
We discuss how situated grounding provides diverse data and multiple levels of modeling for a variety of AI learning challenges.
arXiv Detail & Related papers (2020-12-05T05:03:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.