INTERN: A New Learning Paradigm Towards General Vision
- URL: http://arxiv.org/abs/2111.08687v1
- Date: Tue, 16 Nov 2021 18:42:50 GMT
- Title: INTERN: A New Learning Paradigm Towards General Vision
- Authors: Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He,
Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu
Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu
Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan,
Dahua Lin, Xiaogang Wang, Yu Qiao
- Abstract summary: We develop a new learning paradigm named INTERN.
By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability.
In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data.
- Score: 117.3343347061931
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Enormous waves of technological innovation over the past several years,
marked by advances in AI technologies, are profoundly reshaping industry
and society. However, a key challenge lies ahead: our capability to meet
rapidly growing, scenario-specific demands is severely limited by the cost of
acquiring a commensurate amount of training data. This difficulty stems from a
limitation of the mainstream learning paradigm: for each new scenario, we must
train a new model on a large quantity of well-annotated data, commonly from
scratch. To tackle this fundamental problem, we develop a new
learning paradigm named INTERN. By learning with supervisory signals from
multiple sources in multiple stages, the model being trained will develop
strong generalizability. We evaluate our model on 26 well-known datasets that
cover four categories of tasks in computer vision. In most cases, our models,
adapted with only 10% of the training data in the target domain, outperform the
counterparts trained with the full set of data, often by a significant margin.
This is an important step towards a promising prospect where such a model with
general vision capability can dramatically reduce our reliance on data, thus
expediting the adoption of AI technologies. Furthermore, revolving around our
new paradigm, we also introduce a new data system, a new architecture, and a
new benchmark, which, together, form a general vision ecosystem to support its
future development in an open and inclusive manner.
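The low-data adaptation protocol described above (fine-tuning on only 10% of the target-domain training set) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline; the toy dataset, fraction, and seed are assumptions for demonstration.

```python
import random

def subsample_fraction(dataset, fraction=0.10, seed=0):
    """Draw a reproducible random subset covering `fraction` of a dataset.

    This mimics a low-data adaptation setting: a pre-trained model would be
    fine-tuned on this subset rather than on the full target-domain data.
    """
    rng = random.Random(seed)
    k = max(1, int(len(dataset) * fraction))
    indices = rng.sample(range(len(dataset)), k)
    return [dataset[i] for i in indices]

# Toy "target domain": 1,000 (sample_id, class_label) pairs.
full_train = [(i, i % 4) for i in range(1000)]
low_data_train = subsample_fraction(full_train, fraction=0.10)
print(len(low_data_train))  # 100 samples, i.e. the 10% regime
```

Fixing the seed keeps the subset identical across runs, which matters when comparing a low-data model against a full-data baseline.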
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- Big Cooperative Learning [7.958840888809145]
We show that the training of foundation models can be interpreted as a form of big cooperative learning.
We propose BigLearn-GAN, a novel adversarially-trained foundation model with versatile data-sampling capabilities.
arXiv Detail & Related papers (2024-07-31T03:59:14Z)
- A survey of synthetic data augmentation methods in computer vision [0.0]
This paper presents an extensive review of synthetic data augmentation techniques.
We focus on the important data generation and augmentation techniques, general scope of application and specific use-cases.
We provide a summary of common synthetic datasets for training computer vision models.
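As one concrete instance of the augmentation techniques such surveys cover, a horizontal flip cheaply doubles the effective training set. The nested-list image representation below is a simplifying assumption for illustration; real pipelines operate on image tensors.

```python
def hflip(image):
    """Horizontally flip an image given as rows of pixel values."""
    return [list(reversed(row)) for row in image]

# A 2x3 toy "image"; flipping mirrors each row left-to-right.
img = [[1, 2, 3],
       [4, 5, 6]]
augmented_dataset = [img, hflip(img)]  # original plus its mirrored copy
print(hflip(img))  # [[3, 2, 1], [6, 5, 4]]
```

Label-preserving transforms like this one are the simplest form of synthetic data; generative methods extend the same idea by synthesizing entirely new samples.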
arXiv Detail & Related papers (2024-03-15T07:34:08Z)
- On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities [59.02391344178202]
Vision foundation models (VFMs) serve as potent building blocks for a wide range of AI applications.
The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs.
This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions.
arXiv Detail & Related papers (2024-01-16T01:57:24Z)
- A Survey of Serverless Machine Learning Model Inference [0.0]
Advances in Generative AI, Computer Vision, and Natural Language Processing have led to the increased integration of AI models into various products.
This survey aims to summarize and categorize the emerging challenges and optimization opportunities for large-scale deep learning serving systems.
arXiv Detail & Related papers (2023-11-22T18:46:05Z)
- A Comprehensive Study on Model Initialization Techniques Ensuring Efficient Federated Learning [0.0]
Federated learning (FL) has emerged as a promising paradigm for training machine learning models in a distributed and privacy-preserving manner.
The choice of model initialization method plays a crucial role in the performance, convergence speed, communication efficiency, and privacy guarantees of federated learning systems.
Our research meticulously compares, categorizes, and delineates the merits and demerits of each technique, examining their applicability across diverse FL scenarios.
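A common baseline among the initialization techniques such studies compare is to have every client start from identical server-broadcast weights, which keeps the averaged update meaningful. The flat weight vectors and uniform averaging below are simplifying assumptions: a FedAvg-style sketch, not the implementation of any specific system.

```python
def broadcast_init(global_weights, num_clients):
    """Give every client an identical, independent copy of the server's weights."""
    return [list(global_weights) for _ in range(num_clients)]

def fedavg(client_weights):
    """Uniformly average client weight vectors into new global weights."""
    n = len(client_weights)
    return [sum(ws[i] for ws in client_weights) / n
            for i in range(len(client_weights[0]))]

global_w = [0.0, 1.0]                  # server's initial model
clients = broadcast_init(global_w, 3)  # all clients start identically
clients[0][0] += 0.5                   # pretend local training nudged
clients[1][0] += 1.0                   # each client's first weight
print(fedavg(clients))  # [0.5, 1.0]
```

Because every client starts from the same point, differences between client weights reflect only local data, which is what makes the plain average a sensible aggregate.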
arXiv Detail & Related papers (2023-10-31T23:26:58Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- Neurosymbolic AI for Situated Language Understanding [13.249453757295083]
We argue that computational situated grounding provides a solution to some of these learning challenges.
Our model reincorporates some ideas of classic AI into a framework of neurosymbolic intelligence.
We discuss how situated grounding provides diverse data and multiple levels of modeling for a variety of AI learning challenges.
arXiv Detail & Related papers (2020-12-05T05:03:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.