LPT++: Efficient Training on Mixture of Long-tailed Experts
- URL: http://arxiv.org/abs/2409.11323v1
- Date: Tue, 17 Sep 2024 16:19:11 GMT
- Title: LPT++: Efficient Training on Mixture of Long-tailed Experts
- Authors: Bowen Dong, Pan Zhou, Wangmeng Zuo
- Abstract summary: LPT++ enhances frozen Vision Transformers (ViTs) through the integration of three core components.
The first is a universal long-tailed adaptation module, which aggregates long-tailed prompts and visual adapters to adapt the pretrained model to the target domain.
The second is the mixture of long-tailed experts framework with a mixture-of-experts (MoE) scorer, which adaptively calculates reweighting coefficients for confidence scores from both visual-only and visual-language (VL) model experts to generate more accurate predictions.
- Score: 107.78420448806357
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble. LPT++ enhances frozen Vision Transformers (ViTs) through the integration of three core components. The first is a universal long-tailed adaptation module, which aggregates long-tailed prompts and visual adapters to adapt the pretrained model to the target domain while improving its discriminative ability. The second is the mixture of long-tailed experts framework with a mixture-of-experts (MoE) scorer, which adaptively calculates reweighting coefficients for confidence scores from both visual-only and visual-language (VL) model experts to generate more accurate predictions. Finally, LPT++ employs a three-phase training framework, wherein each critical module is learned separately, resulting in a stable and effective long-tailed classification training paradigm. We also propose a simplified version of LPT++, namely LPT, which integrates only a visual-only pretrained ViT and long-tailed prompts to form a single-model method. LPT clearly illustrates how long-tailed prompts work, while achieving comparable performance without VL pretrained models. Experiments show that, with only ~1% extra trainable parameters, LPT++ achieves accuracy comparable to all counterparts.
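The MoE scorer described above fuses the two experts by predicting per-sample reweighting coefficients for their confidence scores. The abstract does not specify its architecture, so the following is a minimal PyTorch sketch under assumed names and shapes (`MoEScorer`, a single linear gating head, 768-d ViT features); it is an illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEScorer(nn.Module):
    """Illustrative MoE scorer: maps an image feature to per-expert
    reweighting coefficients for the experts' confidence scores.
    (A sketch, not the LPT++ reference implementation.)"""
    def __init__(self, feat_dim: int, num_experts: int = 2):
        super().__init__()
        # Small gating head producing one weight per expert.
        self.gate = nn.Linear(feat_dim, num_experts)

    def forward(self, feats, expert_logits):
        # feats: (B, feat_dim) features from the frozen ViT.
        # expert_logits: list of (B, num_classes) logits, one per expert
        # (e.g., the visual-only expert and the VL expert).
        weights = torch.softmax(self.gate(feats), dim=-1)        # (B, E)
        probs = torch.stack(
            [F.softmax(l, dim=-1) for l in expert_logits], dim=1
        )                                                        # (B, E, C)
        # Convex combination of the experts' confidence scores.
        return (weights.unsqueeze(-1) * probs).sum(dim=1)        # (B, C)

# Toy usage with random tensors standing in for the expert outputs.
scorer = MoEScorer(feat_dim=768)
feats = torch.randn(4, 768)
logits_vis, logits_vl = torch.randn(4, 100), torch.randn(4, 100)
fused = scorer(feats, [logits_vis, logits_vl])
print(fused.shape)  # torch.Size([4, 100])
```

Because the gate outputs softmax weights, the fused prediction is a convex combination of the experts' distributions and therefore remains a valid probability distribution over classes.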
Related papers
- One Fits All: Universal Time Series Analysis by Pretrained LM and Specially Designed Adaptors [23.292260325891032]
We introduce four unique adapters, designed specifically for downstream tasks based on the pre-trained model.
These adapters are further enhanced with efficient parameter tuning, resulting in superior performance compared to all state-of-the-art methods.
arXiv Detail & Related papers (2023-11-24T16:32:47Z)
- Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning [49.72857433721424]
Vision Transformers (ViT) and Visual Prompt Tuning (VPT) achieve state-of-the-art performance with improved efficiency in various computer vision tasks.
We present a novel algorithm, SGPT, that integrates Generalized FL (GFL) and Personalized FL (PFL) approaches by employing a unique combination of both shared and group-specific prompts.
arXiv Detail & Related papers (2023-10-27T17:22:09Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need [84.3507610522086]
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting old ones.
Recent pre-training has achieved substantial progress, making vast pre-trained models (PTMs) accessible for CIL.
We argue that the core factors in CIL are adaptivity for model updating and generalizability for knowledge transferring.
arXiv Detail & Related papers (2023-03-13T17:59:02Z)
- LPT: Long-tailed Prompt Tuning for Image Classification [178.52948452353834]
We introduce several trainable prompts into a frozen pretrained model to adapt it to long-tailed data.
In phase 1, we train the shared prompt via supervised prompt tuning to adapt a pretrained model to the desired long-tailed domain.
In phase 2, we use the learned shared prompt as a query to select a small, best-matched prompt set for a group of similar samples (a minimal sketch of this selection step appears after this list).
arXiv Detail & Related papers (2022-10-03T15:47:02Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- VC-GPT: Visual Conditioned GPT for End-to-End Generative Vision-and-Language Pre-training [9.511101155155957]
Vision-and-language pre-training models (VLMs) have achieved tremendous success in the cross-modal area, but most of them require millions of parallel image-caption pairs for pre-training.
In this work, we focus on reducing this need for generative vision-and-language pre-training by taking advantage of a visual pre-trained model (CLIP-ViT) as the encoder and a language pre-trained model (GPT2) as the decoder.
arXiv Detail & Related papers (2022-01-30T04:44:54Z)
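The LPT entry above outlines a two-phase scheme: phase 1 tunes a shared prompt on the long-tailed domain, and phase 2 uses the resulting feature as a query to select a best-matched group of prompts. As a rough illustration of phase-2 selection only, here is a minimal PyTorch sketch; the pool size, prompt length, top-k, and cosine-similarity matching are assumptions for the example, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupPromptPool(nn.Module):
    """Sketch of phase-2 prompt selection: a pool of (key, prompt) pairs;
    a query feature picks the best-matched group via cosine similarity.
    Pool size, prompt length, and top_k are illustrative assumptions."""
    def __init__(self, pool_size=20, prompt_len=10, dim=768, top_k=2):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query):
        # query: (B, dim), e.g. the [CLS] feature produced with the
        # phase-1 shared prompt attached to the frozen ViT.
        sim = F.cosine_similarity(
            query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1
        )                                            # (B, pool_size)
        idx = sim.topk(self.top_k, dim=-1).indices   # (B, top_k)
        # Gather and flatten the selected group prompts per sample.
        picked = self.prompts[idx]                   # (B, top_k, L, dim)
        return picked.flatten(1, 2)                  # (B, top_k*L, dim)

pool = GroupPromptPool()
q = torch.randn(4, 768)
print(pool(q).shape)  # torch.Size([4, 20, 768])
```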