AutoVP: An Automated Visual Prompting Framework and Benchmark
- URL: http://arxiv.org/abs/2310.08381v2
- Date: Sun, 10 Mar 2024 19:00:00 GMT
- Title: AutoVP: An Automated Visual Prompting Framework and Benchmark
- Authors: Hsi-Ai Tsao, Lei Hsiung, Pin-Yu Chen, Sijia Liu, Tsung-Yi Ho
- Abstract summary: Visual prompting (VP) is an emerging parameter-efficient fine-tuning approach to adapting pre-trained vision models to solve various downstream image-classification tasks.
We propose AutoVP, an end-to-end expandable framework for automating VP design choices, along with 12 downstream image-classification tasks.
Our experimental results show that AutoVP outperforms the best-known current VP methods by a substantial margin.
- Score: 66.5618543577204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual prompting (VP) is an emerging parameter-efficient fine-tuning approach
to adapting pre-trained vision models to solve various downstream
image-classification tasks. However, there has hitherto been little systematic
study of the design space of VP and no clear benchmark for evaluating its
performance. To bridge this gap, we propose AutoVP, an end-to-end expandable
framework for automating VP design choices, along with 12 downstream
image-classification tasks that can serve as a holistic VP-performance
benchmark. Our design space covers 1) the joint optimization of the prompts; 2)
the selection of pre-trained models, including image classifiers and text-image
encoders; and 3) model output mapping strategies, including nonparametric and
trainable label mapping. Our extensive experimental results show that AutoVP
outperforms the best-known current VP methods by a substantial margin, achieving
up to a 6.7% improvement in accuracy and a maximum performance gain of 27.5%
over the linear-probing (LP) baseline. AutoVP thus makes a two-fold
contribution: serving both as an efficient tool for hyperparameter tuning on VP
design choices, and as a comprehensive benchmark that can reasonably be
expected to accelerate VP's development. The source code is available at
https://github.com/IBM/AutoVP.
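To make the abstract's design space concrete, below is a minimal PyTorch sketch of a generic visual-prompting pipeline. It is not the AutoVP implementation: the class names, the ResNet-18 backbone, the padding size, and the torchvision weight identifier are illustrative assumptions. The sketch shows the three components enumerated above: a trainable input prompt, a frozen pre-trained classifier, and an output label-mapping head (here a trainable fully-connected mapping; a nonparametric alternative would instead fix the source-to-target label assignment, e.g. by prediction frequency).

```python
# Minimal visual-prompting sketch (illustrative only; not the AutoVP code).
# Assumes torchvision >= 0.13 for the weights= argument.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class PadPrompt(nn.Module):
    """Trainable frame of pixels padded around a downsized input image."""
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.pad = pad
        self.inner = image_size - 2 * pad
        self.prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))

    def forward(self, x):
        # Shrink the image and place it in the center of the canvas.
        x = F.interpolate(x, size=self.inner, mode="bilinear", align_corners=False)
        x = F.pad(x, [self.pad] * 4)
        mask = torch.zeros_like(x)
        mask[..., self.pad:-self.pad, self.pad:-self.pad] = 1.0
        # The trainable prompt occupies the border region only.
        return x * mask + self.prompt * (1 - mask)


class VisualPromptClassifier(nn.Module):
    def __init__(self, num_target_classes, num_source_classes=1000):
        super().__init__()
        self.prompt = PadPrompt()
        self.backbone = resnet18(weights="IMAGENET1K_V1")
        for p in self.backbone.parameters():  # the pre-trained backbone stays frozen
            p.requires_grad = False
        # Trainable fully-connected label mapping from source to target labels.
        self.label_map = nn.Linear(num_source_classes, num_target_classes)

    def forward(self, x):
        return self.label_map(self.backbone(self.prompt(x)))


# Only the prompt pixels and the label-mapping head are optimized, e.g.:
# model = VisualPromptClassifier(num_target_classes=10)
# opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
```

Because only the prompt pixels and the mapping head receive gradients, the number of trainable parameters stays small, which is what makes VP a parameter-efficient alternative to full fine-tuning or linear probing.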
Related papers
- Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes [70.08318779492944]
We are the first to harness vanishing point (VP) priors for more effective segmentation.
Our novel, efficient network for VSS, named VPSeg, incorporates two modules that utilize exactly this pair of static and dynamic VP priors.
arXiv Detail & Related papers (2024-01-27T01:01:58Z) - Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning? [92.23438255540968]
Visual Prompt Tuning is a parameter-efficient transfer learning technique.
We conduct a comprehensive analysis across 19 distinct datasets and tasks.
Our study provides insights into VPT's mechanisms, and offers guidance for its optimal utilization.
arXiv Detail & Related papers (2024-01-23T16:48:18Z) - VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness [56.87603097348203]
VeCAF uses labels and natural language annotations to perform parametric data selection for PVM finetuning.
VeCAF incorporates the finetuning objective to select significant data points that effectively guide the PVM towards faster convergence.
On ImageNet, VeCAF uses up to 3.3x fewer training batches to reach the target performance than full finetuning.
arXiv Detail & Related papers (2024-01-15T17:28:37Z) - VAD: Vectorized Scene Representation for Efficient Autonomous Driving [44.070636456960045]
VAD is an end-to-end vectorized paradigm for autonomous driving.
VAD exploits the vectorized agent motion and map elements as explicit instance-level planning constraints.
VAD runs much faster than previous end-to-end planning methods.
arXiv Detail & Related papers (2023-03-21T17:59:22Z) - Understanding and Improving Visual Prompting: A Label-Mapping Perspective [63.89295305670113]
We revisit and advance visual prompting (VP), an input prompting technique for vision tasks.
We propose a new VP framework, termed ILM-VP, which automatically re-maps the source labels to the target labels.
Our proposal significantly outperforms state-of-the-art VP methods.
arXiv Detail & Related papers (2022-11-21T16:49:47Z) - Declaration-based Prompt Tuning for Visual Question Answering [16.688288454811016]
We propose an innovative visual-language (VL) fine-tuning paradigm, named Declaration-based Prompt Tuning (DPT).
DPT jointly optimizes the pre-training and fine-tuning objectives of the VQA model, boosting the effective adaptation of pre-trained VL models to the downstream task.
Experimental results on GQA dataset show that DPT outperforms the fine-tuned counterpart by a large margin regarding accuracy in both fully-supervised (2.68%) and zero-shot/few-shot (over 31%) settings.
arXiv Detail & Related papers (2022-05-05T05:56:55Z) - Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks.
We propose Single-Path Vision Transformer pruning (SPViT) to efficiently and automatically compress the pre-trained ViTs.
Our SPViT can trim 52.0% of the FLOPs of DeiT-B while simultaneously achieving an impressive 0.6% top-1 accuracy gain.
arXiv Detail & Related papers (2021-11-23T11:35:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.