ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
- URL: http://arxiv.org/abs/2504.06838v1
- Date: Wed, 09 Apr 2025 12:56:22 GMT
- Title: ZIP: An Efficient Zeroth-order Prompt Tuning for Black-box Vision-Language Models
- Authors: Seonghwan Park, Jaehyeon Jeong, Yongjun Kim, Jaeho Lee, Namhoon Lee
- Abstract summary: We propose Zeroth-order Intrinsic-dimensional Prompt-tuning (ZIP), which enables efficient and robust prompt optimization in a purely black-box setting. We evaluate ZIP on 13+ vision-language tasks in standard benchmarks and show that it achieves an average improvement of approximately 6% in few-shot accuracy and 48% in query efficiency.
- Score: 14.137615267026755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have introduced various approaches for prompt-tuning black-box vision-language models, referred to as black-box prompt-tuning (BBPT). While BBPT has demonstrated considerable potential, it is often found that many existing methods require an excessive number of queries (i.e., function evaluations), which poses a significant challenge in real-world scenarios where the number of allowed queries is limited. To tackle this issue, we propose Zeroth-order Intrinsic-dimensional Prompt-tuning (ZIP), a novel approach that enables efficient and robust prompt optimization in a purely black-box setting. The key idea of ZIP is to reduce the problem dimensionality and the variance of zeroth-order gradient estimates, such that the training is done fast with far fewer queries. We achieve this by re-parameterizing prompts in low-rank representations and designing intrinsic-dimensional clipping of estimated gradients. We evaluate ZIP on 13+ vision-language tasks in standard benchmarks and show that it achieves an average improvement of approximately 6% in few-shot accuracy and 48% in query efficiency compared to the best-performing alternative BBPT methods, establishing a new state of the art. Our ablation analysis further shows that the proposed clipping mechanism is robust and nearly optimal, without the need to manually select the clipping threshold, matching the result of expensive hyperparameter search.
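To make the abstract's key idea concrete, below is a minimal sketch of the two ingredients it names: a low-rank re-parameterization of the soft prompt and clipped two-point zeroth-order gradient estimates. The toy quadratic loss, the prompt shapes, the step size, and the sqrt(d) norm-clipping threshold are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Toy sketch in the spirit of ZIP: optimize a soft prompt through low-rank
# factors using clipped zeroth-order gradient estimates. `query_model` is a
# hypothetical stand-in for one black-box evaluation (e.g., an API call).

rng = np.random.default_rng(0)

N_TOKENS, EMB_DIM, RANK = 4, 512, 8   # assumed prompt shape and intrinsic rank
MU, LR, STEPS = 1e-3, 0.05, 200       # smoothing radius, step size, iterations

# Low-rank re-parameterization: the full (N_TOKENS x EMB_DIM) prompt is U @ V,
# so only N_TOKENS*RANK + RANK*EMB_DIM parameters are ever optimized.
U = rng.normal(scale=0.02, size=(N_TOKENS, RANK))
V = rng.normal(scale=0.02, size=(RANK, EMB_DIM))

def query_model(prompt: np.ndarray) -> float:
    # Placeholder black-box loss (e.g., few-shot loss returned by a model API).
    target = np.ones_like(prompt)
    return float(np.mean((prompt - target) ** 2))

def zo_grad(params: np.ndarray, loss_fn) -> np.ndarray:
    # Two-point zeroth-order estimate: g ~ [f(x + mu*u) - f(x - mu*u)] / (2*mu) * u.
    u = rng.normal(size=params.shape)
    return (loss_fn(params + MU * u) - loss_fn(params - MU * u)) / (2 * MU) * u

def clip(g: np.ndarray) -> np.ndarray:
    # Illustrative stand-in for ZIP's intrinsic-dimensional clipping: cap the
    # estimate's norm at sqrt(d), with d the number of intrinsic parameters.
    norm, bound = np.linalg.norm(g), np.sqrt(g.size)
    return g * (bound / norm) if norm > bound else g

for step in range(STEPS):
    gU = clip(zo_grad(U, lambda U_: query_model(U_ @ V)))   # 2 queries
    gV = clip(zo_grad(V, lambda V_: query_model(U @ V_)))   # 2 queries
    U -= LR * gU
    V -= LR * gV

print("final loss:", query_model(U @ V))
```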
Related papers
- Hyperband-based Bayesian Optimization for Black-box Prompt Selection [15.756224286651237]
Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks. We introduce HbBoPs, a novel Hyperband-based Bayesian optimization method for black-box prompt selection. Our approach combines a structural-aware deep kernel Gaussian Process to model prompt performance with Hyperband as a multi-fidelity scheduler (a minimal sketch of the scheduling idea follows this entry).
arXiv Detail & Related papers (2024-12-10T14:42:51Z)
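HbBoPs itself pairs a GP surrogate with Hyperband; the sketch below shows only the successive-halving core that Hyperband generalizes, with a hypothetical `evaluate_prompt` scorer and toy budgets standing in for real validation queries.

```python
import math
import random

# Successive halving for black-box prompt selection: score all candidates
# cheaply, keep the better half, double the per-prompt evaluation budget,
# and repeat, so expensive high-fidelity scores are spent only on promising
# prompts. The GP surrogate used by HbBoPs is omitted here.

random.seed(0)

def evaluate_prompt(prompt: str, n_examples: int) -> float:
    # Toy noisy score: more validation examples -> a less noisy estimate of a
    # fixed per-prompt quality (in practice, one model query per example).
    quality = (sum(map(ord, prompt)) % 100) / 100.0
    return quality + random.gauss(0, 0.2 / math.sqrt(n_examples))

def successive_halving(prompts: list[str], min_budget: int = 8) -> str:
    survivors, budget = list(prompts), min_budget
    while len(survivors) > 1:
        scores = {p: evaluate_prompt(p, budget) for p in survivors}
        survivors = sorted(survivors, key=scores.get, reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
        budget *= 2
    return survivors[0]

candidates = [f"Variant {i}: think step by step, then answer." for i in range(16)]
print("selected prompt:", successive_halving(candidates))
```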
- STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario [50.37501379058119]
We propose the Spatial Transform Black-box Attack (STBA) to craft formidable adversarial examples in the query-limited scenario.
We show that STBA could effectively improve the imperceptibility of the adversarial examples and remarkably boost the attack success rate under query-limited settings.
arXiv Detail & Related papers (2024-03-30T13:28:53Z)
- Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model [86.9619638550683]
Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data. However, these models display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of "decision shortcuts".
arXiv Detail & Related papers (2024-03-01T09:01:53Z)
- Test-Time Personalization with Meta Prompt for Gaze Estimation [23.01057994927244]
We take inspiration from recent advances in Natural Language Processing (NLP) by updating a negligible number of parameters, "prompts", at test time.
We propose to meta-learn the prompt to ensure that its updates align with the goal.
Our experiments show that the meta-learned prompt can be effectively adapted even with a simple symmetry loss.
arXiv Detail & Related papers (2024-01-03T07:02:35Z)
- Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation [71.21346469382821]
We introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models.
CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods.
arXiv Detail & Related papers (2023-12-26T06:31:28Z)
- Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models [88.80146574509195]
Quantization is a promising approach for reducing memory overhead and accelerating inference.
We propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs.
arXiv Detail & Related papers (2023-10-20T07:09:56Z)
- BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimize continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to achieve results comparable to gradient-based optimization (a sketch of the underlying subspace trick follows this entry).
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
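The sketch below illustrates the random-subspace trick behind Black-Box Tuning: a continuous prompt with D parameters is optimized through a d-dimensional variable z (d much smaller than D), lifted by a fixed random projection. A simple truncation evolution strategy stands in for the CMA-ES used by BBT/BBTv2, and the quadratic loss is a placeholder for a real black-box model query; all sizes are assumptions.

```python
import numpy as np

# Random-subspace black-box optimization: search happens in a small intrinsic
# space z, while the model only ever sees the projected prompt A @ z.

rng = np.random.default_rng(0)

D, d = 4096, 64                            # full prompt size vs. subspace size
A = rng.normal(size=(D, d)) / np.sqrt(d)   # fixed random projection
target = rng.normal(size=D)                # pretend optimum for the toy loss

def loss(z: np.ndarray) -> float:
    prompt = A @ z                         # lift subspace vector to prompt space
    return float(np.mean((prompt - target) ** 2))

z, sigma = np.zeros(d), 0.5
for generation in range(60):
    # Sample a population around z, keep the best quarter, recenter on their mean.
    population = z + sigma * rng.normal(size=(16, d))
    fitness = np.array([loss(p) for p in population])
    elite = population[np.argsort(fitness)[:4]]
    z = elite.mean(axis=0)
    sigma *= 0.97                          # slowly shrink the search radius

print("final loss:", loss(z))
```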
- Learning How to Optimize Black-Box Functions With Extreme Limits on the Number of Function Evaluations [3.0969191504482243]
We consider black-box optimization in which only an extremely limited number of function evaluations, on the order of 100, is affordable.
We propose a method that uses established approaches to generate a set of candidate points for each batch and then down-selects from these candidates to the number of trials that can be run in parallel.
We achieve an average reduction of 50% in normalized cost, a substantial performance improvement.
arXiv Detail & Related papers (2021-03-18T15:30:15Z)
- Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization [45.22093693422085]
Mixed-precision quantization assigns different bit-precisions to different layers according to their sensitivity, which can yield strong performance.
Quickly determining the bit-precision of each layer of a deep neural network under given constraints is a difficult problem.
We propose a novel sequential single path search (SSPS) method for mixed-precision quantization.
arXiv Detail & Related papers (2021-03-04T09:15:08Z)
- Projection & Probability-Driven Black-Box Attack [205.9923346080908]
Existing black-box attacks suffer from the need for excessive queries in the high-dimensional space.
We propose Projection & Probability-driven Black-box Attack (PPBA) to tackle this problem.
Our method requires at most 24% fewer queries with a higher attack success rate compared with state-of-the-art approaches.
arXiv Detail & Related papers (2020-05-08T03:37:50Z)