Black-Box Tuning of Vision-Language Models with Effective Gradient
Approximation
- URL: http://arxiv.org/abs/2312.15901v1
- Date: Tue, 26 Dec 2023 06:31:28 GMT
- Title: Black-Box Tuning of Vision-Language Models with Effective Gradient
Approximation
- Authors: Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo,
Wangmeng Zuo
- Abstract summary: We introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models.
CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods.
- Score: 71.21346469382821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) methods have provided an effective way
for adapting large vision-language models to specific tasks or scenarios.
Typically, they learn a very small number of parameters on top of pre-trained models
in a white-box formulation, which assumes that the model architecture is known and
its parameters are accessible. However, large models are often not open-sourced,
whether for commercial reasons or to prevent misuse, which poses a barrier to
deploying white-box PEFT methods. To alleviate the
dependence on model accessibility, we introduce collaborative black-box tuning
(CBBT) for both textual prompt optimization and output feature adaptation for
black-box models. Specifically, since backpropagation gradients are blocked, we
approximate the gradients of the textual prompts by analyzing how the model's
predictions change under perturbed prompts. In addition, a lightweight adapter is
deployed over the output features of the inaccessible model, further facilitating
adaptation. Empowered with these designs, our CBBT is extensively
evaluated on eleven downstream benchmarks and achieves remarkable improvements
compared to existing black-box VL adaptation methods. Code is released at
https://github.com/guozix/cbbt.
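A minimal sketch of the two ideas described in the abstract, assuming a PyTorch-style setup: the prompt gradient is approximated from loss values returned by the black-box model under randomly perturbed prompts (a two-sided zeroth-order estimator), and a lightweight residual adapter is trained with ordinary back-propagation on the accessible output features. Names such as estimate_prompt_grad, FeatureAdapter, and loss_fn are illustrative and do not come from the released code.

    import torch
    import torch.nn as nn

    def estimate_prompt_grad(prompt, loss_fn, num_samples=8, sigma=1e-3):
        # Approximate d(loss)/d(prompt) from black-box loss evaluations under
        # random perturbations of the continuous prompt (two-sided estimator).
        grad = torch.zeros_like(prompt)
        for _ in range(num_samples):
            u = torch.randn_like(prompt)              # random perturbation direction
            loss_pos = loss_fn(prompt + sigma * u)    # black-box query
            loss_neg = loss_fn(prompt - sigma * u)    # black-box query
            grad += (loss_pos - loss_neg) / (2 * sigma) * u
        return grad / num_samples

    class FeatureAdapter(nn.Module):
        # Lightweight residual adapter applied to the output features of the
        # inaccessible model; it lives outside the black box, so it can be
        # trained with standard gradients.
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

        def forward(self, feat):
            return feat + self.net(feat)

In such a setup the prompt would be updated with the estimated gradient (e.g. prompt -= lr * estimate_prompt_grad(prompt, loss_fn)) while the adapter is optimized jointly on the same few-shot objective.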
Related papers
- Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models [21.698201509643624]
Self-interpretable models, such as concept-based networks, offer insights by connecting decisions to human-understandable concepts.
Post-hoc methods like Shapley values, while theoretically robust, are computationally expensive and resource-intensive.
We propose a novel method that combines their strengths, providing theoretically guaranteed self-interpretability for black-box models.
arXiv Detail & Related papers (2024-10-29T07:35:33Z)
- Cliqueformer: Model-Based Optimization with Structured Transformers [102.55764949282906]
We develop a model that learns the structure of a model-based optimization (MBO) task and empirically leads to improved designs.
We evaluate Cliqueformer on various tasks, ranging from high-dimensional black-box functions to real-world tasks of chemical and genetic design.
arXiv Detail & Related papers (2024-10-17T00:35:47Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- CPT: Consistent Proxy Tuning for Black-box Optimization [63.06335358432746]
Proxy-tuning provides a test-time output adjustment for tuning black-box language models.
We introduce Consistent Proxy Tuning (CPT), a simple yet effective black-box tuning method.
CPT exploits a frozen large black-box model and a frozen small white-box model, ensuring consistency between the training-stage optimization objective and the test-time proxy (a generic proxy-tuning sketch is given after this list).
arXiv Detail & Related papers (2024-07-01T10:23:14Z)
- Preference Alignment with Flow Matching [23.042382086241364]
Preference Flow Matching (PFM) is a new framework for preference-based reinforcement learning (PbRL).
It streamlines the integration of preferences into an arbitrary class of pre-trained models.
We provide theoretical insights that support our method's alignment with standard PbRL objectives.
arXiv Detail & Related papers (2024-05-30T08:16:22Z)
- Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior [36.101904669291436]
This paper studies the challenging black-box adversarial attack setting, which aims to generate adversarial examples against a black-box model using only the model's output feedback to input queries.
We propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks.
Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior.
arXiv Detail & Related papers (2024-05-29T14:05:16Z)
- Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning [13.211063836237468]
We introduce Model augmented fine-tuning (Mafin) -- a novel approach for fine-tuning a black-box embedding model by augmenting it with a trainable embedding model.
Our results demonstrate that Mafin significantly enhances the performance of the black-box embeddings by only requiring the training of a small augmented model.
arXiv Detail & Related papers (2024-02-19T14:33:24Z)
- Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models [121.0693322732454]
This paper proposes CraFT, an approach for fine-tuning black-box vision-language models on downstream tasks.
CraFT comprises two modules: a prompt generation module for learning text prompts and a prediction refinement module that enhances output predictions in a residual style.
Experiments on few-shot classification over 15 datasets demonstrate the superiority of CraFT.
arXiv Detail & Related papers (2024-02-06T14:53:19Z)
- Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation [42.05617728412819]
We show how to optimize few-shot text classification without accessing the gradients of the large-scale language models.
Our approach, dubbed BT-Classifier, significantly outperforms state-of-the-art black-box few-shot learners.
arXiv Detail & Related papers (2023-05-23T07:54:34Z)
- BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimize continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to achieve comparable results to gradient-based optimization.
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
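As referenced in the CPT entry above, several black-box methods build on the proxy-tuning idea of adjusting a frozen large model's outputs with a small white-box proxy at test time. Below is a minimal, hedged sketch of that generic logit-offset adjustment only; it does not reproduce CPT's specific consistency objective, and all tensor names are placeholders.

    import torch

    def proxy_tuned_logits(black_box_logits, tuned_proxy_logits, base_proxy_logits):
        # Shift the frozen black-box model's logits by the offset that the small
        # white-box proxy learned during fine-tuning (tuned minus untuned).
        return black_box_logits + (tuned_proxy_logits - base_proxy_logits)

    # Example over a shared output space of 10 classes, using dummy tensors.
    num_classes = 10
    adjusted = proxy_tuned_logits(torch.randn(num_classes),
                                  torch.randn(num_classes),
                                  torch.randn(num_classes))
    probs = torch.softmax(adjusted, dim=-1)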
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.