Black-Box Tuning of Vision-Language Models with Effective Gradient
Approximation
- URL: http://arxiv.org/abs/2312.15901v1
- Date: Tue, 26 Dec 2023 06:31:28 GMT
- Title: Black-Box Tuning of Vision-Language Models with Effective Gradient
Approximation
- Authors: Zixian Guo, Yuxiang Wei, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo,
Wangmeng Zuo
- Abstract summary: We introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models.
CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods.
- Score: 71.21346469382821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) methods have provided an effective way
for adapting large vision-language models to specific tasks or scenarios.
Typically, they learn a small set of parameters for pre-trained models
in a white-box formulation, which assumes the model architecture to be known and
its parameters to be accessible. However, large models are often kept closed-source
to prevent abuse or for commercial reasons, which poses a
barrier to the deployment of white-box PEFT methods. To alleviate the
dependence on model accessibility, we introduce collaborative black-box tuning
(CBBT) for both textual prompt optimization and output feature adaptation for
black-box models. Specifically, since the backpropagation gradients
are blocked, we approximate the gradients of textual prompts by analyzing the
predictions under perturbed prompts. In addition, a lightweight adapter is deployed
over the output features of the inaccessible model, further facilitating the
adaptation process. Empowered with these designs, our CBBT is extensively
evaluated on eleven downstream benchmarks and achieves remarkable improvements
compared to existing black-box VL adaptation methods. Code is released at
https://github.com/guozix/cbbt.
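The perturbed-prompt gradient approximation described in the abstract can be illustrated with a generic two-point (SPSA-style) zeroth-order estimator. The sketch below is not the paper's exact CBBT algorithm; the quadratic `black_box_loss`, the step size, `sigma`, and `num_samples` are all illustrative stand-ins for querying a real black-box model's predictions.

```python
import numpy as np

def spsa_gradient(loss_fn, prompt, sigma=0.01, num_samples=8, rng=None):
    """Two-point gradient estimate of loss_fn at `prompt` using only
    function evaluations (no backpropagation through the model)."""
    rng = np.random.default_rng(rng)
    grad = np.zeros_like(prompt)
    for _ in range(num_samples):
        u = rng.standard_normal(prompt.shape)            # random perturbation direction
        delta = loss_fn(prompt + sigma * u) - loss_fn(prompt - sigma * u)
        grad += (delta / (2.0 * sigma)) * u              # finite-difference estimate along u
    return grad / num_samples

# Stand-in black-box "model": a quadratic loss with a known minimizer,
# so descent on the estimated gradient can be checked against the target.
target = np.array([1.0, -2.0, 0.5])
black_box_loss = lambda p: float(np.sum((p - target) ** 2))

prompt = np.zeros(3)
rng = np.random.default_rng(0)
for _ in range(200):                                     # simple gradient-descent loop
    prompt -= 0.05 * spsa_gradient(black_box_loss, prompt, num_samples=16, rng=rng)
```

Each estimate costs `2 * num_samples` model queries, which is why query-efficiency is the central concern in this line of work; the lightweight output-feature adapter, by contrast, can be trained with ordinary backpropagation since it sits outside the black-box model.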
Related papers
- IOTA: Corrective Knowledge-Guided Prompt Learning via Black-White Box Framework [57.66924056568018]
We propose a novel black-whIte bOx prompT leArning framework (IOTA) for adapting pre-trained models to downstream tasks.
IOTA integrates a data-driven Black Box module with a knowledge-driven White Box module for downstream task adaptation.
arXiv Detail & Related papers (2026-01-28T12:03:48Z)
- Advanced Black-Box Tuning of Large Language Models with Limited API Calls [20.29862533577494]
Black-box tuning is an emerging paradigm for adapting large language models (LLMs) to better achieve desired behaviors.
We propose a novel advanced black-box tuning method for LLMs with limited API calls.
Our approach elevates pre-trained language model accuracy from 55.92% to 86.85%, reducing the frequency of API queries to merely 1.38%.
arXiv Detail & Related papers (2025-11-13T11:32:08Z)
- Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation [40.37204049034554]
We introduce the Black-Box Distillation (B2D) setting, which enables local model adaptation under realistic constraints.
Open-vocabulary models exhibit significant sensitivity to input resolution, with different object classes being segmented optimally at different scales.
Our method, AT-Guided sCaler (ATGC), addresses this challenge by leveraging DINOv2 attention maps to dynamically select optimal scales for black-box model inference.
arXiv Detail & Related papers (2025-08-30T14:03:09Z)
- Black-Box Forgetting [8.84485103053191]
We address a novel problem of selective forgetting for black-box models, named Black-Box Forgetting.
We propose Latent Context Sharing, which introduces common low-dimensional latent components among multiple tokens for the prompt.
Experiments on four standard benchmark datasets demonstrate the superiority of our method over reasonable baselines.
arXiv Detail & Related papers (2024-11-01T07:10:40Z)
- Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models [21.698201509643624]
Self-interpretable models, such as concept-based networks, offer insights by connecting decisions to human-understandable concepts.
Post-hoc methods like Shapley values, while theoretically robust, are computationally expensive and resource-intensive.
We propose a novel method that combines their strengths, providing theoretically guaranteed self-interpretability for black-box models.
arXiv Detail & Related papers (2024-10-29T07:35:33Z)
- Cliqueformer: Model-Based Optimization with Structured Transformers [102.55764949282906]
We develop a model that learns the structure of an MBO task and empirically leads to improved designs.
We evaluate Cliqueformer on various tasks, ranging from high-dimensional black-box functions to real-world tasks of chemical and genetic design.
arXiv Detail & Related papers (2024-10-17T00:35:47Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- CPT: Consistent Proxy Tuning for Black-box Optimization [63.06335358432746]
Proxy-tuning provides a test-time output adjustment for tuning black-box language models.
We introduce Consistent Proxy Tuning (CPT), a simple yet effective black-box tuning method.
CPT exploits the frozen large black-box model and another frozen small white-box model, ensuring consistency between training-stage optimization objective and test-time proxies.
arXiv Detail & Related papers (2024-07-01T10:23:14Z)
- Preference Alignment with Flow Matching [23.042382086241364]
Preference Flow Matching (PFM) is a new framework for preference-based reinforcement learning (PbRL).
It streamlines the integration of preferences into an arbitrary class of pre-trained models.
We provide theoretical insights that support our method's alignment with standard PbRL objectives.
arXiv Detail & Related papers (2024-05-30T08:16:22Z)
- Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior [36.101904669291436]
This paper studies the challenging black-box adversarial attack that aims to generate examples against a black-box model by only using output feedback of the model to input queries.
We propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks.
Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior.
arXiv Detail & Related papers (2024-05-29T14:05:16Z)
- Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning [13.211063836237468]
We introduce Model augmented fine-tuning (Mafin) -- a novel approach for fine-tuning a black-box embedding model by augmenting it with a trainable embedding model.
Our results demonstrate that Mafin significantly enhances the performance of the black-box embeddings by only requiring the training of a small augmented model.
arXiv Detail & Related papers (2024-02-19T14:33:24Z)
- Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models [121.0693322732454]
This paper proposes CraFT, an approach for fine-tuning black-box vision-language models on downstream tasks.
CraFT comprises two modules, a prompt generation module for learning text prompts and a prediction refinement module for enhancing output predictions in residual style.
Experiments on few-shot classification over 15 datasets demonstrate the superiority of CraFT.
arXiv Detail & Related papers (2024-02-06T14:53:19Z)
- Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation [42.05617728412819]
We show how to optimize few-shot text classification without accessing the gradients of the large-scale language models.
Our approach, dubbed BT-Classifier, significantly outperforms state-of-the-art black-box few-shot learners.
arXiv Detail & Related papers (2023-05-23T07:54:34Z)
- BBTv2: Pure Black-Box Optimization Can Be Comparable to Gradient Descent for Few-Shot Learning [83.26610968655815]
Black-Box Tuning is a derivative-free approach to optimize continuous prompt tokens prepended to the input of language models.
We present BBTv2, a pure black-box optimization approach that can drive language models to achieve comparable results to gradient-based optimization.
arXiv Detail & Related papers (2022-05-23T11:10:19Z)
- How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective [74.47093382436823]
We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS).
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
arXiv Detail & Related papers (2022-03-27T03:23:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.