Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation
- URL: http://arxiv.org/abs/2509.00509v1
- Date: Sat, 30 Aug 2025 14:03:09 GMT
- Title: Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation
- Authors: Yasser Benigmim, Subhankar Roy, Khalid Oublal, Imad Eddine Marouf, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière,
- Abstract summary: We introduce the Black-Box Distillation (B2D) setting, which enables local model adaptation under realistic constraints.<n>Open-vocabulary models exhibit significant sensitivity to input resolution, with different object classes being segmented optimally at different scales.<n>Our method, AT-Guided sCaler (ATGC), addresses this challenge by leveraging DINOv2 attention maps to dynamically select optimal scales for black-box model inference.
- Score: 40.37204049034554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of Artificial Intelligence as a Service (AIaaS) democratizes access to pre-trained models via Application Programming Interfaces (APIs), but also raises a fundamental question: how can local models be effectively trained using black-box models that do not expose their weights, training data, or logits, a constraint in which current domain adaptation paradigms are impractical ? To address this challenge, we introduce the Black-Box Distillation (B2D) setting, which enables local model adaptation under realistic constraints: (1) the API model is open-vocabulary and trained on large-scale general-purpose data, and (2) access is limited to one-hot predictions only. We identify that open-vocabulary models exhibit significant sensitivity to input resolution, with different object classes being segmented optimally at different scales, a limitation termed the "curse of resolution". Our method, ATtention-Guided sCaler (ATGC), addresses this challenge by leveraging DINOv2 attention maps to dynamically select optimal scales for black-box model inference. ATGC scores the attention maps with entropy to identify informative scales for pseudo-labelling, enabling effective distillation. Experiments demonstrate substantial improvements under black-box supervision across multiple datasets while requiring only one-hot API predictions. Our code is available at https://github.com/yasserben/ATGC.
Related papers
- Advanced Black-Box Tuning of Large Language Models with Limited API Calls [20.29862533577494]
Black-box tuning is an emerging paradigm for adapting large language models (LLMs) to better achieve desired behaviors.<n>We propose a novel advanced black-box tuning method for LLMs with limited API calls.<n>Our approach elevates pre-trained language model accuracy from 55.92% to 86.85%, reducing the frequency of API queries to merely 1.38%.
arXiv Detail & Related papers (2025-11-13T11:32:08Z) - Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
Large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial.<n>In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations.<n>We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
arXiv Detail & Related papers (2025-01-02T22:26:54Z) - DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model [50.94236887900527]
We present a new problem of black-box reverse engineering, without requiring the availability of the target model's training dataset.<n>We learn a domain-agnostic meta-model to infer the attributes of the target black-box model with unknown training data.
arXiv Detail & Related papers (2024-12-08T07:37:05Z) - Foundational GPT Model for MEG [3.524869467682149]
We propose two classes of deep learning foundational models that can be trained using forecasting of unlabelled brain signals.
First, we consider a modified Wavenet; and second, we consider a modified Transformer-based (GPT2) model.
We compare the performance of these deep learning models with standard linear autoregressive (AR) modelling on MEG data.
arXiv Detail & Related papers (2024-04-14T13:48:24Z) - Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning [13.211063836237468]
We introduce Model augmented fine-tuning (Mafin) -- a novel approach for fine-tuning a black-box embedding model by augmenting it with a trainable embedding model.
Our results demonstrate that Mafin significantly enhances the performance of the black-box embeddings by only requiring the training of a small augmented model.
arXiv Detail & Related papers (2024-02-19T14:33:24Z) - Black-Box Tuning of Vision-Language Models with Effective Gradient
Approximation [71.21346469382821]
We introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models.
CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods.
arXiv Detail & Related papers (2023-12-26T06:31:28Z) - DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z) - CELDA: Leveraging Black-box Language Model as Enhanced Classifier
without Labels [14.285609493077965]
Clustering-enhanced Linear Discriminative Analysis, a novel approach that improves the text classification accuracy with a very weak-supervision signal.
Our framework draws a precise decision boundary without accessing weights or gradients of the LM model or data labels.
arXiv Detail & Related papers (2023-06-05T08:35:31Z) - Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data
Augmentation [42.05617728412819]
We show how to optimize few-shot text classification without accessing the gradients of the large-scale language models.
Our approach, dubbed BT-Classifier, significantly outperforms state-of-the-art black-box few-shot learners.
arXiv Detail & Related papers (2023-05-23T07:54:34Z) - How to Robustify Black-Box ML Models? A Zeroth-Order Optimization
Perspective [74.47093382436823]
We address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback?
We propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS)
We empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines.
arXiv Detail & Related papers (2022-03-27T03:23:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.