Universal Domain Adaptation from Foundation Models: A Baseline Study
- URL: http://arxiv.org/abs/2305.11092v2
- Date: Fri, 3 Nov 2023 03:34:13 GMT
- Title: Universal Domain Adaptation from Foundation Models: A Baseline Study
- Authors: Bin Deng and Kui Jia
- Abstract summary: We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce \textit{CLIP distillation}, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
- Score: 58.51162198585434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models (e.g., CLIP or DINOv2) have shown their impressive learning
and transfer capabilities in a wide range of visual tasks, by training on a
large corpus of data and adapting to specific downstream tasks. It is, however,
interesting that foundation models have not been fully explored for universal
domain adaptation (UniDA), which is to learn models using labeled data in a
source domain and unlabeled data in a target one, such that the learned models
can successfully adapt to the target data. In this paper, we make comprehensive
empirical studies of state-of-the-art UniDA methods using foundation models. We
first observe that, unlike fine-tuning from ImageNet pre-trained models, as
previous methods do, fine-tuning from foundation models yields significantly
poorer results, sometimes even worse than training from scratch. While freezing
the backbones, we demonstrate that although the foundation models greatly
improve the performance of the baseline method that trains the models on the
source data alone, existing UniDA methods generally fail to improve over the
baseline. This suggests that new research efforts are much needed for UniDA
using foundation models. Based on these findings, we introduce \textit{CLIP
distillation}, a parameter-free method specifically designed to distill target
knowledge from CLIP models. The core of our \textit{CLIP distillation} lies in
a self-calibration technique for automatic temperature scaling, a feature that
significantly enhances the baseline's out-class detection capability. Although
simple, our method outperforms previous approaches in most benchmark tasks,
excelling in evaluation metrics including H-score/H$^3$-score and the newly
proposed universal classification rate (UCR) metric. We hope that our
investigation and the proposed simple framework can serve as a strong baseline
to facilitate future studies in this field.
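The abstract describes \textit{CLIP distillation} only at a high level. The sketch below is a hypothetical illustration of the general recipe rather than the paper's actual algorithm: zero-shot CLIP logits over the known classes are converted to probabilities with a temperature picked by a simple self-calibration heuristic, and low-confidence samples are flagged as out-class. The mean-confidence target, the temperature grid, and the rejection threshold are all assumptions introduced here for illustration.

```python
import numpy as np

def softmax(logits, temperature):
    """Temperature-scaled softmax over the class axis."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def self_calibrate_temperature(logits, target_conf=0.9,
                               grid=np.linspace(0.01, 5.0, 500)):
    """Hypothetical self-calibration rule: pick the temperature whose mean
    max-probability over the unlabeled target logits is closest to
    target_conf. The paper's exact calibration technique may differ."""
    best_t, best_gap = 1.0, float("inf")
    for t in grid:
        conf = softmax(logits, t).max(axis=1).mean()
        if abs(conf - target_conf) < best_gap:
            best_t, best_gap = t, abs(conf - target_conf)
    return best_t

def clip_distill_predict(logits, threshold=0.5):
    """Assign each target sample a known-class pseudo-label, and mark
    samples whose calibrated confidence falls below `threshold` as
    out-class ("unknown")."""
    temperature = self_calibrate_temperature(logits)
    probs = softmax(logits, temperature)
    preds = probs.argmax(axis=1)
    is_known = probs.max(axis=1) >= threshold
    return preds, is_known, temperature
```

In this sketch, `logits` stands in for CLIP image-text similarity scores between target images and known-class text prompts; the calibration step replaces a hand-tuned temperature with one estimated from the unlabeled target data itself.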
Related papers
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- Bayesian Exploration of Pre-trained Models for Low-shot Image Classification [14.211305168954594]
This work proposes a simple and effective probabilistic model ensemble framework based on Gaussian processes.
We achieve the integration of prior knowledge by specifying the mean function with CLIP and the kernel function.
We demonstrate that our method consistently outperforms competitive ensemble baselines regarding predictive performance.
arXiv Detail & Related papers (2024-03-30T10:25:28Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- How to train your draGAN: A task oriented solution to imbalanced classification [15.893327571516016]
This paper proposes a unique, performance-oriented, data-generating strategy that utilizes a new architecture, coined draGAN.
The samples are generated with the objective of optimizing the classification model's performance, rather than similarity to the real data.
Empirically we show the superiority of draGAN, but also highlight some of its shortcomings.
arXiv Detail & Related papers (2022-11-18T07:37:34Z)
- DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical Representations [2.542864854772221]
We present a novel Domain Adaptation-aided deep Table detection method called DATa.
It guarantees satisfactory performance in a specific target domain where few trusted labels are available.
Experiments show that DATa substantially outperforms competing methods that only utilize visual representations in the target domain.
arXiv Detail & Related papers (2022-11-12T12:14:16Z)
- Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face reduced generalization on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
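The "Dataless Knowledge Fusion by Merging Weights of Language Models" entry above fuses fine-tuned models directly in parameter space. As a minimal, hedged sketch of that idea (the paper's actual merging rule is more elaborate than a plain mean), uniform weight averaging over identically-shaped parameter dictionaries looks like:

```python
import numpy as np

def merge_weights(models):
    """Uniformly average a list of parameter dicts from models with the
    same architecture. This is only the simplest baseline instance of
    parameter-space fusion; it requires no training data, hence
    "dataless"."""
    keys = models[0].keys()
    if not all(m.keys() == keys for m in models):
        raise ValueError("all models must share the same parameter names")
    return {k: np.mean([m[k] for m in models], axis=0) for k in keys}
```

For example, merging two fine-tuned checkpoints of the same network averages each weight tensor elementwise, yielding a single model without access to either training set.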
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.