A Critical Look at the Current Usage of Foundation Model for Dense
Recognition Task
- URL: http://arxiv.org/abs/2307.02862v2
- Date: Tue, 1 Aug 2023 06:47:27 GMT
- Title: A Critical Look at the Current Usage of Foundation Model for Dense
Recognition Task
- Authors: Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku
- Abstract summary: Large models trained on huge amounts of cross-modality data, usually termed foundation models, have achieved remarkable accomplishments in many fields.
It is still unclear whether these foundation models can be applied to other downstream tasks.
- Score: 26.938332354370814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, large models trained on huge amounts of
cross-modality data, usually termed foundation models, have achieved
remarkable accomplishments in many fields, such as image recognition and
generation. Though they achieve great success in their original application
cases, it is still unclear whether these foundation models can be applied to
other downstream tasks. In this paper, we conduct a short survey of current
methods for discriminative dense recognition tasks that are built on
pretrained foundation models. We also provide a preliminary experimental
analysis of an existing open-vocabulary segmentation method based on Stable
Diffusion, which indicates that the current way of deploying diffusion models
for segmentation is not optimal. We aim to provide insights for future
research on adopting foundation models for downstream tasks.
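To make the critiqued setup concrete, here is a minimal sketch of the recipe that open-vocabulary segmentation methods built on Stable Diffusion commonly follow: noise the input image's latent, run a single denoising step, and read the UNet's cross-attention map for a class token as a coarse mask. This is an illustrative reconstruction, not the paper's code; the checkpoint, timestep, and 16x16-resolution choice are assumptions.

```python
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"  # illustrative checkpoint
).to(device)

cross_maps = []  # filled by the attention processor below


class RecordCrossAttn:
    """Simplified attention processor that stores cross-attention probabilities.

    Omits group-norm/residual handling that SD v1.5's transformer blocks do
    not use; a sketch, not a drop-in replacement for every UNet."""

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        context = (encoder_hidden_states
                   if encoder_hidden_states is not None else hidden_states)
        q = attn.head_to_batch_dim(attn.to_q(hidden_states))
        k = attn.head_to_batch_dim(attn.to_k(context))
        v = attn.head_to_batch_dim(attn.to_v(context))
        probs = attn.get_attention_scores(q, k, attention_mask)
        if encoder_hidden_states is not None:      # cross-attention only
            cross_maps.append(probs.detach())      # (heads, pixels, text tokens)
        out = attn.batch_to_head_dim(torch.bmm(probs, v))
        return attn.to_out[1](attn.to_out[0](out))  # linear + dropout


pipe.unet.set_attn_processor(RecordCrossAttn())


@torch.no_grad()
def coarse_mask(image, prompt, token_idx, t=100):
    """image: (1, 3, 512, 512) in [-1, 1]; token_idx: position of the class word."""
    latents = (pipe.vae.encode(image.to(device)).latent_dist.mean
               * pipe.vae.config.scaling_factor)
    timestep = torch.tensor([t], device=device)
    noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), timestep)
    ids = pipe.tokenizer(prompt, padding="max_length", truncation=True,
                         max_length=pipe.tokenizer.model_max_length,
                         return_tensors="pt").input_ids.to(device)
    text_emb = pipe.text_encoder(ids)[0]
    cross_maps.clear()
    pipe.unet(noisy, timestep, encoder_hidden_states=text_emb)
    # Average the 16x16 cross-attention maps over layers and heads, take the
    # column of the class token, and upsample to image resolution.
    maps = [m for m in cross_maps if m.shape[1] == 16 * 16]
    m = torch.stack(maps).mean(dim=(0, 1))[:, token_idx].reshape(1, 1, 16, 16)
    return F.interpolate(m, size=image.shape[-2:], mode="bilinear")
```

The paper's experimental analysis argues this kind of pipeline is not an optimal use of the diffusion model for segmentation; the sketch is only meant to show what is being evaluated.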
Related papers
- FRoundation: Are Foundation Models Ready for Face Recognition? [8.045296450065019]
We propose and demonstrate the adaptation of foundation models for face recognition across different levels of data availability.
Our results indicate that, despite their versatility, pre-trained foundation models underperform in face recognition.
Fine-tuning foundation models yields promising results, often surpassing models trained from scratch when training data is limited.
arXiv Detail & Related papers (2024-10-31T11:21:21Z) - Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into an in-context segmentation task, becoming a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
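For intuition, here is a minimal sketch of the KV-fusion idea mentioned above, assuming a standard multi-head self-attention layer: the support image's keys and values are concatenated into the query image's self-attention so that query tokens can attend to support tokens. The projection layers and head count are illustrative, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def kv_fused_self_attention(q_feats, s_feats, w_q, w_k, w_v, num_heads=8):
    """q_feats, s_feats: (B, N, C) token sequences of the query/support images;
    w_q, w_k, w_v: the layer's existing nn.Linear projections (illustrative)."""
    B, N, C = q_feats.shape
    fused = torch.cat([q_feats, s_feats], dim=1)   # support tokens join the sequence
    q = w_q(q_feats)                               # queries come from the query image only
    k, v = w_k(fused), w_v(fused)                  # fused keys and values

    def split(x):  # (B, T, C) -> (B, heads, T, C // heads)
        return x.view(B, -1, num_heads, C // num_heads).transpose(1, 2)

    out = F.scaled_dot_product_attention(split(q), split(k), split(v))
    return out.transpose(1, 2).reshape(B, N, C)    # query tokens, support-informed
```

Because only keys and values are extended, the output sequence length matches the query image, so the fused layer can replace the original self-attention without changing the rest of the diffusion backbone.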
arXiv Detail & Related papers (2024-10-03T10:33:49Z) - High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
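As a rough illustration of extracting implicit knowledge for coarse correspondence, one common recipe matches frozen foundation-model patch features by cosine similarity. The sketch below follows that generic recipe; the feature extractor and the max-over-mask scoring are assumptions, not necessarily the paper's method.

```python
import torch
import torch.nn.functional as F

def coarse_correspondence(feat_support, feat_query, support_mask):
    """feat_*: (C, h, w) patch features from a frozen backbone (e.g. a ViT);
    support_mask: (h, w) binary foreground mask on the support image.
    Returns an (h, w) score map: each query patch's best cosine similarity
    to the masked support patches."""
    C, h, w = feat_query.shape
    fs = F.normalize(feat_support.reshape(C, -1), dim=0)  # (C, hw)
    fq = F.normalize(feat_query.reshape(C, -1), dim=0)    # (C, hw)
    sim = fq.T @ fs                                       # (hw_q, hw_s) cosine similarity
    fg = support_mask.reshape(-1).bool()
    score = sim[:, fg].max(dim=1).values                  # best match inside the mask
    return score.reshape(h, w)
```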
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Learning Diffusion Priors from Observations by Expectation Maximization [6.224769485481242]
We present a novel method based on the expectation-maximization algorithm for training diffusion models from incomplete and noisy observations only.
As part of our method, we propose and motivate an improved posterior sampling scheme for unconditional diffusion models.
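In generic notation, the EM loop the abstract describes alternates posterior sampling under the current diffusion prior with refitting the prior on those samples. The formulation below is a hedged sketch assuming a linear-Gaussian observation model $y = Ax + \varepsilon$, which may differ from the paper's exact setup.

```latex
% E-step: sample from the posterior under the current prior p_{\theta_t},
% assuming (illustratively) y_i = A x + \varepsilon, \varepsilon ~ N(0, \sigma^2 I).
\begin{align*}
\text{E-step:}\quad & x_i \sim p_{\theta_t}(x \mid y_i)
    \propto \mathcal{N}\!\left(y_i;\, A x,\, \sigma^2 I\right) p_{\theta_t}(x)\\
\text{M-step:}\quad & \theta_{t+1} = \arg\max_{\theta} \sum_i \log p_{\theta}(x_i)
    \quad \text{(refit the diffusion model on the posterior samples)}
\end{align*}
```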
arXiv Detail & Related papers (2024-05-22T15:04:06Z) - Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
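One common way to turn a diffusion model's generative prior into a membership signal is denoising error: samples seen during training tend to incur lower noise-prediction loss. The sketch below implements that generic baseline, an assumption for illustration rather than the paper's TMI method, using diffusers-style unconditional UNet and scheduler interfaces.

```python
import torch

@torch.no_grad()
def denoising_error(unet, scheduler, x0, t=200, n_trials=8):
    """Average noise-prediction MSE for sample x0 at a fixed timestep t
    (diffusers UNet2DModel and a scheduler with add_noise assumed)."""
    errs = []
    ts = torch.full((x0.shape[0],), t, device=x0.device, dtype=torch.long)
    for _ in range(n_trials):
        noise = torch.randn_like(x0)
        xt = scheduler.add_noise(x0, noise, ts)
        pred = unet(xt, ts).sample                 # predicted noise
        errs.append(((pred - noise) ** 2).mean().item())
    return sum(errs) / len(errs)

# Decision rule: flag x0 as a training member if its error falls below a
# threshold calibrated on samples known to be non-members.
```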
arXiv Detail & Related papers (2024-03-13T12:52:37Z) - On the Out of Distribution Robustness of Foundation Models in Medical
Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive natural image and text data, have emerged as a promising approach.
We compare the generalization performance on unseen domains of various pre-trained models after fine-tuning them on the same in-distribution dataset.
We further develop a new Bayesian uncertainty estimation for frozen models and use it as an indicator to characterize the model's performance on out-of-distribution data.
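As a generic stand-in for such an uncertainty indicator (the paper's exact Bayesian estimator may differ), one can average the softmax over several stochastic forward passes of the frozen model, e.g. with dropout kept active, and use the predictive entropy as an OOD score:

```python
import torch

@torch.no_grad()
def predictive_entropy(model, x, n_samples=10):
    """Mean per-pixel entropy of the averaged softmax over stochastic passes
    of a segmentation model (dropout left active in an otherwise frozen net)."""
    probs = torch.stack([model(x).softmax(dim=1)
                         for _ in range(n_samples)]).mean(0)   # (B, K, H, W)
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)    # (B, H, W)
    return ent.mean()
```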
arXiv Detail & Related papers (2023-11-18T14:52:10Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
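A parameter-free way to distill target knowledge from CLIP is to take its zero-shot class distribution on target images as soft teacher labels. The sketch below shows that generic recipe via Hugging Face transformers; the checkpoint, prompt template, and distillation target are assumptions, and the paper's CLIP distillation may differ.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

name = "openai/clip-vit-base-patch32"              # illustrative checkpoint
clip = CLIPModel.from_pretrained(name).eval()
proc = CLIPProcessor.from_pretrained(name)

@torch.no_grad()
def clip_soft_labels(pil_images, class_names):
    """Zero-shot class distribution over the target label set, usable as
    soft teacher targets (e.g., minimize KL(student || teacher))."""
    inputs = proc(text=[f"a photo of a {c}" for c in class_names],
                  images=pil_images, return_tensors="pt", padding=True)
    return clip(**inputs).logits_per_image.softmax(dim=-1)  # (images, classes)
```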
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z) - Unsupervised Deep Learning Meets Chan-Vese Model [77.24463525356566]
We propose an unsupervised image segmentation approach that integrates the Chan-Vese (CV) model with deep neural networks.
Our basic idea is to apply a deep neural network that maps the image into a latent space to alleviate the violation of the piecewise constant assumption in image space.
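For reference, this is the classical Chan-Vese energy in its standard form, with u_0 the image, \phi the level-set function, H the Heaviside function, and c_1, c_2 the mean intensities inside and outside the contour; the deep variant optimizes an analogous energy on latent features, where the piecewise-constant assumption holds better:

```latex
E(c_1, c_2, \phi) =
    \mu \int_\Omega |\nabla H(\phi)|\, dx
  + \lambda_1 \int_\Omega |u_0 - c_1|^2\, H(\phi)\, dx
  + \lambda_2 \int_\Omega |u_0 - c_2|^2\, \bigl(1 - H(\phi)\bigr)\, dx
```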
arXiv Detail & Related papers (2022-04-14T13:23:57Z)