Generative Prompt Model for Weakly Supervised Object Localization
- URL: http://arxiv.org/abs/2307.09756v1
- Date: Wed, 19 Jul 2023 05:40:38 GMT
- Title: Generative Prompt Model for Weakly Supervised Object Localization
- Authors: Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan
- Abstract summary: We propose a generative prompt model (GenPromp) to localize less discriminative object parts.
During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model.
Experiments on CUB-200-2011 and ILSVRC show that GenPromp respectively outperforms the best discriminative models.
- Score: 108.79255454746189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly supervised object localization (WSOL) remains challenging when
learning object localization models from image category labels. Conventional
methods that discriminatively train activation models ignore representative yet
less discriminative object parts. In this study, we propose a generative prompt
model (GenPromp), defining the first generative pipeline to localize less
discriminative object parts by formulating WSOL as a conditional image
denoising procedure. During training, GenPromp converts image category labels
to learnable prompt embeddings which are fed to a generative model to
conditionally recover the input image with noise and learn representative
embeddings. During inference, GenPromp combines the representative embeddings
with discriminative embeddings (queried from an off-the-shelf vision-language
model) for both representative and discriminative capacity. The combined
embeddings are finally used to generate multi-scale high-quality attention
maps, which facilitate localizing full object extent. Experiments on
CUB-200-2011 and ILSVRC show that GenPromp respectively outperforms the best
discriminative models by 5.2% and 5.6% (Top-1 Loc), setting a solid baseline
for WSOL with the generative model. Code is available at
https://github.com/callsys/GenPromp.
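The abstract describes fusing a learned representative embedding with a discriminative embedding queried from a vision-language model, then deriving attention maps from the combined prompt. A minimal NumPy sketch of that idea follows; the interpolation weight, embedding dimension, and toy cross-attention scoring are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def combine_embeddings(e_rep, e_disc, weight=0.5):
    # Blend the learned representative embedding with the
    # discriminative one; `weight` is a hypothetical hyperparameter.
    return weight * e_rep + (1.0 - weight) * e_disc

def attention_map(image_feats, prompt_emb, grid=(8, 8)):
    # Toy cross-attention: dot-product similarity between each spatial
    # feature and the combined prompt, softmax-normalized over space.
    scores = image_feats @ prompt_emb            # one score per location
    scores = scores - scores.max()               # numerical stability
    weights = np.exp(scores)
    return (weights / weights.sum()).reshape(grid)

rng = np.random.default_rng(0)
e_rep = rng.normal(size=64)                      # learned embedding (toy)
e_disc = rng.normal(size=64)                     # VLM-queried embedding (toy)
feats = rng.normal(size=(64, 64))                # 8x8 spatial grid, dim 64
amap = attention_map(feats, combine_embeddings(e_rep, e_disc))
```

The resulting map sums to one over the spatial grid; in the paper such maps are computed at multiple scales and thresholded to recover the full object extent.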
Related papers
- Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the Few-shot CIL setting, we have improved by at least 14% accuracy over all the current state-of-the-art methods with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z)
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models [68.73086826874733]
We introduce a novel Referring Diffusional segmentor (Ref-Diff) for referring image segmentation.
We demonstrate that without a proposal generator, a generative model alone can achieve comparable performance to existing SOTA weakly-supervised models.
This indicates that generative models are also beneficial for this task and can complement discriminative models for better referring segmentation.
arXiv Detail & Related papers (2023-08-31T14:55:30Z)
- Diffusion Models Beat GANs on Image Classification [37.70821298392606]
Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc.
We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification.
We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods for classification tasks.
arXiv Detail & Related papers (2023-07-17T17:59:40Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate such conflict, we randomly select partial inter-class prototypes to construct the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- Constrained Sampling for Class-Agnostic Weakly Supervised Object Localization [10.542859578763068]
Self-supervised vision transformers can generate accurate localization maps of the objects in an image.
We propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a weakly-supervised object localization model.
arXiv Detail & Related papers (2022-09-09T19:58:38Z)
- Discriminative Sampling of Proposals in Self-Supervised Transformers for Weakly Supervised Object Localization [10.542859578763068]
Self-supervised vision transformers can generate accurate localization maps of the objects in an image.
We propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a weakly-supervised object localization model.
arXiv Detail & Related papers (2022-09-09T18:33:23Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- Classify and Generate: Using Classification Latent Space Representations for Image Generations [17.184760662429834]
We propose a discriminative modeling framework that employs manipulated supervised latent representations to reconstruct and generate new samples belonging to a given class.
ReGene has higher classification accuracy than existing conditional generative models while being competitive in terms of FID.
arXiv Detail & Related papers (2020-04-16T09:13:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.