TeD-Loc: Text Distillation for Weakly Supervised Object Localization
- URL: http://arxiv.org/abs/2501.12632v1
- Date: Wed, 22 Jan 2025 04:36:17 GMT
- Title: TeD-Loc: Text Distillation for Weakly Supervised Object Localization
- Authors: Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Eric Granger,
- Abstract summary: TeD-Loc is an approach that distills knowledge from CLIP text embeddings into the model backbone and produces patch-level localization.
It improves Top-1 LOC accuracy over state-of-the-art models by about 5% on both CUB and ILSVRC datasets.
- Score: 13.412674368913747
- Abstract: Weakly supervised object localization (WSOL) using classification models trained with only image-class labels remains an important challenge in computer vision. Given their reliance on classification objectives, traditional WSOL methods like class activation mapping focus on the most discriminative object parts, often missing the full spatial extent of the object. In contrast, recent WSOL methods based on vision-language models like CLIP require ground-truth classes or external classifiers to produce a localization map, limiting their deployment in downstream tasks. Moreover, methods like GenPromp attempt to address these issues but introduce considerable complexity due to their reliance on conditional denoising processes and intricate prompt learning. This paper introduces Text Distillation for Localization (TeD-Loc), an approach that directly distills knowledge from CLIP text embeddings into the model backbone and produces patch-level localization. Multiple instance learning over these image patches enables accurate localization and classification with a single model, without requiring external classifiers. Such integration of textual and visual modalities addresses the longstanding challenge of achieving accurate localization and classification concurrently, since in existing WSOL methods the two tasks typically reach their best performance at different training epochs. Extensive experiments show that leveraging text embeddings and localization cues yields a cost-effective WSOL model. TeD-Loc improves Top-1 LOC accuracy over state-of-the-art models by about 5% on both CUB and ILSVRC datasets, while significantly reducing computational complexity compared to GenPromp.
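The abstract describes distilling CLIP text embeddings into the backbone and using multiple instance learning (MIL) over image patches so that one model both classifies and localizes. The following is only a minimal sketch of one plausible reading, not TeD-Loc's actual design: it assumes frozen class text embeddings act as fixed classifier weights, a hypothetical linear projection `proj` maps patch features into the text space, similarity is cosine with an assumed temperature of 0.07, and MIL aggregation is max-pooling over patches. Random arrays stand in for real backbone and CLIP outputs, and only the forward pass and image-level loss are shown; training the backbone and projection against that loss is what would pull patch features toward the text embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def patch_class_logits(patch_feats, text_embeds, proj, temperature=0.07):
    """Per-patch class logits as scaled cosine similarity to class text embeddings.

    patch_feats: (P, D) backbone features for P image patches.
    text_embeds: (C, E) frozen class text embeddings (stand-ins for CLIP outputs).
    proj:        (D, E) hypothetical learned projection into the text space.
    Returns a (P, C) array.
    """
    z = l2_normalize(patch_feats @ proj)   # (P, E) projected patch features
    t = l2_normalize(text_embeds)          # (C, E) class "prototypes"
    return (z @ t.T) / temperature

def mil_image_logits(patch_logits):
    """MIL aggregation (assumed here: max over patches) -> (C,) image-level logits."""
    return patch_logits.max(axis=0)

def cross_entropy(image_logits, label):
    """Image-level loss: needs only the class label, no boxes."""
    shifted = image_logits - image_logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return float(-np.log(np.clip(probs[label], 1e-12, 1.0)))

def localization_map(patch_logits, cls, grid_hw):
    """Patch scores for class `cls` on the patch grid, min-max scaled to [0, 1]."""
    m = patch_logits[:, cls].reshape(grid_hw)
    return (m - m.min()) / (m.max() - m.min() + 1e-8)

# Toy usage with random stand-ins: 4x4 = 16 patches, 5 classes.
rng = np.random.default_rng(0)
grid_hw = (4, 4)
patch_feats = rng.normal(size=(16, 32))
text_embeds = rng.normal(size=(5, 8))
proj = rng.normal(size=(32, 8)) / np.sqrt(32)

logits = patch_class_logits(patch_feats, text_embeds, proj)
img_logits = mil_image_logits(logits)
pred = int(img_logits.argmax())                # classification from the same head
loss = cross_entropy(img_logits, label=2)      # training signal: class label only
loc = localization_map(logits, pred, grid_hw)  # localization for the predicted class
print(pred, round(loss, 3), loc.shape)
```

The point of this arrangement is that the same patch logits yield both the image-level prediction and the localization map, so no external classifier is needed to pick which class map to show; swapping max-pooling for mean or top-k pooling is a common alternative design choice.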
Related papers
- Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
(arXiv, 2023-08-24T17:56:46Z)
- Generative Prompt Model for Weakly Supervised Object Localization [108.79255454746189]
We propose a generative prompt model (GenPromp) to localize less discriminative object parts.
During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model.
Experiments on CUB-200-2011 and ILSVRC show that GenPromp outperforms the best discriminative models on both datasets.
(arXiv, 2023-07-19T05:40:38Z)
- Prompting Language-Informed Distribution for Compositional Zero-Shot Learning [73.49852821602057]
Compositional zero-shot learning (CZSL) task aims to recognize unseen compositional visual concepts.
We propose a model that prompts a language-informed distribution (PLID) for this task.
Experimental results on the MIT-States, UT-Zappos, and C-GQA datasets show that PLID outperforms prior methods.
(arXiv, 2023-05-23T18:00:22Z)
- CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping [18.67907876709536]
Class RE-Activation Mapping (CREAM) is a clustering-based approach that boosts the activation values of integral object regions.
CREAM achieves state-of-the-art performance on the CUB, ILSVRC and OpenImages benchmark datasets.
(arXiv, 2022-05-27T11:57:41Z)
- Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets [65.73451960585571]
We argue that the weakly-supervised object localization (WSOL) task is ill-posed with only image-level labels.
We propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.
(arXiv, 2020-07-08T15:09:16Z)
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with only weak, image-level labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
(arXiv, 2020-03-18T17:53:33Z)
- Weakly-supervised Object Localization for Few-shot Learning and Fine-grained Few-shot Learning [0.5156484100374058]
Few-shot learning aims to learn novel visual categories from very few samples.
We propose a Self-Attention Based Complementary Module (SAC Module) to perform weakly-supervised object localization.
The module also produces activated masks that are used to select discriminative deep descriptors for few-shot classification.
(arXiv, 2020-03-02T14:07:05Z)
- Evaluating Weakly Supervised Object Localization Methods Right [65.73451960585571]
We argue that the weakly-supervised object localization (WSOL) task is ill-posed with only image-level labels.
We propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.
(arXiv, 2020-01-21T10:50:06Z)
- Deep Weakly-Supervised Learning Methods for Classification and Localization in Histology Images: A Survey [25.429124017422385]
Using deep learning models to diagnose cancer presents several challenges.
Deep weakly-supervised object localization (WSOL) methods provide strategies for low-cost training of deep learning models.
This paper provides a review of state-of-the-art deep learning (DL) methods for WSOL.
(arXiv, 2019-09-08T00:01:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
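The TeD-Loc abstract reports Top-1 LOC accuracy and contrasts the method with class activation mapping, while the evaluation-protocol entries propose limiting full supervision to a small held-out set. The sketch below illustrates those pieces under stated assumptions and is not code from any listed paper: a standard CAM (the class's classifier weights applied to the last feature maps), a simplified box extraction (a tight box around all pixels above a threshold, whereas the original CAM pipeline keeps the largest connected component), and a hypothetical `pick_threshold` helper that chooses the map threshold on held-out images with box labels. Top-1 Loc counts an image as correct only when the top-1 class is right and the predicted box has IoU of at least 0.5 with the ground truth.

```python
import numpy as np

def cam(feature_maps, fc_weights, cls):
    """Class activation map: classifier weights of class `cls` (fc_weights is (C, K))
    applied to K feature maps (K, H, W), then min-max scaled."""
    m = np.tensordot(fc_weights[cls], feature_maps, axes=1)  # (H, W)
    return (m - m.min()) / (m.max() - m.min() + 1e-8)

def box_from_map(m, thr):
    """Tight box (x0, y0, x1, y1), inclusive pixel coords, around all pixels >= thr.
    Simplification: no largest-connected-component step."""
    ys, xs = np.nonzero(m >= thr)
    if xs.size == 0:
        return (0, 0, 0, 0)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

def iou(a, b):
    """Intersection over union of two inclusive integer boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = ix * iy
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def top1_loc_correct(pred_cls, gt_cls, pred_box, gt_box, iou_thr=0.5):
    """Top-1 Loc: the top-1 class must be correct AND the box IoU must be >= 0.5."""
    return pred_cls == gt_cls and iou(pred_box, gt_box) >= iou_thr

def pick_threshold(maps, gt_boxes, candidates):
    """Hypothetical helper: pick the map threshold with the best box accuracy on a
    small held-out set with box labels, then reuse it unchanged on the test set."""
    def box_acc(thr):
        return np.mean([iou(box_from_map(m, thr), g) >= 0.5 for m, g in zip(maps, gt_boxes)])
    return max(candidates, key=box_acc)  # ties: the first candidate wins

# Toy CAM from random features: 3 feature maps of 8x8, 2 classes.
rng = np.random.default_rng(0)
toy_cam = cam(rng.random((3, 8, 8)), rng.random((2, 3)), cls=1)
print(toy_cam.shape)  # (8, 8)

# Toy held-out example: a weak 0.4 "halo" around a strong 0.9 object core.
held_map = np.zeros((8, 8))
held_map[1:7, 1:7] = 0.4
held_map[2:6, 3:7] = 0.9
held_gt = (3, 2, 6, 5)  # (x0, y0, x1, y1) of the core

thr = pick_threshold([held_map], [held_gt], [0.25, 0.5, 0.75])
print(thr)  # 0.5: at 0.25 the halo inflates the box (IoU 16/36 < 0.5)
print(top1_loc_correct(1, 1, box_from_map(held_map, thr), held_gt))  # True
```

The toy map also shows why thresholds matter for CAM-style methods: a low threshold lets weak activations around the object enlarge the box, while the threshold chosen on held-out data recovers the object extent.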