Probabilistic Prompt Learning for Dense Prediction
- URL: http://arxiv.org/abs/2304.00779v1
- Date: Mon, 3 Apr 2023 08:01:27 GMT
- Title: Probabilistic Prompt Learning for Dense Prediction
- Authors: Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn
- Abstract summary: We present a novel probabilistic prompt learning method to fully exploit vision-language knowledge in dense prediction tasks.
We introduce learnable class-agnostic attribute prompts that describe universal attributes shared across object classes.
These attributes are combined with class information and visual-context knowledge to define a class-specific textual distribution.
- Score: 45.577125507777474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deterministic prompt learning has recently emerged as a promising
alternative for various downstream vision tasks, enabling models to learn
powerful visual representations with the help of pre-trained vision-language
models. However, this approach yields limited performance on dense prediction
tasks, which require handling more complex and diverse objects, since a single
deterministic description cannot sufficiently represent the entire image. In
this paper, we present a novel probabilistic prompt learning method to fully
exploit vision-language knowledge in dense prediction tasks. First, we
introduce learnable class-agnostic attribute prompts that describe universal
attributes shared across object classes. These attributes are combined with
class information and visual-context knowledge to define a class-specific
textual distribution. Text representations are sampled from this distribution
and used to guide the dense prediction task via a probabilistic pixel-text
matching loss, enhancing the stability and generalization capability of the
proposed method. Extensive experiments on different dense prediction tasks and
ablation studies demonstrate the effectiveness of the proposed method.
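To make the approach concrete, the sketch below shows one way the pieces could fit together: class-agnostic attribute prompts are fused with class embeddings and visual context to parameterize a Gaussian over class-specific text embeddings, and sampled text representations drive a pixel-text matching loss. All module names, dimensions, and the fusion scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticPrompts(nn.Module):
    """Sketch of probabilistic prompt learning (dims/fusion assumed)."""

    def __init__(self, num_attributes=8, embed_dim=512, num_classes=20):
        super().__init__()
        # Class-agnostic attribute prompts shared by all classes.
        self.attr_prompts = nn.Parameter(0.02 * torch.randn(num_attributes, embed_dim))
        # Stand-in for frozen CLIP text embeddings of the class names.
        self.class_embed = nn.Embedding(num_classes, embed_dim)
        # Heads producing the mean and log-variance of the text distribution.
        self.to_mu = nn.Linear(3 * embed_dim, embed_dim)
        self.to_logvar = nn.Linear(3 * embed_dim, embed_dim)

    def forward(self, visual_context, num_samples=4):
        # visual_context: (B, D) global image feature from the vision encoder.
        B, D = visual_context.shape
        C = self.class_embed.num_embeddings
        attr = self.attr_prompts.mean(dim=0)              # (D,) pooled attributes
        fused = torch.cat([
            attr.expand(B, C, D),                         # universal attributes
            self.class_embed.weight.expand(B, C, D),      # class information
            visual_context.unsqueeze(1).expand(B, C, D),  # visual context
        ], dim=-1)                                        # (B, C, 3D)
        mu, logvar = self.to_mu(fused), self.to_logvar(fused)
        # Reparameterized samples from the class-specific text distribution.
        eps = torch.randn(num_samples, B, C, D, device=mu.device)
        return mu + eps * (0.5 * logvar).exp()            # (S, B, C, D)

def probabilistic_pixel_text_loss(pixel_feats, text_samples, labels, tau=0.07):
    """Cross-entropy over pixel-text similarities, averaged over samples."""
    # pixel_feats: (B, D, H, W); text_samples: (S, B, C, D); labels: (B, H, W)
    pixel = F.normalize(pixel_feats, dim=1)
    losses = []
    for text in text_samples:
        logits = torch.einsum("bdhw,bcd->bchw",
                              pixel, F.normalize(text, dim=-1)) / tau
        losses.append(F.cross_entropy(logits, labels))
    return torch.stack(losses).mean()
```

Averaging the matching loss over several sampled text embeddings, rather than matching against a single deterministic prompt, is what the abstract credits for the improved stability and generalization.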
Related papers
- XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization [4.634780391920529]
We propose a novel explainable prompt learning framework that leverages medical knowledge by aligning the semantics of images, learnable prompts, and clinical concept-driven prompts.
Our framework addresses the lack of valuable concept annotations by eliciting knowledge from large language models.
Our method simultaneously achieves superior diagnostic performance, flexibility, and interpretability, shedding light on the effectiveness of foundation models in facilitating XAI.
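The summary does not spell out the objective, but one plausible reading of the concept-guided alignment is a contrastive pull between learnable prompt embeddings, image features, and concept-driven prompt embeddings; the sketch below is hypothetical:

```python
import torch
import torch.nn.functional as F

def concept_guided_alignment(image_emb, prompt_emb, concept_emb, tau=0.07):
    """Toy alignment among image, learnable-prompt, and clinical-concept
    embeddings; the actual XCoOp objective may differ."""
    def nce(a, b):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.t() / tau
        targets = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, targets)
    # Pull prompts toward their paired images and toward concept prompts.
    return nce(prompt_emb, image_emb) + nce(prompt_emb, concept_emb)
```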
arXiv Detail & Related papers (2024-03-14T14:02:01Z)
- TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models [14.019349267520541]
We propose a novel method that leverages the capabilities of language models to interpret the learned features of pre-trained image classifiers.
Our approach generates a vast number of sentences to explain the features learned by the classifier for a given image.
Our method is the first to use the most frequent words across these sentences to provide insights into the decision-making process behind a given visual representation.
arXiv Detail & Related papers (2023-09-01T20:59:46Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative, task-specific visual features by jointly using a vision-specific contrastive loss, a cross-modal contrastive loss, and implicit knowledge distillation (a rough composition is sketched below).
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
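A minimal sketch of how the three signals could be composed, assuming standard InfoNCE terms and temperature-scaled distillation; the paper's exact losses and weighting are not given in this summary:

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE between two aligned batches of embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def sgva_style_objective(view1, view2, visual, text,
                         student_logits, teacher_logits, T=4.0):
    """Assumed composition of SgVA-CLIP's three training signals."""
    loss_vis = info_nce(view1, view2)    # vision-specific contrastive loss
    loss_xm = info_nce(visual, text)     # cross-modal contrastive loss
    # Implicit knowledge distillation from the frozen VLM's predictions.
    loss_kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                       F.softmax(teacher_logits / T, dim=-1),
                       reduction="batchmean") * T * T
    return loss_vis + loss_xm + loss_kd
```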
arXiv Detail & Related papers (2022-11-28T14:58:15Z)
- Fine-Grained Visual Entailment [51.66881737644983]
We propose an extension of the visual entailment task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
Unlike prior work, our method is inherently explainable and makes logical predictions at different levels of granularity.
We evaluate our method on a new dataset of manually annotated knowledge elements and show that our method achieves 68.18% accuracy at this challenging task.
arXiv Detail & Related papers (2022-03-29T16:09:38Z)
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [91.56988987393483]
We present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.
Specifically, we convert the original image-text matching problem in CLIP into a pixel-text matching problem and use the resulting pixel-text score maps to guide the learning of dense prediction models (a minimal sketch follows).
Our method is model-agnostic and can be applied to arbitrary dense prediction systems with various pre-trained visual backbones.
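In code, the pixel-text matching step reduces to cosine similarity between dense image features and class text embeddings. This sketch assumes CLIP-style encoders whose spatial feature map is kept before pooling; DenseCLIP's learnable context-aware prompting is omitted:

```python
import torch
import torch.nn.functional as F

def pixel_text_score_maps(pixel_feats, text_feats, tau=0.07):
    """Pixel-text score maps in the spirit of DenseCLIP (simplified).

    pixel_feats: (B, D, H, W) dense features from the image encoder.
    text_feats:  (C, D) class embeddings from the text encoder.
    Returns:     (B, C, H, W) score maps that can supervise or augment
                 any dense prediction head.
    """
    pixel = F.normalize(pixel_feats, dim=1)
    text = F.normalize(text_feats, dim=-1)
    return torch.einsum("bdhw,cd->bchw", pixel, text) / tau
```

Because the score maps are just an extra (B, C, H, W) tensor, they can be concatenated to backbone features or supervised directly with segmentation labels, which is what makes the framework model-agnostic.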
arXiv Detail & Related papers (2021-12-02T18:59:32Z)
- Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU [88.8401599172922]
We develop a framework based on self-training language models with limited task-specific labels and rationales.
We show that the neural model performance can be significantly improved by making it aware of its rationalized predictions.
arXiv Detail & Related papers (2021-09-17T00:36:46Z)
- Multivariate Business Process Representation Learning utilizing Gramian Angular Fields and Convolutional Neural Networks [0.0]
Learning meaningful representations of data is an important aspect of machine learning.
For predictive process analytics, it is essential to have all explanatory characteristics of a process instance available.
We propose a novel approach to representation learning of business process instances that encodes them as Gramian Angular Field images for a convolutional neural network (a minimal encoding sketch follows).
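Below is the standard Gramian Angular Field encoding such an approach builds on; how the paper extracts numeric series from process-instance attributes is not specified in this summary and is assumed:

```python
import numpy as np

def gramian_angular_field(series, kind="summation"):
    """Encode a 1-D sequence as a GAF image (standard recipe)."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so the angular encoding arccos(x) is defined.
    x = 2.0 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    if kind == "summation":                        # GASF: cos(phi_i + phi_j)
        return np.cos(phi[:, None] + phi[None, :])
    return np.sin(phi[:, None] - phi[None, :])     # GADF: sin(phi_i - phi_j)

# Each attribute's series becomes one image channel; stacking the channels
# yields a multivariate image a CNN can consume.
```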
arXiv Detail & Related papers (2021-06-15T10:21:14Z)
- A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers (a structural sketch follows).
A detailed pipeline to visualize the learnt features is also developed.
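A structural sketch of such a dictionary, with the attribute functions and the interpretation model reduced to small linear maps (sizes, depth, and the choice of hidden layers are all assumptions):

```python
import torch
import torch.nn as nn

class AttributeDictionary(nn.Module):
    """Toy dictionary of attribute functions over hidden activations."""

    def __init__(self, hidden_dim=256, num_attributes=25, num_classes=10):
        super().__init__()
        # Each attribute is a non-negative scalar function of the
        # concatenated outputs of selected hidden layers.
        self.attributes = nn.Sequential(
            nn.Linear(hidden_dim, num_attributes), nn.ReLU())
        # The interpretation model predicts from attribute activations only.
        self.interpreter = nn.Linear(num_attributes, num_classes)

    def forward(self, hidden):
        acts = self.attributes(hidden)       # (B, num_attributes)
        return self.interpreter(acts), acts  # prediction + attribute basis
```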
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.