Learning Hierarchical Prompt with Structured Linguistic Knowledge for
Vision-Language Models
- URL: http://arxiv.org/abs/2312.06323v1
- Date: Mon, 11 Dec 2023 12:14:06 GMT
- Title: Learning Hierarchical Prompt with Structured Linguistic Knowledge for
Vision-Language Models
- Authors: Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao
- Abstract summary: We propose a novel approach to harnessing structured knowledge in large language models (LLMs)
We introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning.
In addition, by incorporating high-level and global-level prompts, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships.
- Score: 43.56153167864033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt learning has become a prevalent strategy for adapting vision-language
foundation models to downstream tasks. As large language models (LLMs) have
emerged, recent studies have explored the use of category-related descriptions
as input to enhance prompt effectiveness. Nevertheless, conventional
descriptions lack the structured information needed to effectively represent
the interconnections among entities or attributes linked to a particular
category. To address this limitation and prioritize harnessing structured
knowledge, this paper advocates for leveraging LLMs to build a graph for each
description to model the entities and attributes describing the category, as
well as their correlations. Preexisting prompt tuning methods exhibit
inadequacies in managing this structured knowledge. Consequently, we propose a
novel approach called Hierarchical Prompt Tuning (HPT), which enables
simultaneous modeling of both structured and conventional linguistic knowledge.
Specifically, we introduce a relationship-guided attention module to capture
pair-wise associations among entities and attributes for low-level prompt
learning. In addition, by incorporating high-level and global-level prompts
modeling overall semantics, the proposed hierarchical structure forges
cross-level interlinks and empowers the model to handle more complex and
long-term relationships. Extensive experiments demonstrate that our HPT shows
strong effectiveness and generalizes much better than existing SOTA methods.
Our code is available at https://github.com/Vill-Lab/2024-AAAI-HPT.
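As a concrete illustration of the low-level module described above, here is a minimal PyTorch sketch of relationship-guided attention: self-attention over the entity and attribute tokens of a category description, with attention logits biased by the pair-wise relation matrix of the LLM-generated description graph. This is not the authors' released code; the class name, dimensions, and the scalar-bias formulation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationshipGuidedAttention(nn.Module):
    """Self-attention over entity/attribute tokens, biased by graph edges."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable scalar bias added to attention logits of related pairs
        # (an assumption; other formulations, e.g. masking, are possible).
        self.rel_bias = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor, rel: torch.Tensor) -> torch.Tensor:
        # x:   (B, N, D) embeddings of entity/attribute tokens
        # rel: (B, N, N) adjacency of the description graph;
        #      rel[b, i, j] = 1 if node i and node j are linked
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, D // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, H, N, N)
        attn = attn + self.rel_bias * rel.unsqueeze(1)  # favor related pairs
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)

# Example: 5 entity/attribute tokens for one category description.
x = torch.randn(1, 5, 512)
rel = torch.zeros(1, 5, 5)
rel[0, 0, 1] = rel[0, 1, 0] = 1.0  # e.g., entity 0 linked to attribute 1
out = RelationshipGuidedAttention(512)(x, rel)  # -> (1, 5, 512)
```

In the paper, the graph is obtained by prompting an LLM to extract entities, attributes, and their correlations from each category description; here `rel` simply stands in for that adjacency structure.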
Related papers
- MGSA: Multi-Granularity Graph Structure Attention for Knowledge Graph-to-Text Generation [10.607080796475815]
This paper introduces Multi-granularity Graph Structure Attention (MGSA), which is based on pre-trained language models (PLMs).
The encoder of the model architecture features an entity-level structure encoding module, a word-level structure encoding module, and an aggregation module that synthesizes information from both structures.
We conducted extensive evaluations of the MGSA model using two widely recognized KG-to-Text Generation benchmark datasets, WebNLG and EventNarrative.
arXiv Detail & Related papers (2024-09-16T14:01:03Z)
- HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling [39.14392943549792]
We propose a novel approach called Hierarchical Prompt Tuning (HPT), enabling simultaneous modeling of both structured and conventional linguistic knowledge.
We introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning.
By incorporating high-level and global-level prompts modeling overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships.
arXiv Detail & Related papers (2024-08-27T06:50:28Z)
- Emergent Visual-Semantic Hierarchies in Image-Text Representations [13.300199242824934]
We study the knowledge of existing foundation models, finding that they exhibit emergent understanding of visual-semantic hierarchies.
We propose the Radial Embedding (RE) framework for probing and optimizing hierarchical understanding.
arXiv Detail & Related papers (2024-07-11T14:09:42Z)
- Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework, Structure-CLIP, to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhanced Encoder (KEE) is proposed to leverage scene graph knowledge (SGK) as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z)
- KGLM: Integrating Knowledge Graph Structure in Language Models for Link Prediction [0.0]
We introduce a new entity/relation embedding layer that learns to differentiate distinctive entity and relation types.
We show that further pre-training the language models with this additional embedding layer on triples extracted from the knowledge graph, followed by the standard fine-tuning phase, sets a new state of the art for link prediction on the benchmark datasets.
arXiv Detail & Related papers (2022-11-04T20:38:12Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves new state-of-the-art results on all the structured prediction tasks we evaluated.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weak-supervised data as a prompt for each sample.
arXiv Detail & Related papers (2022-10-19T16:40:28Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text with guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme (a minimal sketch follows this list).
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
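To make the entity masking scheme above concrete, here is a minimal sketch (an illustrative assumption, not that paper's released code): mentions that align with knowledge-graph entities are masked as whole spans, so the model must recover the entire entity rather than trivially completing its remaining subwords.

```python
import random

def entity_mask(tokens, entity_spans, mask_token="[MASK]", p=0.15):
    """Mask whole KG-entity spans instead of independent subword tokens.

    tokens:       list of subword tokens
    entity_spans: (start, end) index pairs of entity mentions, end exclusive
    Returns the masked sequence and per-position labels for the MLM loss.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)          # None = no loss at this position
    for start, end in entity_spans:
        if random.random() < p:            # decide once per entity span
            for i in range(start, end):    # mask the full span together
                labels[i] = masked[i]
                masked[i] = mask_token
    return masked, labels

tokens = ["the", "eiffel", "tower", "is", "in", "paris"]
spans = [(1, 3), (5, 6)]                   # "eiffel tower", "paris"
print(entity_mask(tokens, spans, p=1.0))
```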
This list is automatically generated from the titles and abstracts of the papers on this site.