GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
- URL: http://arxiv.org/abs/2309.13625v1
- Date: Sun, 24 Sep 2023 12:56:40 GMT
- Title: GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
- Authors: Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, and Xinchao Wang
- Abstract summary: Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs).
We propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge.
In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph, and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in two modalities, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapter-style efficient transfer learning (ETL) has shown excellent
performance in the tuning of vision-language models (VLMs) under the low-data
regime, where only a few additional parameters are introduced to excavate the
task-specific knowledge based on the general and powerful representation of
VLMs. However, most adapter-style works face two limitations: (i) modeling
task-specific knowledge with a single modality only; and (ii) overlooking the
exploitation of the inter-class relationships in downstream tasks, thereby
leading to sub-optimal solutions. To mitigate that, we propose an effective
adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual
adapter by explicitly modeling the dual-modality structure knowledge (i.e., the
correlation of different semantics/classes in textual and visual modalities)
with a dual knowledge graph. In particular, the dual knowledge graph is
established with two sub-graphs, i.e., a textual knowledge sub-graph, and a
visual knowledge sub-graph, where the nodes and edges represent the
semantics/classes and their correlations in two modalities, respectively. This
enables the textual feature of each prompt to leverage the task-specific
structure knowledge from both textual and visual modalities, yielding a more
effective classifier for downstream tasks. Extensive experimental results on 11
benchmark datasets reveal that our GraphAdapter significantly outperforms
previous adapter-based methods. The code will be released at
https://github.com/lixinustc/GraphAdapter
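The idea in the abstract, refining each class's textual feature with structure knowledge from a textual sub-graph and a visual sub-graph, can be illustrated with a minimal NumPy sketch. This is not the released GraphAdapter code: the function names, the softmax-normalized cosine edges, the single aggregation step, and the mixing coefficients `alpha` and `beta` are all assumptions chosen for illustration, and the projection matrices would be learned in practice rather than passed in fixed.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def row_softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def dual_graph_text_adapter(text_feats, visual_protos, weight_t, weight_v,
                            alpha=0.7, beta=0.5):
    """Hypothetical dual-graph textual adapter (illustrative only).

    text_feats:    (C, D) one textual feature per class prompt.
    visual_protos: (C, D) one visual node per class, assumed here to be
                   the mean image feature of that class's few-shot samples.
    weight_t/v:    (D, D) projection matrices (learned in a real setup).
    alpha:         weight kept on the original textual feature (assumption).
    beta:          balance between textual and visual messages (assumption).
    """
    t = l2_normalize(text_feats)
    v = l2_normalize(visual_protos)

    # Edges: cosine similarity between each textual feature and the nodes
    # of each sub-graph, normalized per row.
    edges_t = row_softmax(t @ t.T)  # textual sub-graph, shape (C, C)
    edges_v = row_softmax(t @ v.T)  # visual sub-graph, shape (C, C)

    # One GCN-style aggregation step over each sub-graph.
    msg_t = edges_t @ t @ weight_t
    msg_v = edges_v @ v @ weight_v

    # Fuse both modalities, then blend residually with the original feature.
    fused = beta * msg_t + (1.0 - beta) * msg_v
    return l2_normalize(alpha * t + (1.0 - alpha) * fused)


def classify(image_feats, class_weights, temperature=0.01):
    """Predict the class whose adapted textual feature is most similar."""
    logits = l2_normalize(image_feats) @ class_weights.T / temperature
    return logits.argmax(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, dim = 4, 16
    text = rng.normal(size=(num_classes, dim))     # stand-in for text encoder output
    protos = rng.normal(size=(num_classes, dim))   # stand-in for few-shot image means
    identity = np.eye(dim)                         # placeholder for learned weights
    adapted = dual_graph_text_adapter(text, protos, identity, identity)
    print(adapted.shape)
```

Setting `alpha=1.0` disables the graph messages and recovers the plain normalized textual features, which makes the residual blend easy to sanity-check against a zero-shot classifier.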
Related papers
- HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter [19.557300178619382]
We propose a novel Heterogeneous Graph Adapter to achieve tuning VLMs for the downstream tasks.
We employ a specific Heterogeneous Graph Neural Network to excavate multi-modality structure knowledge for the downstream tasks.
Experimental results on 11 benchmark datasets demonstrate the effectiveness and benefits of the proposed HeGraphAdapter.
arXiv Detail & Related papers (2024-10-10T12:20:58Z)
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning [45.70767623846523]
We propose a novel unified unsupervised learning autoencoder framework, named Node Level Graph AutoEncoder (NodeGAE).
We employ language models as the backbone of the autoencoder, with pretraining on text reconstruction.
Our method maintains simplicity in the training process and demonstrates generalizability across diverse textual graphs and downstream tasks.
arXiv Detail & Related papers (2024-08-09T14:57:53Z)
- GraphGPT: Graph Instruction Tuning for Large Language Models [27.036935149004726]
Graph Neural Networks (GNNs) have evolved to understand graph structures.
To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation.
Our research tackles this by advancing graph model generalization in zero-shot learning environments.
arXiv Detail & Related papers (2023-10-19T06:17:46Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- Subgraph Networks Based Contrastive Learning [5.736011243152416]
Graph contrastive learning (GCL) can solve the problem of annotated data scarcity.
Most existing GCL methods focus on the design of graph augmentation strategies and mutual information estimation operations.
We propose a novel framework called subgraph network-based contrastive learning (SGNCL).
arXiv Detail & Related papers (2023-06-06T08:52:44Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- CLIP-Adapter: Better Vision-Language Models with Feature Adapters [79.52844563138493]
We show that there is an alternative path to achieve better vision-language models other than prompt tuning.
In this paper, we propose CLIP-Adapter to conduct fine-tuning with feature adapters on either visual or language branch.
Experiments and extensive ablation studies on various visual classification tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-10-09T11:39:30Z)
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.