Related papers: GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

URL: http://arxiv.org/abs/2309.13625v1
Date: Sun, 24 Sep 2023 12:56:40 GMT
Title: GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Authors: Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, and Xinchao Wang
Abstract summary: Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs). We propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge. In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in the two modalities, respectively.
Score: 63.81641578763094
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning of vision-language models (VLMs) under the low-data regime, where only a few additional parameters are introduced to excavate the task-specific knowledge based on the general and powerful representation of VLMs. However, most adapter-style works face two limitations: (i) modeling task-specific knowledge with a single modality only; and (ii) overlooking the exploitation of the inter-class relationships in downstream tasks, thereby leading to sub-optimal solutions. To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph. In particular, the dual knowledge graph is established with two sub-graphs, i.e., a textual knowledge sub-graph, and a visual knowledge sub-graph, where the nodes and edges represent the semantics/classes and their correlations in two modalities, respectively. This enables the textual feature of each prompt to leverage the task-specific structure knowledge from both textual and visual modalities, yielding a more effective classifier for downstream tasks. Extensive experimental results on 11 benchmark datasets reveal that our GraphAdapter significantly outperforms previous adapter-based methods. The code will be released at https://github.com/lixinustc/GraphAdapter
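
The idea in the abstract (class-level nodes, correlation edges, and a textual adapter that draws on both a textual and a visual sub-graph) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' released implementation: the function names, the use of class-mean image features as visual nodes, the softmax-normalized cosine-similarity edges, the single propagation step, and the mixing coefficients `alpha`, `beta`, and `temperature` are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def row_softmax(scores):
    """Numerically stable softmax over each row."""
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def graph_adapt_text_features(text_feats, visual_feats, weight_t, weight_v,
                              alpha=0.7, beta=0.5, temperature=0.1):
    """Refine per-class text features with two class-level sub-graphs.

    text_feats:   (K, D) one text feature per class prompt.
    visual_feats: (K, D) assumed to be class-mean image features
                  computed from the few-shot training samples.
    weight_t, weight_v: (D, D) projection matrices (learnable in practice).
    alpha, beta, temperature: illustrative hyperparameters (assumptions).
    """
    z_t = l2_normalize(text_feats)
    nodes_t = z_t                          # textual sub-graph nodes
    nodes_v = l2_normalize(visual_feats)   # visual sub-graph nodes

    # Edges: similarity between each class text feature and every node,
    # normalized per row so each class aggregates a weighted average.
    adj_t = row_softmax(z_t @ nodes_t.T / temperature)
    adj_v = row_softmax(z_t @ nodes_v.T / temperature)

    # One graph-convolution-style propagation step per sub-graph.
    msg_t = adj_t @ nodes_t @ weight_t
    msg_v = adj_v @ nodes_v @ weight_v

    # Fuse the two modalities, then mix residually with the original feature.
    fused = beta * msg_t + (1.0 - beta) * msg_v
    adapted = alpha * z_t + (1.0 - alpha) * fused
    return l2_normalize(adapted)


def classify(image_feats, class_feats, logit_scale=100.0):
    """Predict a class index per image by cosine similarity to class features."""
    logits = logit_scale * l2_normalize(image_feats) @ class_feats.T
    return logits.argmax(axis=1)


# Toy usage with random stand-ins for encoder outputs (K classes, D dims).
rng = np.random.default_rng(0)
K, D = 5, 16
text = rng.normal(size=(K, D))
visual = rng.normal(size=(K, D))
W_t, W_v = np.eye(D), np.eye(D)

adapted = graph_adapt_text_features(text, visual, W_t, W_v)
preds = classify(rng.normal(size=(8, D)), adapted)
```

In this sketch the adapted text features act as the classifier weights, and setting `alpha=1.0` recovers the unadapted (normalized) text features. In an actual training setup the projection matrices would be learned on the few-shot data while the encoders stay frozen.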

Related papers

Beyond Graph Model: Reliable VLM Fine-Tuning via Random Graph Adapter [13.042863991752883]
Existing works generally employ a deterministic textual feature adapter to refine each category. We propose incorporating a random graph model into the VLM adapter and develop a novel Vertex Random Graph Adapter (VRGAdapter). VRGAdapter first models the inherent diverse descriptions of each category and inter-class relationships of different categories simultaneously.
arXiv Detail & Related papers (2025-07-14T14:56:49Z)
Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [75.9865035064794]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information. Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system. We propose Align-GRAG, a novel reasoning-guided dual alignment framework in the post-retrieval phase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z)
Task-Adapter++: Task-specific Adaptation with Order-aware Alignment for Few-shot Action Recognition [33.22316608406554]
We propose a parameter-efficient dual adaptation method for both image and text encoders. Specifically, we design a task-specific adaptation for the image encoder so that the most discriminative information can be well noticed during feature extraction. We develop an innovative fine-grained cross-modal alignment strategy that actively maps visual features to reside in the same temporal stage as semantic descriptions.
arXiv Detail & Related papers (2025-05-09T12:34:10Z)
LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models [54.82915844507371]
Text-Attributed Graphs (TAGs) are ubiquitous in real-world scenarios. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures. We propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning.
arXiv Detail & Related papers (2025-03-05T09:45:22Z)
HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter [19.557300178619382]
We propose a novel Heterogeneous Graph Adapter to achieve tuning VLMs for the downstream tasks. We employ a specific Heterogeneous Graph Neural Network to excavate multi-modality structure knowledge for the downstream tasks. Experimental results on 11 benchmark datasets demonstrate the effectiveness and benefits of the proposed HeGraphAdapter.
arXiv Detail & Related papers (2024-10-10T12:20:58Z)
Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs). We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning [45.70767623846523]
We propose a novel unified unsupervised learning autoencoder framework, named Node Level Graph AutoEncoder (NodeGAE). We employ language models as the backbone of the autoencoder, with pretraining on text reconstruction. Our method maintains simplicity in the training process and demonstrates generalizability across diverse textual graphs and downstream tasks.
arXiv Detail & Related papers (2024-08-09T14:57:53Z)
GraphGPT: Graph Instruction Tuning for Large Language Models [27.036935149004726]
Graph Neural Networks (GNNs) have evolved to understand graph structures. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. Our research tackles this by advancing graph model generalization in zero-shot learning environments.
arXiv Detail & Related papers (2023-10-19T06:17:46Z)
SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning. We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task. We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
Subgraph Networks Based Contrastive Learning [5.736011243152416]
Graph contrastive learning (GCL) can solve the problem of annotated data scarcity. Most existing GCL methods focus on the design of graph augmentation strategies and mutual information estimation operations. We propose a novel framework called subgraph network-based contrastive learning (SGNCL).
arXiv Detail & Related papers (2023-06-06T08:52:44Z)
Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks. Our method achieves state-of-the-art results on well-established TAG datasets. Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
CLIP-Adapter: Better Vision-Language Models with Feature Adapters [79.52844563138493]
We show that there is an alternative path to achieve better vision-language models other than prompt tuning. In this paper, we propose CLIP-Adapter to conduct fine-tuning with feature adapters on either visual or language branch. Experiments and extensive ablation studies on various visual classification tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-10-09T11:39:30Z)
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs. Building upon entity-level masked language models, our first contribution is an entity masking scheme. In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.