GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models
- URL: http://arxiv.org/abs/2406.11945v1
- Date: Mon, 17 Jun 2024 17:49:19 GMT
- Title: GAugLLM: Improving Graph Contrastive Learning for Text-Attributed Graphs with Large Language Models
- Authors: Yi Fang, Dongzhe Fan, Daochen Zha, Qiaoyu Tan,
- Abstract summary: This work studies self-supervised graph learning for text-attributed graphs (TAGs)
We aim to improve view generation through language supervision.
This is driven by the prevalence of textual attributes in real applications, which complement graph structures with rich semantic information.
- Score: 33.3678293782131
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This work studies self-supervised graph learning for text-attributed graphs (TAGs) where nodes are represented by textual attributes. Unlike traditional graph contrastive methods that perturb the numerical feature space and alter the graph's topological structure, we aim to improve view generation through language supervision. This is driven by the prevalence of textual attributes in real applications, which complement graph structures with rich semantic information. However, this presents challenges because of two major reasons. First, text attributes often vary in length and quality, making it difficulty to perturb raw text descriptions without altering their original semantic meanings. Second, although text attributes complement graph structures, they are not inherently well-aligned. To bridge the gap, we introduce GAugLLM, a novel framework for augmenting TAGs. It leverages advanced large language models like Mistral to enhance self-supervised graph learning. Specifically, we introduce a mixture-of-prompt-expert technique to generate augmented node features. This approach adaptively maps multiple prompt experts, each of which modifies raw text attributes using prompt engineering, into numerical feature space. Additionally, we devise a collaborative edge modifier to leverage structural and textual commonalities, enhancing edge augmentation by examining or building connections between nodes. Empirical results across five benchmark datasets spanning various domains underscore our framework's ability to enhance the performance of leading contrastive methods as a plug-in tool. Notably, we observe that the augmented features and graph structure can also enhance the performance of standard generative methods, as well as popular graph neural networks. The open-sourced implementation of our GAugLLM is available at Github.
Related papers
- Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning [45.70767623846523]
We propose a novel unified unsupervised learning autoencoder framework, named Node Level Graph AutoEncoder (NodeGAE)
We employ language models as the backbone of the autoencoder, with pretraining on text reconstruction.
Our method maintains simplicity in the training process and demonstrates generalizability across diverse textual graphs and downstream tasks.
arXiv Detail & Related papers (2024-08-09T14:57:53Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information.
Our method achieves state-of-theart performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
arXiv Detail & Related papers (2024-06-18T13:35:25Z) - TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations [15.873944819608434]
Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions.
This paper introduces a new self-supervised learning framework, Text-And-Graph Multi-View Alignment (TAGA), which integrates TAGs' structural and semantic dimensions.
Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.
arXiv Detail & Related papers (2024-05-27T03:40:16Z) - G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z) - Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs.
In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations.
We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-10-10T14:59:22Z) - Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation [23.54754465832362]
In conventional graph neural networks (GNNs) message passing on a graph is independent from text.
This training regime leads to a semantic gap between graph knowledge and text.
We propose a novel framework for knowledge graph enhanced dialogue generation.
arXiv Detail & Related papers (2023-06-28T13:21:00Z) - ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG)
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z) - GraphFormers: GNN-nested Transformers for Representation Learning on
Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graph.
arXiv Detail & Related papers (2021-05-06T12:20:41Z) - Edge: Enriching Knowledge Graph Embeddings with External Text [32.01476220906261]
We propose a knowledge graph enrichment and embedding framework named Edge.
Given an original knowledge graph, we first generate a rich but noisy augmented graph using external texts in semantic and structural level.
To distill the relevant knowledge and suppress the introduced noise, we design a graph alignment term in a shared embedding space between the original graph and augmented graph.
arXiv Detail & Related papers (2021-04-11T03:47:06Z) - Structure-Augmented Text Representation Learning for Efficient Knowledge
Graph Completion [53.31911669146451]
Human-curated knowledge graphs provide critical supportive information to various natural language processing tasks.
These graphs are usually incomplete, urging auto-completion of them.
graph embedding approaches, e.g., TransE, learn structured knowledge via representing graph elements into dense embeddings.
textual encoding approaches, e.g., KG-BERT, resort to graph triple's text and triple-level contextualized representations.
arXiv Detail & Related papers (2020-04-30T13:50:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.