Can LLMs Convert Graphs to Text-Attributed Graphs?
- URL: http://arxiv.org/abs/2412.10136v2
- Date: Fri, 07 Feb 2025 01:29:07 GMT
- Title: Can LLMs Convert Graphs to Text-Attributed Graphs?
- Authors: Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi Ma, Chuxu Zhang, Yanfang Ye
- Abstract summary: We propose Topology-Aware Node description Synthesis (TANS) to convert existing graphs into text-attributed graphs.
We evaluate our TANS on text-rich, text-limited, and text-free graphs, demonstrating its applicability.
- Score: 35.53046810556242
- Abstract: Graphs are ubiquitous structures found in numerous real-world applications, such as drug discovery, recommender systems, and social network analysis. To model graph-structured data, graph neural networks (GNNs) have become a popular tool. However, existing GNN architectures encounter challenges in cross-graph learning where multiple graphs have different feature spaces. To address this, recent approaches introduce text-attributed graphs (TAGs), where each node is associated with a textual description, which can be projected into a unified feature space using textual encoders. While promising, this method relies heavily on the availability of text-attributed graph data, which is difficult to obtain in practice. To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), leveraging large language models (LLMs) to convert existing graphs into text-attributed graphs. The key idea is to integrate topological information into LLMs to explain how graph topology influences node semantics. We evaluate our TANS on text-rich, text-limited, and text-free graphs, demonstrating its applicability. Notably, on text-free graphs, our method significantly outperforms existing approaches that manually design node features, showcasing the potential of LLMs for preprocessing graph-structured data in the absence of textual information. The code and data are available at https://github.com/Zehong-Wang/TANS.
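As a rough illustration of the key idea, a node description could be synthesized by prompting an LLM with the node's topological context (degree, neighbor summaries) and asking it to explain the node's likely role. The sketch below assumes a chat-completion API; the prompt wording, model name, and helper function are illustrative assumptions, not the prompts used by TANS.

```python
# Illustrative sketch of topology-aware node description synthesis.
# The prompt template and field names are assumptions, not the TANS prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def synthesize_node_text(node_id, degree, neighbor_texts, graph_domain="citation network"):
    """Ask an LLM to write a node description that reflects local topology."""
    neighbor_block = "\n".join(f"- {t}" for t in neighbor_texts[:10])  # truncate context
    prompt = (
        f"You are describing node {node_id} in a {graph_domain}.\n"
        f"The node has degree {degree}; its neighbors are summarized below:\n"
        f"{neighbor_block}\n"
        "Write a short description of this node that explains how its "
        "connections shape its likely role and semantics."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# The returned text becomes the node attribute, which a text encoder
# (e.g., a sentence transformer) can then project into a unified feature space.
```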
Related papers
- Query-Aware Learnable Graph Pooling Tokens as Prompt for Large Language Models [3.9489815622117566]
Learnable Graph Pooling Token (LGPT) enables flexible and efficient graph representation.
Our method achieves a 4.13% performance improvement on the GraphQA benchmark without training the large language model.
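As a rough sketch of the pooling-token idea, a fixed set of learnable query tokens can attend over a variable-size set of node embeddings and emit a handful of graph tokens for the LLM prompt; the module structure and dimensions below are assumptions, not LGPT's actual design.

```python
import torch
import torch.nn as nn

class GraphPoolingTokens(nn.Module):
    """Learnable query tokens that pool a variable-size node set into k prompt tokens."""
    def __init__(self, node_dim=256, llm_dim=4096, num_tokens=8, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tokens, node_dim))
        self.attn = nn.MultiheadAttention(node_dim, num_heads, batch_first=True)
        self.proj = nn.Linear(node_dim, llm_dim)  # map into the LLM embedding space

    def forward(self, node_embeddings):           # (num_nodes, node_dim)
        q = self.queries.unsqueeze(0)             # (1, k, node_dim)
        kv = node_embeddings.unsqueeze(0)         # (1, num_nodes, node_dim)
        pooled, _ = self.attn(q, kv, kv)          # (1, k, node_dim)
        return self.proj(pooled.squeeze(0))       # (k, llm_dim) soft prompt tokens

pooler = GraphPoolingTokens()
graph_tokens = pooler(torch.randn(100, 256))  # prepend these to the LLM input embeddings
```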
arXiv Detail & Related papers (2025-01-29T10:35:41Z)
- Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning [45.70767623846523]
We propose a novel unified unsupervised learning autoencoder framework, named Node Level Graph AutoEncoder (NodeGAE).
We employ language models as the backbone of the autoencoder, with pretraining on text reconstruction.
Our method maintains simplicity in the training process and demonstrates generalizability across diverse textual graphs and downstream tasks.
arXiv Detail & Related papers (2024-08-09T14:57:53Z)
- Hierarchical Compression of Text-Rich Graphs via Large Language Models [63.75293588479027]
Text-rich graphs are prevalent in data mining contexts like e-commerce and academic graphs.
This paper introduces "Hierarchical Compression" (HiCom), a novel method to align the capabilities of LLMs with the structure of text-rich graphs.
HiCom can outperform both GNNs and LLM backbones for node classification on e-commerce and citation graphs.
arXiv Detail & Related papers (2024-06-13T07:24:46Z)
- Parameter-Efficient Tuning Large Language Models for Graph Representation Learning [62.26278815157628]
We introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning.
We use a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt.
We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations.
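A minimal sketch of the graph-prompt idea: a small GNN-style aggregation over neighbor features produces a prefix embedding that is prepended to a frozen LLM's input embeddings. The aggregation scheme and dimensions below are illustrative assumptions, not the GPEFT implementation.

```python
import torch
import torch.nn as nn

class GraphPromptEncoder(nn.Module):
    """Encode a node's neighborhood into a single soft-prompt vector for a frozen LLM."""
    def __init__(self, feat_dim=128, llm_dim=4096):
        super().__init__()
        self.gnn_layer = nn.Linear(feat_dim, feat_dim)   # stand-in for a GNN layer
        self.to_prompt = nn.Linear(feat_dim, llm_dim)

    def forward(self, center_feat, neighbor_feats):
        # Mean aggregation over neighbors, then combine with the center node.
        agg = neighbor_feats.mean(dim=0) if len(neighbor_feats) > 0 else torch.zeros_like(center_feat)
        h = torch.relu(self.gnn_layer(center_feat + agg))
        return self.to_prompt(h)                          # (llm_dim,) graph prompt token

encoder = GraphPromptEncoder()
prompt_vec = encoder(torch.randn(128), torch.randn(5, 128))
# During fine-tuning, only the encoder (and possibly LoRA adapters) is trained;
# prompt_vec is prepended to the token embeddings fed into the frozen LLM.
```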
arXiv Detail & Related papers (2024-04-28T18:36:59Z)
- Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [60.71360240206726]
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks.
Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora.
We propose a framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
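A minimal sketch of such an iterative loop, in which the LLM alternates between requesting a node lookup and giving a final answer; the prompt strings and control flow are assumptions for illustration, not the Graph-CoT prompts.

```python
import networkx as nx

def graph_cot(llm, graph: nx.Graph, question: str, max_steps: int = 5) -> str:
    """llm: callable str -> str. Iteratively consult the graph before answering."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(
            scratchpad
            + "Either reply 'LOOKUP <node>' to read a node and its neighbors, "
              "or 'ANSWER <final answer>'."
        )
        if step.startswith("ANSWER"):
            return step[len("ANSWER"):].strip()
        if step.startswith("LOOKUP"):
            node = step[len("LOOKUP"):].strip()
            if node in graph:
                attrs = graph.nodes[node]
                nbrs = list(graph.neighbors(node))
                scratchpad += f"Observation: {node} -> attrs={attrs}, neighbors={nbrs}\n"
            else:
                scratchpad += f"Observation: node {node} not found\n"
    return llm(scratchpad + "Give your best final answer now.")
```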
arXiv Detail & Related papers (2024-04-10T15:41:53Z)
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
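A simplified sketch of the retrieval step, assuming query and node embeddings are already available: score nodes by similarity to the query and connect the top-scoring ones with networkx's approximate Steiner tree. This is a stand-in for the prize-collecting variant the paper actually solves, which trades node prizes against edge costs instead of fixing the terminal set.

```python
import networkx as nx
import numpy as np
from networkx.algorithms.approximation import steiner_tree

def retrieve_subgraph(graph: nx.Graph, node_emb: dict, query_emb: np.ndarray, top_k: int = 5) -> nx.Graph:
    """Pick the nodes most relevant to the query and connect them into one subgraph.
    Assumes `graph` is connected and node_emb keys are nodes of `graph`."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    scores = {n: cos(emb, query_emb) for n, emb in node_emb.items()}
    terminals = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # Approximate Steiner tree connecting the high-"prize" nodes.
    return steiner_tree(graph, terminals)

# The retrieved subgraph is then verbalized and given to the LLM as context.
```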
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
- Large Language Models on Graphs: A Comprehensive Survey [77.16803297418201]
We provide a systematic review of scenarios and techniques related to large language models on graphs.
We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs.
We discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets.
arXiv Detail & Related papers (2023-12-05T14:14:27Z)
- GraphText: Graph Reasoning in Text Space [32.00258972022153]
GraphText is a framework that translates graphs into natural language.
GraphText can match, or even surpass, the performance of supervised-trained graph neural networks.
It paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language.
arXiv Detail & Related papers (2023-10-02T11:03:57Z)
- ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG).
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
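A compact sketch of a CLIP-style batch contrastive objective over paired text and node embeddings: a generic InfoNCE loss applied in both directions, with the matching pairs on the diagonal. Details such as temperature handling follow common CLIP conventions, not necessarily ConGraT's exact implementation.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(text_emb: torch.Tensor, node_emb: torch.Tensor, temperature: float = 0.07):
    """text_emb, node_emb: (batch, dim); row i holds the same node's text view and graph view."""
    t = F.normalize(text_emb, dim=-1)
    g = F.normalize(node_emb, dim=-1)
    logits = t @ g.T / temperature                       # (batch, batch) similarity matrix
    targets = torch.arange(t.size(0), device=t.device)   # matching pairs lie on the diagonal
    loss_t2g = F.cross_entropy(logits, targets)           # text -> node direction
    loss_g2t = F.cross_entropy(logits.T, targets)         # node -> text direction
    return (loss_t2g + loss_g2t) / 2

# Usage: text_emb comes from the LM, node_emb from the GNN; minimizing this loss
# pulls each node's two views together in the shared latent space.
loss = clip_style_loss(torch.randn(32, 256), torch.randn(32, 256))
```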
arXiv Detail & Related papers (2023-05-23T17:53:30Z)
- GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models.
With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow.
In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graphs.
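A rough sketch of the nesting idea: after a transformer layer encodes each node's text, the per-node summary states are mixed across graph neighbors and written back before the next layer. The mean-mixing module below is a simplified stand-in, not the GraphFormers architecture itself.

```python
import torch
import torch.nn as nn

class NestedGraphTextLayer(nn.Module):
    """One block: a transformer layer over tokens, then message passing over node summary states."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.text_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.graph_mix = nn.Linear(2 * dim, dim)

    def forward(self, token_states, adj):
        # token_states: (num_nodes, seq_len, dim); adj: (num_nodes, num_nodes) 0/1 float matrix.
        h = self.text_layer(token_states)
        cls = h[:, 0, :]                            # (num_nodes, dim) per-node summary
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = (adj @ cls) / deg                   # mean over graph neighbors
        fused = self.graph_mix(torch.cat([cls, neigh], dim=-1))
        h = h.clone()
        h[:, 0, :] = fused                          # write the graph-aware summary back
        return h

layer = NestedGraphTextLayer()
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=torch.float)
out = layer(torch.randn(4, 16, 256), adj)
```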
arXiv Detail & Related papers (2021-05-06T12:20:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.