Bridging Molecular Graphs and Large Language Models
- URL: http://arxiv.org/abs/2503.03135v2
- Date: Mon, 10 Mar 2025 09:51:05 GMT
- Title: Bridging Molecular Graphs and Large Language Models
- Authors: Runze Wang, Mingqi Yang, Yanming Shen
- Abstract summary: Large Language Models (LLMs) have shown exceptional generalization capabilities, but their ability to process graph data, such as molecular structures, remains limited. This paper proposes Graph2Token, an efficient solution that aligns graph tokens to LLM tokens. Extensive experiments on molecular classification and regression tasks demonstrate the effectiveness of our proposed Graph2Token.
- Score: 10.647911401603801
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While Large Language Models (LLMs) have shown exceptional generalization capabilities, their ability to process graph data, such as molecular structures, remains limited. To bridge this gap, this paper proposes Graph2Token, an efficient solution that aligns graph tokens to LLM tokens. The key idea is to represent a graph token with the LLM token vocabulary, without fine-tuning the LLM backbone. To achieve this goal, we first construct a molecule-text paired dataset from multiple sources, including CHEBI and HMDB, to train a graph structure encoder, which reduces the distance between graph and text representations in the feature space. Then, we propose a novel alignment strategy that associates a graph token with LLM tokens. To further unleash the potential of LLMs, we collect molecular IUPAC name identifiers, which are incorporated into the LLM prompts. By aligning molecular graphs as special tokens, we can activate the LLM's generalization ability for molecular few-shot learning. Extensive experiments on molecular classification and regression tasks demonstrate the effectiveness of our proposed Graph2Token.
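To make the alignment idea concrete, here is a minimal PyTorch sketch of representing a graph token as a weighted combination of frozen LLM vocabulary embeddings. The class name GraphTokenAligner, the softmax weighting over the vocabulary, and all dimensions are illustrative assumptions; the abstract does not disclose the exact alignment architecture.

```python
import torch
import torch.nn as nn

class GraphTokenAligner(nn.Module):
    """Illustrative sketch: express a graph token in the span of frozen
    LLM vocabulary embeddings, so the frozen LLM backbone can consume
    it without fine-tuning. An assumed design, not the paper's exact one."""

    def __init__(self, graph_dim: int, llm_embed: torch.Tensor):
        super().__init__()
        # llm_embed: frozen LLM token embedding matrix, shape (V, d_llm)
        self.register_buffer("llm_embed", llm_embed)
        self.proj = nn.Linear(graph_dim, llm_embed.size(1))

    def forward(self, graph_repr: torch.Tensor) -> torch.Tensor:
        # graph_repr: (batch, graph_dim) from a pretrained graph encoder
        q = self.proj(graph_repr)                # (batch, d_llm)
        scores = q @ self.llm_embed.T            # (batch, V)
        weights = torch.softmax(scores, dim=-1)  # attention over vocabulary
        return weights @ self.llm_embed          # (batch, d_llm)

# Toy usage: a 1000-token vocabulary with 256-dim embeddings.
vocab = torch.randn(1000, 256)
aligner = GraphTokenAligner(graph_dim=64, llm_embed=vocab)
graph_token = aligner(torch.randn(2, 64))        # shape (2, 256)
```

Because the output lies in the span of existing token embeddings, only the small aligner needs training while the LLM stays frozen.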
Related papers
- LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models [54.82915844507371]
Text-Attributed Graphs (TAGs) are ubiquitous in real-world scenarios.
Despite extensive efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures.
We propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning.
arXiv Detail & Related papers (2025-03-05T09:45:22Z)
- Each Graph is a New Language: Graph Learning with LLMs [9.22463167477865]
We present Graph-Defined Language for Large Language Model (GDL4LLM) to transfer powerful language understanding capabilities to graph-structured data. GDL4LLM translates graphs into a graph language corpus instead of graph descriptions and pre-trains LLMs on this corpus to adequately understand graph structures.
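The abstract does not specify how GDL4LLM serializes graphs into a corpus; one plausible scheme, sketched below purely as an assumption, is to emit random-walk "sentences" over node identifiers for language-model pre-training.

```python
import random

def graph_to_corpus(adj: dict[str, list[str]], walks_per_node: int = 2,
                    walk_len: int = 5, seed: int = 0) -> list[str]:
    """Hypothetical serialization: emit random-walk 'sentences' so an LLM
    can be pre-trained on graph structure as if it were a language.
    (GDL4LLM's actual scheme is not given in the abstract.)"""
    rng = random.Random(seed)
    corpus = []
    for start in adj:
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                nbrs = adj.get(node, [])
                if not nbrs:
                    break
                node = rng.choice(nbrs)
                walk.append(node)
            corpus.append(" ".join(walk))
    return corpus

toy = {"C1": ["O1", "C2"], "C2": ["C1", "N1"], "O1": ["C1"], "N1": ["C2"]}
print(graph_to_corpus(toy))  # e.g. ['C1 O1 C1 C2 N1', ...]
```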
arXiv Detail & Related papers (2025-01-20T13:20:41Z)
- Enhance Graph Alignment for Large Language Models [33.96082485852042]
Graph-to-token approaches are a popular way of enabling Large Language Models to process graph information.
Existing methods have a misalignment between self-supervised tasks and supervised downstream tasks.
We propose Graph Alignment Large Language Models (GALLM) to benefit from aligned task templates.
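As a loose illustration of aligned task templates, the hypothetical snippet below shares one instruction format between a self-supervised pre-training task and a supervised downstream task, so the two stages see the same prompt structure. The wording and placeholders are invented, not GALLM's actual templates.

```python
# Hypothetical unified task template in the spirit of GALLM's abstract:
# pre-training and downstream tasks share one instruction format, which
# keeps the self-supervised and supervised objectives aligned.
TEMPLATE = (
    "Graph: {graph_tokens}\n"
    "Task: {task_description}\n"
    "Answer:"
)

pretrain_prompt = TEMPLATE.format(
    graph_tokens="<g1> <g2> <g3>",
    task_description="Predict the masked node's category.",
)
downstream_prompt = TEMPLATE.format(
    graph_tokens="<g1> <g2> <g3>",
    task_description="Classify the paper's research area.",
)
```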
arXiv Detail & Related papers (2024-10-15T07:50:34Z)
- LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings [7.302176015732192]
We introduce a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM). We pretrain a GNN, aligning its representations with the token embeddings of an LLM. We then train a linear projector that transforms the GNN's representations into a fixed number of graph token embeddings.
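The projector stage is concrete enough to sketch: a linear map from a pooled GNN representation to a fixed number of token-sized embeddings. The class name, the choice of k, and all dimensions below are assumptions.

```python
import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    """Sketch of TEA-GLM's second stage as the abstract describes it:
    a linear map from a GNN graph representation to a fixed number of
    graph token embeddings for the LLM (k and dims are assumptions)."""

    def __init__(self, gnn_dim: int, llm_dim: int, num_tokens: int = 8):
        super().__init__()
        self.num_tokens, self.llm_dim = num_tokens, llm_dim
        self.linear = nn.Linear(gnn_dim, num_tokens * llm_dim)

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        # g: (batch, gnn_dim) pooled graph representation from the GNN
        return self.linear(g).view(-1, self.num_tokens, self.llm_dim)

proj = GraphTokenProjector(gnn_dim=256, llm_dim=512, num_tokens=8)
tokens = proj(torch.randn(2, 256))  # (2, 8, 512), prepended to LLM input
```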
arXiv Detail & Related papers (2024-08-25T04:32:45Z)
- HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment [41.75926736949724]
We propose HIerarchical GrapH Tokenization (HIGHT), a novel strategy to improve the graph perception of large language models (LLMs).
HIGHT employs a hierarchical graph tokenizer that extracts and encodes informative tokens at the node, motif, and graph levels.
Experiments on 7 molecule-centric benchmarks confirm the effectiveness of HIGHT in reducing hallucination by 40%, as well as significant improvements in various molecule-language downstream tasks.
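A toy version of hierarchical tokenization, assuming mean pooling at the motif and graph levels; HIGHT's actual tokenizer is more sophisticated than this sketch.

```python
import torch

def hierarchical_tokens(node_emb: torch.Tensor,
                        motif_index: list[list[int]]) -> torch.Tensor:
    """Toy hierarchical tokenization in the spirit of HIGHT's abstract:
    stack node-level tokens, mean-pooled motif tokens, and one pooled
    graph token (the pooling choice is an assumption, not the paper's)."""
    motif_emb = torch.stack([node_emb[idx].mean(0) for idx in motif_index])
    graph_emb = node_emb.mean(0, keepdim=True)
    # Token sequence: nodes, then motifs (e.g. rings), then whole graph.
    return torch.cat([node_emb, motif_emb, graph_emb], dim=0)

nodes = torch.randn(6, 128)              # 6 atoms, 128-dim features
rings = [[0, 1, 2], [3, 4, 5]]           # two motifs (e.g. ring systems)
print(hierarchical_tokens(nodes, rings).shape)  # torch.Size([9, 128])
```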
arXiv Detail & Related papers (2024-06-20T06:37:35Z)
- Parameter-Efficient Tuning Large Language Models for Graph Representation Learning [62.26278815157628]
We introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning.
We use a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt.
We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations.
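A minimal sketch of the graph-prompt idea, where a stand-in for a GNN (one linear layer plus mean aggregation) summarizes a node's neighbors into soft prompt embeddings prepended to the text embeddings. All names and shapes are illustrative assumptions, not GPEFT's implementation.

```python
import torch
import torch.nn as nn

class GraphPromptPrefix(nn.Module):
    """Sketch of the GPEFT idea from the abstract: encode structural
    information from neighboring nodes into a soft graph prompt that
    prefixes the text tokens. The toy GNN here is a stand-in."""

    def __init__(self, feat_dim: int, llm_dim: int, prefix_len: int = 4):
        super().__init__()
        self.gnn = nn.Linear(feat_dim, feat_dim)   # toy one-layer GNN
        self.to_prompt = nn.Linear(feat_dim, prefix_len * llm_dim)
        self.prefix_len, self.llm_dim = prefix_len, llm_dim

    def forward(self, neighbor_feats: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # neighbor_feats: (num_neighbors, feat_dim); text_embeds: (T, llm_dim)
        h = torch.relu(self.gnn(neighbor_feats)).mean(0)  # aggregate neighbors
        prompt = self.to_prompt(h).view(self.prefix_len, self.llm_dim)
        return torch.cat([prompt, text_embeds], dim=0)    # prefix + text

prefix = GraphPromptPrefix(feat_dim=64, llm_dim=512)
seq = prefix(torch.randn(5, 64), torch.randn(12, 512))   # (16, 512)
```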
arXiv Detail & Related papers (2024-04-28T18:36:59Z)
- Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [60.71360240206726]
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks.
Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora.
We propose a framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
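A schematic of the iterative reason-then-traverse loop the abstract describes. Here llm() is a hypothetical callable and the ANSWER: convention is invented for illustration; Graph-CoT's actual prompting protocol may differ.

```python
def graph_cot(question: str, graph: dict, llm, max_steps: int = 5) -> str:
    """Schematic Graph-CoT loop per the abstract: the LLM alternates
    between reasoning and traversing the graph, rather than receiving
    retrieved text units once up front. llm() is a placeholder."""
    context = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(context + "\nName the next node to inspect, or reply "
                             "'ANSWER: <text>' if you can answer.")
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        node = step.strip()
        neighbors = graph.get(node, [])
        context += f"\nNode {node} links to: {neighbors}"  # traversal result
    return llm(context + "\nGive your final answer.")
```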
arXiv Detail & Related papers (2024-04-10T15:41:53Z)
- Large Language Models on Graphs: A Comprehensive Survey [77.16803297418201]
We provide a systematic review of scenarios and techniques related to large language models on graphs.
We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs.
We discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets.
arXiv Detail & Related papers (2023-12-05T14:14:27Z)
- Integrating Graphs with Large Language Models: Methods and Prospects [68.37584693537555]
Large language models (LLMs) have emerged as frontrunners, demonstrating strong performance across diverse applications.
Merging the capabilities of LLMs with graph-structured data has been a topic of keen interest.
This paper divides such integrations into two predominant categories.
arXiv Detail & Related papers (2023-10-09T07:59:34Z)
- Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs [59.74814230246034]
Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities.
We investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors.
arXiv Detail & Related papers (2023-07-07T05:31:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.