Related papers: UniGLM: Training One Unified Language Model for Text-Attributed Graphs

UniGLM: Training One Unified Language Model for Text-Attributed Graphs

URL: http://arxiv.org/abs/2406.12052v1
Date: Mon, 17 Jun 2024 19:45:21 GMT
Title: UniGLM: Training One Unified Language Model for Text-Attributed Graphs
Authors: Yi Fang, Dongzhe Fan, Sirui Ding, Ninghao Liu, Qiaoyu Tan,
Abstract summary: Unified Graph Language Model (UniGLM) is a graph embedding model that generalizes well to both in-domain and cross-domain TAGs. UniGLM includes an adaptive positive sample selection technique for identifying structurally similar nodes and a lazy contrastive module that is devised to accelerate training.
Score: 31.464021556351685
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Representation learning on text-attributed graphs (TAGs), where nodes are represented by textual descriptions, is crucial for textual and relational knowledge systems and recommendation systems. Currently, state-of-the-art embedding methods for TAGs primarily focus on fine-tuning language models (e.g., BERT) using structure-aware training signals. While effective, these methods are tailored for individual TAG and cannot generalize across various graph scenarios. Given the shared textual space, leveraging multiple TAGs for joint fine-tuning, aligning text and graph structure from different aspects, would be more beneficial. Motivated by this, we introduce a novel Unified Graph Language Model (UniGLM) framework, the first graph embedding model that generalizes well to both in-domain and cross-domain TAGs. Specifically, UniGLM is trained over multiple TAGs with different domains and scales using self-supervised contrastive learning. UniGLM includes an adaptive positive sample selection technique for identifying structurally similar nodes and a lazy contrastive module that is devised to accelerate training by minimizing repetitive encoding calculations. Extensive empirical results across 9 benchmark TAGs demonstrate UniGLM's efficacy against leading embedding baselines in terms of generalization (various downstream tasks and backbones) and transfer learning (in and out of domain scenarios). The code is available at https://github.com/NYUSHCS/UniGLM.

Related papers

Efficient Text-Attributed Graph Learning through Selective Annotation and Graph Alignment [24.0890725396281]
We introduce GAGA, an efficient framework for TAG representation learning.<n>It reduces annotation time and cost by focusing on annotating only representative nodes and edges.<n>Experiments show that GAGA classification achieves accuracies on par with or surpassing state-of-the-art methods.
arXiv Detail & Related papers (2025-06-08T14:34:29Z)
LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models [54.82915844507371]
Text-Attributed Graphs (TAGs) are ubiquitous in real-world scenarios. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures. We propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning.
arXiv Detail & Related papers (2025-03-05T09:45:22Z)
Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? [62.12375949429938]
We propose a multi-modal prompt learning paradigm to adapt pre-trained Graph Neural Networks to downstream tasks and data.<n>Our new paradigm embeds the graphs directly in the same space as the Large Language Models (LLMs) by learning both graph prompts and text prompts simultaneously.<n>We build the first CLIP-style zero-shot classification prototype that can generalize GNNs to unseen classes with extremely weak text supervision.
arXiv Detail & Related papers (2024-12-11T08:03:35Z)
Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information. Our method achieves state-of-theart performance, while our graph-aware token reduction module significantly enhances efficiency and solves scalability issues.
arXiv Detail & Related papers (2024-06-18T13:35:25Z)
DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs [28.340416573162898]
Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs. We introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs.
arXiv Detail & Related papers (2024-06-17T20:16:12Z)
TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations [15.873944819608434]
Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions. This paper introduces a new self-supervised learning framework, Text-And-Graph Multi-View Alignment (TAGA), which integrates TAGs' structural and semantic dimensions. Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.
arXiv Detail & Related papers (2024-05-27T03:40:16Z)
Pretraining Language Models with Text-Attributed Heterogeneous Graphs [28.579509154284448]
We present a new pretraining framework for Language Models (LMs) that explicitly considers the topological and heterogeneous information in Text-Attributed Heterogeneous Graphs (TAHGs) We propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network. We conduct link prediction and node classification tasks on three datasets from various domains.
arXiv Detail & Related papers (2023-10-19T08:41:21Z)
Learning Multiplex Representations on Text-Attributed Graphs with One Language Model Encoder [55.24276913049635]
We propose METAG, a new framework for learning Multiplex rEpresentations on Text-Attributed Graphs. In contrast to existing methods, METAG uses one text encoder to model the shared knowledge across relations. We conduct experiments on nine downstream tasks in five graphs from both academic and e-commerce domains.
arXiv Detail & Related papers (2023-10-10T14:59:22Z)
Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks. Our method achieves state-of-the-art results on well-established TAG datasets. Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG) Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP. Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z)
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts. Data is thus a bottleneck for training effective sign language translation models. This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text. Our approach outperforms the pre-training of plain text using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph [53.70520466556453]
We propose GraphFormers, where layerwise GNN components are nested alongside the transformer blocks of language models. With the proposed architecture, the text encoding and the graph aggregation are fused into an iterative workflow. In addition, a progressive learning strategy is introduced, where the model is successively trained on manipulated data and original data to reinforce its capability of integrating information on graph.
arXiv Detail & Related papers (2021-05-06T12:20:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.