Leveraging Large Language Models for Effective Label-free Node Classification in Text-Attributed Graphs
- URL: http://arxiv.org/abs/2412.11983v2
- Date: Mon, 28 Apr 2025 12:17:25 GMT
- Title: Leveraging Large Language Models for Effective Label-free Node Classification in Text-Attributed Graphs
- Authors: Taiyan Zhang, Renchi Yang, Yurui Lai, Mingyu Yan, Xiaochun Ye, Dongrui Fan
- Abstract summary: Locle is an active self-training framework that does Label-free node Classification with LLMs cost-Effectively. It iteratively identifies small sets of "critical" samples using GNNs and extracts informative pseudo-labels for them with both LLMs and GNNs. It significantly outperforms state-of-the-art methods under the same query budget to LLMs in terms of label-free node classification.
- Score: 10.538099379851198
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Graph neural networks (GNNs) have become the preferred models for node classification in graph data due to their robust capabilities in integrating graph structures and attributes. However, these models heavily depend on a substantial amount of high-quality labeled data for training, which is often costly to obtain. With the rise of large language models (LLMs), a promising approach is to utilize their exceptional zero-shot capabilities and extensive knowledge for node labeling. Despite encouraging results, this approach either requires numerous queries to LLMs or suffers from reduced performance due to noisy labels generated by LLMs. To address these challenges, we introduce Locle, an active self-training framework that does Label-free node Classification with LLMs cost-Effectively. Locle iteratively identifies small sets of "critical" samples using GNNs and extracts informative pseudo-labels for them with both LLMs and GNNs, serving as additional supervision signals to enhance model training. Specifically, Locle comprises three key components: (i) an effective active node selection strategy for initial annotations; (ii) a careful sample selection scheme to identify "critical" nodes based on label disharmonicity and entropy; and (iii) a label refinement module that combines LLMs and GNNs with a rewired topology. Extensive experiments on five benchmark text-attributed graph datasets demonstrate that Locle significantly outperforms state-of-the-art methods under the same query budget to LLMs in terms of label-free node classification. Notably, on the DBLP dataset with 14.3k nodes, Locle achieves an 8.08% improvement in accuracy over the state-of-the-art at a cost of less than one cent. Our code is available at https://github.com/HKBU-LAGAS/Locle.
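The abstract names entropy and "label disharmonicity" as the criteria for picking critical nodes but does not define them here. What follows is a minimal sketch under stated assumptions, not Locle's implementation (that lives in the linked repository). It assumes `probs` holds a GNN's softmax outputs, the graph is a dense 0/1 adjacency matrix, disharmonicity is a hypothetical score equal to the fraction of a node's neighbors whose predicted label differs from its own, and the two criteria are combined by a plain sum. All function names are hypothetical.

```python
import numpy as np


def normalized_entropy(probs: np.ndarray) -> np.ndarray:
    """Row-wise Shannon entropy of a (n_nodes, n_classes) matrix, scaled to [0, 1] by log(n_classes)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1) / np.log(probs.shape[1])


def disharmonicity(adj: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL score: fraction of each node's neighbors whose label differs from the node's own.

    `adj` is a dense symmetric 0/1 adjacency matrix; isolated nodes score 0.
    """
    differs = labels[:, None] != labels[None, :]  # differs[i, j] is True when labels i and j disagree
    degree = adj.sum(axis=1).astype(float)
    mismatched = (adj * differs).sum(axis=1).astype(float)
    return np.divide(mismatched, degree, out=np.zeros_like(degree), where=degree > 0)


def select_critical_nodes(probs, adj, candidates, k):
    """Return the k candidates with the highest entropy + disharmonicity (an assumed combination)."""
    labels = probs.argmax(axis=1)  # current pseudo-labels taken from the model's predictions
    score = normalized_entropy(probs) + disharmonicity(adj, labels)
    cand = np.asarray(candidates)
    order = np.argsort(-score[cand], kind="stable")  # descending score, ties keep candidate order
    return cand[order[:k]].tolist()


# Toy example: a 4-node path graph 0-1-2-3 with 2 classes.
probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]])
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(select_critical_nodes(probs, adj, candidates=[0, 1, 2, 3], k=2))  # [3, 2]
```

Nodes whose predictions are uncertain and also disagree with their neighborhood rank highest. In an active self-training loop of the kind the abstract describes, such top-ranked nodes are natural candidates for pseudo-labeling, which is one way to keep the number of LLM queries small.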
Related papers
- Few-Shot Graph Out-of-Distribution Detection with LLMs [34.42512005781724]
We propose a framework that combines the strengths of large language models (LLMs) and graph neural networks (GNNs) to enhance data efficiency in graph out-of-distribution (OOD) detection.
We show that LLM-GOOD significantly reduces human annotation costs and outperforms state-of-the-art baselines in terms of both ID classification accuracy and OOD detection performance.
(2025-03-28T02:37:18Z)
- How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
(2024-10-03T08:27:54Z)
- All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks [51.19110891434727]
Large Language Models (LLMs) with pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data.
E-LLaGNN is a framework with an on-demand LLM service that enriches the message-passing procedure of graph learning by enhancing a limited fraction of the graph's nodes.
(2024-07-20T22:09:42Z)
- Similarity-based Neighbor Selection for Graph LLMs [43.176381523196426]
We introduce Similarity-based Neighbor Selection (SNS).
SNS improves the quality of selected neighbors, thereby improving graph representation and alleviating issues like over-squashing and heterophily.
As an inductive and training-free approach, SNS demonstrates superior generalization and scalability over traditional GNN methods.
(2024-02-06T05:29:05Z)
- Label-free Node Classification on Graphs with Large Language Models (LLMS) [46.937442239949256]
This work introduces LLM-GNN, a pipeline for label-free node classification on graphs with Large Language Models.
It amalgamates the strengths of both GNNs and LLMs while mitigating their limitations.
In particular, LLM-GNN can achieve an accuracy of 74.9% on a large-scale dataset at a cost of less than 1 dollar.
(2023-10-07T03:14:11Z)
- Balancing Efficiency vs. Effectiveness and Providing Missing Label Robustness in Multi-Label Stream Classification [3.97048491084787]
We propose a neural network-based approach to high-dimensional multi-label classification.
Our model uses a selective concept drift adaptation mechanism that makes it suitable for a non-stationary environment.
We adapt our model to an environment with missing labels using a simple yet effective imputation strategy.
(2023-10-01T13:23:37Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
(2023-08-23T09:45:29Z)
- Label-Enhanced Graph Neural Network for Semi-supervised Node Classification [32.64730237473914]
We present a label-enhanced learning framework for Graph Neural Networks (GNNs).
It first models each label as a virtual center for intra-class nodes and then jointly learns the representations of both nodes and labels.
Our approach could not only smooth the representations of nodes belonging to the same class, but also explicitly encode the label semantics into the learning process of GNNs.
(2022-05-31T09:48:47Z)
- Active Learning for Node Classification: The Additional Learning Ability from Unlabelled Nodes [33.97571297149204]
Given a limited labelling budget, active learning aims to improve performance by carefully choosing which nodes to label.
Our empirical study shows that existing active learning methods for node classification are considerably outperformed by a simple method.
We propose a novel latent-space clustering-based active learning method for node classification (LSCALE).
(2020-12-13T13:59:48Z)
- Delving Deep into Label Smoothing [112.24527926373084]
Label smoothing is an effective regularization tool for deep neural networks (DNNs).
We present an Online Label Smoothing (OLS) strategy, which generates soft labels from the statistics of the model's predictions for the target category.
(2020-11-25T08:03:11Z)
- Cyclic Label Propagation for Graph Semi-supervised Learning [52.102251202186025]
We introduce a novel framework for graph semi-supervised learning called CycProp.
CycProp integrates GNNs into the process of label propagation in a cyclic and mutually reinforcing manner.
In particular, our proposed CycProp updates the node embeddings learned by the GNN module with information augmented by label propagation.
(2020-11-24T02:55:40Z)
- PseudoSeg: Designing Pseudo Labels for Semantic Segmentation [78.35515004654553]
We present a re-design of pseudo-labeling to generate structured pseudo labels for training with unlabeled or weakly-labeled data.
We demonstrate the effectiveness of the proposed pseudo-labeling strategy in both low-data and high-data regimes.
(2020-10-19T17:59:30Z)
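Several entries above, CycProp in particular, build on label propagation over graph edges. As a point of reference only, here is a minimal sketch of the classic iterative scheme F ← αSF + (1 - α)Y with S = D^{-1/2} A D^{-1/2}. It is a generic textbook formulation with illustrative function and parameter names (`label_propagation`, `alpha`, `iters`), not the method of CycProp, LLM-GNN, or Locle.

```python
import numpy as np


def label_propagation(adj, seed_labels, seed_mask, n_classes, alpha=0.9, iters=50):
    """Classic iterative label propagation: F <- alpha * S @ F + (1 - alpha) * Y.

    adj: dense symmetric (n, n) adjacency matrix.
    seed_labels: (n,) int class ids; only entries where seed_mask is True are used.
    seed_mask: (n,) bool marking the labeled (seed) nodes.
    Returns the predicted class id of every node.
    """
    deg = adj.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    s = inv_sqrt[:, None] * adj * inv_sqrt[None, :]  # S = D^{-1/2} A D^{-1/2}

    y = np.zeros((adj.shape[0], n_classes))
    seeds = np.flatnonzero(seed_mask)
    y[seeds, seed_labels[seeds]] = 1.0  # one-hot rows for seed nodes, zeros elsewhere

    f = y.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1 - alpha) * y  # spread scores to neighbors, pull back toward seeds
    return f.argmax(axis=1)


# Toy example: path graph 0-1-2-3-4, node 0 seeded with class 0 and node 4 with class 1.
adj = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
seed_labels = np.array([0, -1, -1, -1, 1])
seed_mask = np.array([True, False, False, False, True])
pred = label_propagation(adj, seed_labels, seed_mask, n_classes=2)
print(pred[[0, 1, 3, 4]])  # [0 0 1 1]
```

Node 2 sits exactly halfway between the two seeds, so its two class scores tie up to floating-point rounding and the example leaves it out. The `alpha` parameter trades off spreading information to neighbors against staying close to the seed labels.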