Data-centric Federated Graph Learning with Large Language Models
- URL: http://arxiv.org/abs/2503.19455v1
- Date: Tue, 25 Mar 2025 08:43:08 GMT
- Title: Data-centric Federated Graph Learning with Large Language Models
- Authors: Bo Yan, Zhongjian Zhang, Huabin Sun, Mengmei Zhang, Yang Cao, Chuan Shi
- Abstract summary: In federated graph learning (FGL), a complete graph is divided into multiple subgraphs, each stored on a different client due to privacy concerns. A pain point of FGL is the heterogeneity problem, where nodes or structures exhibit non-IID properties across clients. We propose a general framework that decomposes the task of large language models for FGL into two theoretically grounded sub-tasks.
- Score: 34.224475952206404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In federated graph learning (FGL), a complete graph is divided into multiple subgraphs, each stored on a different client due to privacy concerns, and all clients jointly train a global graph model by transmitting only model parameters. A pain point of FGL is the heterogeneity problem, where nodes or structures exhibit non-IID properties across clients (e.g., different node label distributions), dramatically undermining the convergence and performance of FGL. To address this, existing efforts focus on strategies at the model level, i.e., they design models that extract common knowledge to mitigate heterogeneity. However, these model-level strategies fail to fundamentally address the heterogeneity problem, as the model must be redesigned from scratch when transferring to other tasks. Motivated by the remarkable success of large language models (LLMs), we aim to use LLMs to fully understand and augment local text-attributed graphs, addressing data heterogeneity at the data level. In this paper, we propose a general framework, LLM4FGL, that decomposes the task of LLMs for FGL into two theoretically grounded sub-tasks. Specifically, for each client, it first uses the LLM to generate missing neighbors and then infers connections between generated nodes and raw nodes. To improve the quality of generated nodes, we design a novel federated generation-and-reflection mechanism for LLMs that requires no modification of LLM parameters, relying solely on collective feedback from all clients. After neighbor generation, all clients use a pre-trained edge predictor to infer the missing edges. Furthermore, our framework can be seamlessly integrated as a plug-in with existing FGL methods. Experiments on three real-world datasets demonstrate the superiority of our method over advanced baselines.
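The pipeline described above can be pictured as the following minimal Python sketch. `call_llm`, `edge_predictor`, and all prompts are hypothetical placeholders, not the authors' implementation; the sketch only mirrors the two sub-tasks and the parameter-free reflection loop named in the abstract.

```python
# Minimal sketch of the two sub-tasks LLM4FGL decomposes the problem into.
# All helper names and prompts are illustrative assumptions.

def generate_neighbors(node_text: str, num_neighbors: int, call_llm) -> list[str]:
    """Sub-task 1: ask an LLM to synthesize plausible missing-neighbor texts."""
    prompt = (
        f"The following node text belongs to a text-attributed graph:\n{node_text}\n"
        f"Write {num_neighbors} short texts for plausible neighboring nodes."
    )
    return call_llm(prompt)  # assumed to return a list of generated texts

def reflect_and_refine(generated: list[str], feedback: str, call_llm) -> list[str]:
    """Generation-and-reflection: refine outputs using collective client
    feedback, without updating any LLM parameters."""
    prompt = (
        "Here are generated neighbor texts:\n" + "\n".join(generated)
        + f"\nFeedback aggregated from all clients: {feedback}\n"
        + "Rewrite the texts to address the feedback."
    )
    return call_llm(prompt)

def connect_generated_nodes(raw_emb, gen_emb, edge_predictor, threshold=0.5):
    """Sub-task 2: a pre-trained edge predictor infers links between
    generated nodes and raw nodes."""
    edges = []
    for i, g in enumerate(gen_emb):
        for j, r in enumerate(raw_emb):
            if edge_predictor(g, r) > threshold:
                edges.append((i, j))
    return edges
```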
Related papers
- FedHERO: A Federated Learning Approach for Node Classification Task on Heterophilic Graphs [55.51300642911766]
Federated Graph Learning (FGL) empowers clients to collaboratively train Graph neural networks (GNNs) in a distributed manner.
FGL methods usually require that the graph data owned by all clients is homophilic to ensure similar neighbor distribution patterns of nodes.
We propose FedHERO, an FGL framework designed to harness and share insights from heterophilic graphs effectively.
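Since FedHERO targets the usual homophily assumption of FGL methods, it helps to recall how that assumption is quantified. The helper below computes the standard edge homophily ratio (fraction of edges joining same-label nodes); it is illustrative background, not code from the paper.

```python
def edge_homophily(edges: list[tuple[int, int]], labels: list[int]) -> float:
    """Fraction of edges connecting same-label nodes (1.0 = fully homophilic)."""
    if not edges:
        return 0.0
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

# e.g. edge_homophily([(0, 1), (1, 2)], labels=[0, 0, 1]) -> 0.5
```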
arXiv Detail & Related papers (2025-04-29T22:23:35Z)
- Federated Prototype Graph Learning [33.38948169766356]
Federated Graph Learning (FGL) has gained significant attention for its distributed training capabilities.
We propose FedPG, a general prototype-guided optimization method for this multi-level FGL heterogeneity.
Experiments demonstrate that FedPG outperforms SOTA baselines by an average of 3.57% in accuracy while reducing communication costs by 168x.
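As a rough illustration of prototype-guided federated optimization, the mechanism this summary names, the sketch below computes class-wise prototypes (mean embeddings) on each client and averages them on the server. The names and the aggregation rule are assumptions, not FedPG's actual algorithm.

```python
import numpy as np

def local_prototypes(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Client side: mean embedding per class present on this client."""
    return {int(c): embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def aggregate_prototypes(client_protos: list[dict]) -> dict:
    """Server side: average each class prototype over the clients that have it."""
    merged: dict[int, list] = {}
    for protos in client_protos:
        for c, p in protos.items():
            merged.setdefault(c, []).append(p)
    return {c: np.mean(ps, axis=0) for c, ps in merged.items()}
```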
arXiv Detail & Related papers (2025-04-13T09:21:21Z)
- GL-Fusion: Rethinking the Combination of Graph Neural Network and Large Language model [63.774726052837266]
We introduce a new architecture that deeply integrates Graph Neural Networks (GNNs) with Large Language Models (LLMs). The architecture features three key innovations: (1) Structure-Aware Transformers, which incorporate GNN's message-passing capabilities directly into LLM's transformer layers; (2) Graph-Text Cross-Attention, which processes full, uncompressed text from graph nodes and edges; and (3) GNN-LLM Twin Predictor, enabling LLM's flexible autoregressive generation alongside GNN's scalable one-pass prediction.
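Of the three innovations, Graph-Text Cross-Attention is the easiest to sketch: text token states attend over uncompressed node/edge text states. The layer below is a hypothetical illustration of that shape, not GL-Fusion's actual architecture.

```python
import torch
import torch.nn as nn

class GraphTextCrossAttention(nn.Module):
    """Illustrative only: text tokens (queries) attend to full node/edge
    text states (keys/values), with a residual connection."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_states: torch.Tensor, graph_states: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(text_states, graph_states, graph_states)
        return text_states + out
```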
arXiv Detail & Related papers (2024-12-08T05:49:58Z)
- How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs). We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- Exploring the Potential of Large Language Models for Heterophilic Graphs [38.79574338268997]
We propose a two-stage framework for modeling heterophilic graphs using large language models (LLMs). In the first stage, we fine-tune the LLM to better identify homophilic and heterophilic edges based on the textual content of their nodes. In the second stage, we adaptively manage message propagation in GNNs for different edge types based on node features, structures, and heterophilic or homophilic characteristics.
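A minimal sketch of the second stage as summarized above: once the fine-tuned LLM has labeled edges homophilic or heterophilic, messages propagate with edge-type-dependent weights. The weighting rule here is a hypothetical illustration, not the paper's exact scheme.

```python
import numpy as np

def adaptive_propagate(x: np.ndarray, edges, edge_is_homophilic,
                       w_homo: float = 1.0, w_hetero: float = -0.5) -> np.ndarray:
    """One propagation step: homophilic edges smooth features, while
    heterophilic edges are down-weighted (here, pushed apart)."""
    out = x.copy()
    for (u, v), homo in zip(edges, edge_is_homophilic):
        w = w_homo if homo else w_hetero
        out[u] += w * x[v]
        out[v] += w * x[u]
    return out
```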
arXiv Detail & Related papers (2024-08-26T09:29:56Z)
- Federated Graph Learning with Structure Proxy Alignment [43.13100155569234]
Federated Graph Learning (FGL) aims to learn graph learning models over graph data distributed among multiple data owners.
We propose FedSpray, a novel FGL framework that learns local class-wise structure proxies in the latent space.
Our goal is to obtain the aligned structure proxies that can serve as reliable, unbiased neighboring information for node classification.
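One plausible reading of how aligned proxies serve as unbiased neighboring information is to regularize each node's embedding toward its class proxy. The blending rule below is an assumption for illustration, not FedSpray's actual objective.

```python
import numpy as np

def proxy_guided_update(h: np.ndarray, labels: np.ndarray,
                        proxies: dict, alpha: float = 0.3) -> np.ndarray:
    """Blend each node embedding with the globally aligned structure proxy
    of its class (hypothetical stand-in for the paper's objective)."""
    out = h.copy()
    for i, y in enumerate(labels):
        out[i] = (1 - alpha) * h[i] + alpha * proxies[int(y)]
    return out
```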
arXiv Detail & Related papers (2024-08-18T07:32:54Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
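Sampling node contexts through random walks is a standard mechanism; the helper below illustrates the idea GSPT's summary names, with all details (walk length, stopping policy) left as assumptions.

```python
import random

def random_walk_context(adj: dict, start: int, length: int,
                        rng: random.Random | None = None) -> list[int]:
    """Sample a fixed-length random walk as the context sequence for `start`."""
    rng = rng or random.Random()
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:  # dead end: stop early
            break
        walk.append(rng.choice(nbrs))
    return walk

# e.g. random_walk_context({0: [1], 1: [0, 2], 2: [1]}, start=0, length=4)
```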
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- ZeroG: Investigating Cross-dataset Zero-shot Transferability in Graphs [36.749959232724514]
ZeroG is a new framework tailored to enable cross-dataset generalization.
We address the inherent challenges such as feature misalignment, mismatched label spaces, and negative transfer.
We propose a prompt-based subgraph sampling module that enriches the semantic information and structure information of extracted subgraphs.
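The summary names a prompt-based subgraph sampling module. As hedged background, the helper below extracts a plain k-hop subgraph around a center node; how ZeroG enriches such subgraphs with prompt semantics is not reproduced here.

```python
from collections import deque

def k_hop_subgraph(adj: dict, center: int, k: int) -> set:
    """Return the set of nodes within k hops of `center` (breadth-first)."""
    seen, frontier = {center}, deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen
```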
arXiv Detail & Related papers (2024-02-17T09:52:43Z)
- AdaFGL: A New Paradigm for Federated Node Classification with Topology Heterogeneity [44.11777886421429]
Federated Graph Learning (FGL) has attracted significant attention as a distributed framework based on graph neural networks.
We introduce the concept of the structure non-IID split and then present a new paradigm called Adaptive Federated Graph Learning (AdaFGL).
Our proposed AdaFGL outperforms baselines by significant margins of 3.24% and 5.57% on the community split and the structure non-IID split, respectively.
arXiv Detail & Related papers (2024-01-22T08:23:31Z)
- Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
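The two client-side components can be sketched as a single loss: cross-entropy on real plus generated data, and knowledge distillation toward the global model. The weights and temperature below are hypothetical, not FedCOG's reported settings.

```python
import torch
import torch.nn.functional as F

def fedcog_client_loss(local_model, global_model, x_real, y_real, x_gen, y_gen,
                       kd_weight: float = 1.0, temperature: float = 2.0):
    """Illustrative client objective: supervised loss on real + generated
    data, plus distillation from the (frozen) global model."""
    x = torch.cat([x_real, x_gen])
    y = torch.cat([y_real, y_gen])
    logits = local_model(x)
    ce = F.cross_entropy(logits, y)
    with torch.no_grad():
        teacher = global_model(x)
    kd = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        F.softmax(teacher / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return ce + kd_weight * kd
```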
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
- Towards Instance-adaptive Inference for Federated Learning [80.38701896056828]
Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating locally trained models.
In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework.
Our experiments show that our FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64% improvement against the top-performing method with less than 15% communication cost on Tiny-ImageNet.
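The summary does not spell out how instance-adaptive inference works internally, so the module below is only a generic illustration of the idea: route each input to the best-matching adapter from a small pool, keyed on the instance's features.

```python
import torch
import torch.nn as nn

class InstanceAdaptiveHead(nn.Module):
    """Generic illustration: per-instance routing to one of several adapters."""

    def __init__(self, backbone: nn.Module, adapters: nn.ModuleList, keys: torch.Tensor):
        super().__init__()
        self.backbone, self.adapters = backbone, adapters
        self.keys = nn.Parameter(keys)  # one key vector per adapter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)            # (batch, dim)
        sim = feat @ self.keys.t()         # similarity to each adapter key
        idx = sim.argmax(dim=-1)           # pick one adapter per instance
        return torch.stack([self.adapters[i](f) for i, f in zip(idx.tolist(), feat)])
```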
arXiv Detail & Related papers (2023-08-11T09:58:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.