GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
- URL: http://arxiv.org/abs/2504.00711v1
- Date: Tue, 01 Apr 2025 12:21:50 GMT
- Title: GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
- Authors: Enjun Du, Xunkai Li, Tian Jin, Zhihan Zhang, Rong-Hua Li, Guoren Wang
- Abstract summary: GraphMaster is the first multi-agent framework specifically designed for graph data synthesis in data-limited environments.
We develop new data-limited "Sub" variants of six standard graph benchmarks, specifically designed to test synthesis capabilities under realistic constraints.
We also develop a novel interpretability assessment framework that combines human evaluation with a principled Grassmannian manifold-based analysis.
- Score: 32.916371346197835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The era of foundation models has revolutionized AI research, yet Graph Foundation Models (GFMs) remain constrained by the scarcity of large-scale graph corpora. Traditional graph data synthesis techniques primarily focus on simplistic structural operations, lacking the capacity to generate semantically rich nodes with meaningful textual attributes: a critical limitation for real-world applications. While large language models (LLMs) demonstrate exceptional text generation capabilities, their direct application to graph synthesis is impeded by context window limitations, hallucination phenomena, and structural consistency challenges. To address these issues, we introduce GraphMaster, the first multi-agent framework specifically designed for graph data synthesis in data-limited environments. GraphMaster orchestrates four specialized LLM agents (Manager, Perception, Enhancement, and Evaluation) that collaboratively optimize the synthesis process through iterative refinement, ensuring both semantic coherence and structural integrity. To rigorously evaluate our approach, we create new data-limited "Sub" variants of six standard graph benchmarks, specifically designed to test synthesis capabilities under realistic constraints. Additionally, we develop a novel interpretability assessment framework that combines human evaluation with a principled Grassmannian manifold-based analysis, providing both qualitative and quantitative measures of semantic coherence. Experimental results demonstrate that GraphMaster significantly outperforms traditional synthesis methods across multiple datasets, establishing a strong foundation for advancing GFMs in data-scarce environments.
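The abstract names the four agent roles but gives no implementation. The iterative loop it describes might look like the minimal sketch below, in which every interface (the `llm` callable, `merge_plan`, the prompts, and the stopping score) is an illustrative assumption, not the authors' actual API:

```python
# Hypothetical sketch of the four-agent refinement loop described in the
# abstract. The roles (Manager, Perception, Enhancement, Evaluation) come
# from the paper; all interfaces below are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node_id -> text attribute
    edges: set = field(default_factory=set)    # (src_id, dst_id) pairs

def synthesize(seed: Graph, llm, rounds: int = 5, threshold: float = 0.9) -> Graph:
    """Iteratively grow a text-attributed graph with cooperating LLM agents."""
    graph = seed
    for _ in range(rounds):
        # Perception agent: compress the graph so it fits the context window.
        summary = llm(f"Summarize the structure and semantics of: {graph}")
        # Manager agent: plan which nodes/edges to synthesize next.
        plan = llm(f"Given this summary, plan new nodes and edges: {summary}")
        # Enhancement agent: generate the planned text-attributed nodes/edges.
        graph = merge_plan(graph, llm(f"Emit the planned nodes/edges as JSON: {plan}"))
        # Evaluation agent: score coherence and integrity; stop if acceptable.
        if float(llm(f"Score this graph's coherence in [0, 1]: {graph}")) >= threshold:
            break
    return graph

def merge_plan(graph: Graph, plan_json: str) -> Graph:
    """Stub: parse the JSON plan and merge new nodes/edges into the graph."""
    ...  # omitted: JSON parsing and structural-consistency checks
    return graph
```

Likewise, the "Grassmannian manifold-based analysis" presumably compares the subspaces spanned by real and synthetic node embeddings; a standard way to quantify that is via principal angles. A sketch under that assumption (the embedding inputs, the choice of `k`, and the arc-length metric are not taken from the paper):

```python
import numpy as np

def grassmann_distance(X: np.ndarray, Y: np.ndarray, k: int = 10) -> float:
    """Geodesic distance between the dominant k-dim feature subspaces of two
    embedding matrices (rows = nodes, columns = features); requires k <= rank."""
    Vx = np.linalg.svd(X, full_matrices=False)[2][:k].T  # (d, k) basis for X
    Vy = np.linalg.svd(Y, full_matrices=False)[2][:k].T  # (d, k) basis for Y
    # Singular values of Vx^T Vy are the cosines of the principal angles.
    cosines = np.clip(np.linalg.svd(Vx.T @ Vy, compute_uv=False), -1.0, 1.0)
    return float(np.linalg.norm(np.arccos(cosines)))  # arc-length metric
```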
Related papers
- GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks [15.147178364098034]
We present GraphOmni, a benchmark framework for evaluating the graph reasoning capabilities of LLMs.
Our findings emphasize that no single serialization or prompting strategy consistently outperforms others.
Motivated by these insights, we propose a reinforcement learning-based approach that dynamically selects the best serialization-prompt pairings.
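The blurb does not specify the RL method; one simple instance of such dynamic selection is an epsilon-greedy bandit over (serialization, prompt) arms, sketched below purely as an illustration (the arm names and reward scheme are assumptions, not GraphOmni's approach):

```python
import random

# Illustrative epsilon-greedy bandit over (serialization, prompt) pairs.
ARMS = [(s, p) for s in ("edge_list", "adjacency_matrix", "natural_language")
               for p in ("zero_shot", "few_shot", "chain_of_thought")]

def pick_arm(stats: dict, eps: float = 0.1):
    """stats maps arm -> (total_reward, pulls); returns the next arm to try."""
    if random.random() < eps or not stats:
        return random.choice(ARMS)
    return max(stats, key=lambda a: stats[a][0] / max(stats[a][1], 1))

def update(stats: dict, arm, reward: float):
    total, pulls = stats.get(arm, (0.0, 0))
    stats[arm] = (total + reward, pulls + 1)

# Usage: pull one arm per query; reward = 1.0 if the LLM answered correctly.
stats = {}
arm = pick_arm(stats)
update(stats, arm, reward=1.0)
```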
arXiv Detail & Related papers (2025-04-17T09:01:16Z)
- Graph Masked Language Models [0.0]
Language Models (LMs) and Graph Neural Networks (GNNs) have shown great promise in their respective areas.
We propose Graph Masked Language Models (GMLM), a novel dual-branch architecture that combines the structural learning of GNNs with the contextual power of pretrained language models.
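As a rough illustration of such a dual-branch fusion (not GMLM itself; the mean-aggregation layer and concatenation fusion are assumptions), one branch can message-pass over the graph while the other passes LM embeddings straight through:

```python
import torch
import torch.nn as nn

class DualBranch(nn.Module):
    """Illustrative fusion of a structural branch (one mean-aggregation GNN
    step) with a contextual branch (precomputed LM embeddings)."""
    def __init__(self, lm_dim: int, gnn_dim: int, n_classes: int):
        super().__init__()
        self.gnn_proj = nn.Linear(lm_dim, gnn_dim)
        self.classify = nn.Linear(gnn_dim + lm_dim, n_classes)

    def forward(self, lm_emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Structural branch: average neighbor embeddings, then project.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.gnn_proj(adj @ lm_emb / deg))
        # Contextual branch: concatenate the raw LM embeddings and classify.
        return self.classify(torch.cat([h, lm_emb], dim=-1))
```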
arXiv Detail & Related papers (2025-02-24T07:44:01Z)
- Revisiting Graph Neural Networks on Graph-level Tasks: Comprehensive Experiments, Analysis, and Improvements [54.006506479865344]
We propose a unified evaluation framework for graph-level Graph Neural Networks (GNNs).
This framework provides a standardized setting to evaluate GNNs across diverse datasets.
We also propose a novel GNN model with enhanced expressivity and generalization capabilities.
arXiv Detail & Related papers (2025-01-01T08:48:53Z)
- Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks [25.720233631885726]
The integration of Graph Neural Networks (GNNs) and Large Language Models (LLMs) has emerged as a promising technological paradigm.
We leverage graph description texts with rich semantic context to fundamentally enhance data quality.
This work serves as a foundational reference for researchers and practitioners looking to advance graph learning methodologies.
arXiv Detail & Related papers (2024-12-17T01:41:17Z)
- LLM-Based Multi-Agent Systems are Scalable Graph Generative Models [73.28294528654885]
GraphAgent-Generator (GAG) is a novel simulation-based framework for dynamic, text-attributed social graph generation.
GAG simulates the temporal node and edge generation processes for zero-shot social graph generation.
The resulting graphs adhere to seven key macroscopic network properties and achieve an 11% improvement in microscopic graph structure metrics.
arXiv Detail & Related papers (2024-10-13T12:57:08Z)
- How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension [53.6373473053431]
This work introduces a benchmark to assess large language models' capabilities on graph pattern tasks.
The benchmark evaluates whether LLMs can understand graph patterns from either terminological or topological descriptions.
It covers both synthetic and real datasets, spanning 11 tasks and 7 models in total.
arXiv Detail & Related papers (2024-10-04T04:48:33Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields.
We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation.
Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
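The blurb names random-walk context sampling without details; as a rough illustration only, a uniform random walk over an adjacency list could supply such contexts (the walk length, dead-end handling, and all names below are assumptions, not GSPT's actual procedure):

```python
import random

def sample_context(adj: dict[int, list[int]], start: int, walk_len: int = 8,
                   rng: random.Random | None = None) -> list[int]:
    """Sample a node-context sequence via a uniform random walk.

    `adj` maps each node id to its list of neighbor ids.
    """
    rng = rng or random.Random(0)
    walk = [start]
    for _ in range(walk_len - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:   # dead end: cut the walk short
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy usage: a context drawn around node 0
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(sample_context(adj, start=0))
```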
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation [31.89877722246351]
We introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation.
SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules.
It provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation.
arXiv Detail & Related papers (2024-01-07T04:43:36Z)
- The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation [19.945859289278534]
We show that the characteristics required of an optical flow dataset are rather simple, and we present a correspondingly simple synthetic data generation method.
Using 2D motion-based datasets, we systematically analyze the simplest yet critical factors in generating synthetic datasets.
arXiv Detail & Related papers (2023-08-14T18:01:45Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates the results of six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that have historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features of the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)