GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets
- URL: http://arxiv.org/abs/2406.16176v2
- Date: Mon, 17 Feb 2025 09:53:43 GMT
- Title: GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets
- Authors: Qiming Wu, Zichen Chen, Will Corcoran, Misha Sra, Ambuj K. Singh
- Abstract summary: GraphEval36K is the first comprehensive graph dataset, comprising 40 graph coding problems and 36,900 test cases.
Our dataset is categorized into eight primary and four sub-categories to ensure a thorough evaluation across different types of graphs.
To improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method for enhancing LLM performance on complex graph tasks.
SSD improves the average passing rate of GPT-4, GPT-4o, Gemini-Pro and Claude-3-Sonnet by 8.38%, 6.78%, 29.28% and 25.28%, respectively.
- Score: 19.329274124787858
- Abstract: Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to manipulate, program, and reason about structured data, especially graphs. We introduce GraphEval36K, the first comprehensive graph dataset, comprising 40 graph coding problems and 36,900 test cases to evaluate the ability of LLMs on graph problem-solving. Our dataset is categorized into eight primary and four sub-categories to ensure a thorough evaluation across different types of graphs. We benchmark ten LLMs, finding that private models outperform open-source ones, though the gap is narrowing. We also analyze the performance of LLMs across directed vs undirected graphs, different kinds of graph concepts, and network models. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on complex graph tasks. Results show that SSD improves the average passing rate of GPT-4, GPT-4o, Gemini-Pro and Claude-3-Sonnet by 8.38%, 6.78%, 29.28% and 25.28%, respectively.
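The abstract does not spell out the evaluation pipeline, but a minimal sketch of how a model-generated solution could be scored against per-problem test cases as a passing rate is given below; the task (connected-component counting), the function name, and the test tuples are illustrative assumptions, not items from the GraphEval36K dataset.

```python
# Hypothetical sketch of a GraphEval36K-style check: run a model-generated
# solution against per-problem test cases and report the passing rate.
# The task, function name, and test cases are illustrative assumptions.

def count_components(n, edges):
    """Count connected components of an undirected graph with n nodes (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components

# (input, expected output) pairs standing in for the dataset's test cases.
test_cases = [
    ((4, [(0, 1), (2, 3)]), 2),
    ((5, [(0, 1), (1, 2), (3, 4)]), 2),
    ((3, []), 3),
]

passed = sum(count_components(*args) == expected for args, expected in test_cases)
print(f"passing rate: {passed / len(test_cases):.2%}")  # 100.00% for this sketch
```

Repeating such a harness over the 40 problems and 36,900 test cases would yield per-model passing rates of the kind reported in the abstract.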
Related papers
- How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension [53.6373473053431]
This work introduces a benchmark to assess large language models' capabilities in graph pattern tasks.
We have developed a benchmark that evaluates whether LLMs can understand graph patterns based on either terminological or topological descriptions.
Our benchmark encompasses both synthetic and real datasets, covering a total of 11 tasks and 7 models.
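As a concrete illustration of the terminological-versus-topological distinction, the sketch below poses the same pattern query both ways and checks it with networkx; the triangle pattern and toy graph are this summary's own example, not drawn from the benchmark.

```python
# Illustrative sketch (not from the benchmark): the same pattern query posed
# terminologically ("does the graph contain a triangle?") and topologically
# (an explicit edge list to match), both verified with networkx.
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])  # toy graph

# Terminological description: name the pattern.
has_triangle = any(count > 0 for count in nx.triangles(G).values())

# Topological description: give the pattern's structure explicitly.
pattern = nx.Graph([("a", "b"), ("b", "c"), ("c", "a")])
has_pattern = GraphMatcher(G, pattern).subgraph_is_isomorphic()

print(has_triangle, has_pattern)  # True True
```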
arXiv Detail & Related papers (2024-10-04T04:48:33Z) - Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [90.98855064914379]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs.
Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy.
We propose the LLM4Graph datasets, which include crawled documents and auto-generated code based on six widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z) - Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs).
We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem.
Our findings suggest that LLMs can fail to understand graph structures from text descriptions and exhibit varying performance across these fundamental tasks.
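A hedged sketch of how the three tasks can be grounded programmatically is shown below; the graph, queried node pairs, and description template are illustrative assumptions, with networkx supplying the ground truth against which a model's answers would be compared.

```python
# Sketch of verifying the three tasks (description translation, connectivity,
# shortest path); the graph and queries are made up for illustration.
import networkx as nx

edges = [(0, 1), (1, 2), (3, 4)]
G = nx.Graph(edges)

# Graph description translation: render the edge list as text for a prompt.
description = "The graph has edges: " + ", ".join(f"({u}, {v})" for u, v in edges)

# Connectivity: is there a path between two queried nodes?
connected_truth = nx.has_path(G, 0, 2)      # True
disconnected_truth = nx.has_path(G, 0, 4)   # False

# Shortest path: ground-truth length to compare against the model's answer.
shortest_truth = nx.shortest_path_length(G, 0, 2)  # 2

print(description)
print(connected_truth, disconnected_truth, shortest_truth)
```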
arXiv Detail & Related papers (2024-08-18T16:26:39Z) - Investigating Instruction Tuning Large Language Models on Graphs [37.20541711360419]
There is growing interest in applying Large Language Models (LLMs) to graph-related tasks.
This study delves into the capabilities of instruction-following LLMs for engaging with real-world graphs.
arXiv Detail & Related papers (2024-08-10T06:54:35Z) - Parameter-Efficient Tuning Large Language Models for Graph Representation Learning [62.26278815157628]
We introduce Graph-aware Parameter-Efficient Fine-Tuning (GPEFT), a novel approach for efficient graph representation learning.
We use a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt.
We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations.
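For reference, hit@1 and Mean Reciprocal Rank can be computed from ranked candidate lists as in the sketch below; the candidate rankings are made up for illustration and are unrelated to the paper's experiments.

```python
# Minimal sketch of the link-prediction metrics mentioned above.
def hit_at_k(ranked_lists, targets, k=1):
    """Fraction of queries whose true target appears among the top-k candidates."""
    hits = sum(target in ranked[:k] for ranked, target in zip(ranked_lists, targets))
    return hits / len(targets)

def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1 / rank of the true target (0 if the target is missing)."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)

# Two queries: candidate nodes ranked by predicted score, plus the true link.
ranked_lists = [["b", "c", "a"], ["x", "y", "z"]]
targets = ["b", "y"]
print(hit_at_k(ranked_lists, targets, k=1))         # 0.5
print(mean_reciprocal_rank(ranked_lists, targets))  # (1.0 + 0.5) / 2 = 0.75
```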
arXiv Detail & Related papers (2024-04-28T18:36:59Z) - GraphWiz: An Instruction-Following Language Model for Graph Problems [39.656196336071275]
We introduce GraphInstruct, a dataset designed to equip language models with the ability to tackle a broad spectrum of graph problems using explicit reasoning paths.
We build GraphWiz, an open-source language model capable of resolving various graph problem types while generating clear reasoning processes.
The enhanced model, GraphWiz-DPO, achieves an average accuracy of 65% across nine tasks with different complexity levels, surpassing GPT-4, which has an average accuracy of 43.8%.
arXiv Detail & Related papers (2024-02-25T08:41:32Z) - GraphLLM: Boosting Graph Reasoning Ability of Large Language Model [7.218768686958888]
GraphLLM is a pioneering end-to-end approach that integrates graph learning models with Large Language Models.
Our empirical evaluations across four fundamental graph reasoning tasks validate the effectiveness of GraphLLM.
The results show a substantial average accuracy improvement of 54.44%, alongside a notable 96.45% reduction in context.
arXiv Detail & Related papers (2023-10-09T16:42:00Z) - Can Language Models Solve Graph Problems in Natural Language? [51.28850846990929]
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures.
We propose NLGraph, a benchmark of graph-based problem solving posed in natural language.
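To illustrate what posing a graph problem in natural language might look like, the sketch below renders a small random graph as a textual prompt; the template and the task (cycle detection) are assumptions, not NLGraph's actual format.

```python
# Illustrative sketch (not NLGraph's templates): phrase a small graph task
# entirely in natural language, the way such a benchmark might prompt an LLM.
import random

random.seed(0)
n = 5
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if random.random() < 0.4]

prompt = (
    f"You are given an undirected graph with {n} nodes, numbered 0 to {n - 1}. "
    + "The edges are: "
    + ", ".join(f"node {u} -- node {v}" for u, v in edges)
    + ". Does this graph contain a cycle? Answer yes or no."
)
print(prompt)
```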
arXiv Detail & Related papers (2023-05-17T08:29:21Z) - Investigating Pretrained Language Models for Graph-to-Text Generation [55.55151069694146]
Graph-to-text generation aims to generate fluent texts from graph-based data.
We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs.
We show that the pretrained language models (PLMs) BART and T5 achieve new state-of-the-art results, and that task-adaptive pretraining strategies improve their performance even further.
arXiv Detail & Related papers (2020-07-16T16:05:34Z)
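A hedged sketch of a PLM-based graph-to-text pipeline is shown below: knowledge-graph triples are linearized into a source string and decoded with a seq2seq model. The t5-base checkpoint, the example triples, and the <H>/<R>/<T> linearization are placeholders; reproducing the reported results would require the paper's task-adapted, fine-tuned models.

```python
# Hedged sketch of graph-to-text with a pretrained seq2seq model. "t5-base"
# is only a placeholder checkpoint; the paper's exact linearization and
# fine-tuned weights may differ.
from transformers import T5ForConditionalGeneration, T5Tokenizer

triples = [("Alan Turing", "field", "computer science"),
           ("Alan Turing", "born in", "London")]
# Simple linearization of the graph into a flat source string.
source = " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in triples)

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

input_ids = tokenizer(source, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```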
This list is automatically generated from the titles and abstracts of the papers on this site.