Related papers: GraphArena: Benchmarking Large Language Models on Graph Computational Problems


URL: http://arxiv.org/abs/2407.00379v1
Date: Sat, 29 Jun 2024 09:19:23 GMT
Title: GraphArena: Benchmarking Large Language Models on Graph Computational Problems
Authors: Jianheng Tang, Qifan Zhang, Yuhan Li, Jia Li,
Abstract summary: The "arms race" of Large Language Models (LLMs) demands novel, challenging, and diverse benchmarks to examine their progress. We introduce GraphArena, a benchmarking tool to evaluate models on graph computational problems using million-scale real-world graphs.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The "arms race" of Large Language Models (LLMs) demands novel, challenging, and diverse benchmarks to faithfully examine their progress. We introduce GraphArena, a benchmarking tool designed to evaluate LLMs on graph computational problems using million-scale real-world graphs from diverse scenarios such as knowledge graphs, social networks, and molecular structures. GraphArena offers a suite of 10 computational tasks: four polynomial-time tasks (e.g., Shortest Distance) and six NP-complete ones (e.g., Travelling Salesman Problem). It features a rigorous evaluation framework that classifies LLM outputs as correct, suboptimal (feasible but not optimal), or hallucinatory (properly formatted but infeasible). Evaluation of 10 leading LLMs, including GPT-4o and LLaMA3-70B-Instruct, reveals that even top-performing models struggle with larger, more complex graph problems and exhibit hallucination issues. These issues remain unresolved even with strategies such as chain-of-thought prompting. GraphArena is a valuable supplement to existing LLM benchmarks and is open-sourced at https://github.com/squareRoot3/GraphArena.
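The three-way labelling (correct / suboptimal / hallucinatory) can be sketched for a path-finding task. This is not GraphArena's scoring code. It is a minimal sketch that assumes an unweighted, undirected graph stored as an adjacency dict, a task that asks the model for a source-to-target path, and an answer that has already been parsed into a list of node ids. The function names `bfs_distances` and `classify_path` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted single-source shortest distances via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def classify_path(adj, source, target, path):
    """Label a parsed, properly formatted answer (hypothetical helper).

    hallucinatory: the path is infeasible (wrong endpoints or a missing edge)
    suboptimal:    the path is feasible but longer than the true shortest distance
    correct:       the path is feasible and its length equals the optimum
    """
    # Feasibility check: endpoints must match and every hop must be a real edge.
    if not path or path[0] != source or path[-1] != target:
        return "hallucinatory"
    for u, v in zip(path, path[1:]):
        if v not in adj.get(u, ()):
            return "hallucinatory"
    # Optimality check against the exact answer computed by BFS.
    optimum = bfs_distances(adj, source)[target]
    return "correct" if len(path) - 1 == optimum else "suboptimal"


if __name__ == "__main__":
    # Small undirected toy graph (assumed example data).
    adj = {
        "A": {"B", "C"},
        "B": {"A", "C", "D"},
        "C": {"A", "B", "D"},
        "D": {"B", "C"},
    }
    print(classify_path(adj, "A", "D", ["A", "B", "D"]))       # correct
    print(classify_path(adj, "A", "D", ["A", "B", "C", "D"]))  # suboptimal
    print(classify_path(adj, "A", "D", ["A", "D"]))            # hallucinatory
```

Because any feasible path already proves the target is reachable, the BFS lookup cannot miss after the feasibility check passes. For the NP-complete tasks, the same structure would apply with an exact solver (or a known optimum) in place of BFS.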

Related papers

Rethinking and Benchmarking Large Language Models for Graph Reasoning
Large Language Models (LLMs) for Graph Reasoning have been extensively studied over the past two years. Recent studies underscore the potential of LLMs in handling graph reasoning tasks, but their performance is underwhelming.
arXiv (2025-09-29T04:10:12Z)
Graph-Grounded LLMs: Leveraging Graphical Function Calling to Minimize LLM Hallucinations
Graphs are integral to a wide range of applications, including motion planning for autonomous vehicles, social networks, scene understanding, and knowledge graphs. We propose Graph-Grounded LLMs, a system that improves LLM performance on graph-related tasks by integrating a graph library through function calls. We demonstrate significant reductions in hallucinations and improved mathematical accuracy in solving graph-based problems, as evidenced by the performance on the NLGraph benchmark.
arXiv (2025-03-13T22:57:28Z)
GCoder: Improving Large Language Model for Generalized Graph Problem Solving
Large Language Models (LLMs) have demonstrated strong reasoning abilities, making them suitable for complex tasks such as graph computation. We introduce GCoder, a code-based LLM designed to enhance problem-solving in generalized graph problems. Our method involves constructing an extensive training dataset, GraphWild, featuring diverse graph formats and algorithms.
arXiv (2024-10-24T18:40:36Z)
What Do LLMs Need to Understand Graphs: A Survey of Parametric Representation of Graphs
Large language models (LLMs) are attracting attention in the AI community for their expected reasoning and inference abilities. We believe that a parametric representation of graphs, namely graph laws, can be a solution for making LLMs understand graph data as input.
arXiv (2024-10-16T00:01:31Z)
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. We propose LLM4Graph datasets, which include crawled documents and auto-generated code based on 6 widely used graph libraries.
arXiv (2024-09-29T11:38:45Z)
Graph Reasoning with Large Language Models via Pseudo-code Prompting
This paper investigates whether prompting via pseudo-code instructions can improve the performance of large language models (LLMs) in solving graph problems. Our experiments demonstrate that using pseudo-code instructions generally improves the performance of all considered LLMs.
arXiv (2024-09-26T14:52:40Z)
Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path
We focus on the graph reasoning ability of Large Language Models (LLMs). We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance on all of these fundamental tasks.
arXiv (2024-08-18T16:26:39Z)
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora. We propose a framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
arXiv (2024-04-10T15:41:53Z)
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
We introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of multi-modal LLMs (MLLMs). We meticulously collect 2,612 high-quality, multi-subject math problems with diagrams from publicly available sources. This approach allows MathVerse to comprehensively assess whether, and how much, MLLMs can truly understand the visual diagrams for mathematical reasoning.
arXiv (2024-03-21T17:59:50Z)
LLaGA: Large Language and Graph Assistant
Large Language and Graph Assistant (LLaGA) is an innovative model to handle the complexities of graph-structured data. LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks. Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using a single model.
arXiv (2024-02-13T02:03:26Z)
Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?
Large language models (LLMs) have achieved significant success in reasoning tasks, including mathematical reasoning and logical deduction. Previous studies have explored LLMs' graph reasoning abilities through various techniques. However, one critical factor has been mostly overlooked: the order in which graph descriptions are presented to the models in the prompt.
arXiv (2024-02-11T09:46:24Z)
GraphLLM: Boosting Graph Reasoning Ability of Large Language Model
GraphLLM is a pioneering end-to-end approach that integrates graph learning models with Large Language Models. Our empirical evaluations across four fundamental graph reasoning tasks validate the effectiveness of GraphLLM. The results show a substantial average accuracy improvement of 54.44%, alongside a context reduction of 96.45%.
arXiv (2023-10-09T16:42:00Z)
Integrating Graphs with Large Language Models: Methods and Prospects
Large language models (LLMs) have emerged as frontrunners, showcasing unparalleled prowess in diverse applications. Merging the capabilities of LLMs with graph-structured data has been a topic of keen interest. This paper divides such integrations into two predominant categories.
arXiv (2023-10-09T07:59:34Z)
Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis
We evaluate the capabilities of four Large Language Models (LLMs) in addressing several analytical problems with graph data. We employ four distinct evaluation metrics: Comprehension, Correctness, Fidelity, and Rectification. GPT models can generate logical and coherent results, outperforming alternatives in correctness.
arXiv (2023-08-22T06:32:07Z)
Can Language Models Solve Graph Problems in Natural Language?
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures. We propose NLGraph, a benchmark of graph-based problem solving posed in natural language.
arXiv (2023-05-17T08:29:21Z)
