Rethinking and Benchmarking Large Language Models for Graph Reasoning
- URL: http://arxiv.org/abs/2509.24260v2
- Date: Thu, 02 Oct 2025 01:19:14 GMT
- Title: Rethinking and Benchmarking Large Language Models for Graph Reasoning
- Authors: Yuwei Hu, Xinyi Huang, Zhewei Wei, Yongchao Liu, Chuntao Hong
- Abstract summary: Large Language Models (LLMs) for Graph Reasoning have been extensively studied over the past two years. Recent studies underscore the potential of LLMs in handling graph reasoning tasks, but their performance is underwhelming.
- Score: 36.30471027175558
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) for Graph Reasoning have been extensively studied over the past two years. This line of work involves enabling LLMs to understand graph structures and reason over graphs to solve various graph problems, with graph algorithm problems being the most prevalent. Recent studies underscore the potential of LLMs in handling graph reasoning tasks, but their performance is underwhelming. In this work, we point out issues with existing methods and benchmarks, and rethink the direction that LLMs for graph reasoning should strive toward. We find that base models, e.g., GPT-4o-mini, are largely underestimated due to an improper reasoning focus: base models whose reasoning focus is redirected from replicating graph algorithms to designing them can easily solve most graph reasoning tasks in existing benchmarks. To truly evaluate the graph reasoning capabilities of LLMs, we construct a more challenging GraphAlgorithm benchmark, comprising 239 distinct graph problems and 3,041 test instances collected from 4 competition platforms. Finally, we introduce a simple and strong baseline, Simple-Reasoning-Then-Coding (Simple-RTC), which guides LLMs to design graph algorithms first and then write code to solve graph reasoning tasks. Simple-RTC achieves near-perfect accuracy on existing benchmarks and significantly outperforms GPT-4o-mini and all prior methods on the GraphAlgorithm benchmark. This strong baseline encourages further advances in LLMs for Graph Reasoning.
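The reasoning-then-coding recipe described in the abstract is straightforward to prototype. The sketch below illustrates the two-stage idea under stated assumptions: the prompt wording, the `call_llm` helper, and the execution harness are illustrative placeholders paraphrased from the abstract, not the authors' exact implementation.

```python
# Minimal sketch of a Reasoning-Then-Coding pipeline, assuming a generic
# chat-completion API behind call_llm(). Prompt texts are paraphrased from
# the abstract's description, not taken from the paper.
import subprocess
import tempfile

DESIGN_PROMPT = (
    "Do NOT simulate an algorithm step by step on the given instance. "
    "Instead, design an appropriate graph algorithm and describe it briefly.\n"
    "Problem:\n{problem}"
)

CODING_PROMPT = (
    "Using the algorithm design below, write a complete Python program that "
    "reads the graph instance from stdin and prints the answer. "
    "Output only the code.\n"
    "Design:\n{design}\nProblem:\n{problem}"
)


def call_llm(prompt: str) -> str:
    """Placeholder for a model call, e.g., to GPT-4o-mini."""
    raise NotImplementedError


def simple_rtc(problem: str, instance: str) -> str:
    # Stage 1: reason, i.e., design the algorithm rather than execute it.
    design = call_llm(DESIGN_PROMPT.format(problem=problem))
    # Stage 2: code, i.e., turn the design into a runnable program.
    code = call_llm(CODING_PROMPT.format(design=design, problem=problem))
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        script = f.name
    # Stage 3: run the generated program on the test instance and grade stdout.
    run = subprocess.run(
        ["python", script], input=instance,
        capture_output=True, text=True, timeout=30,
    )
    return run.stdout.strip()
```

The key design choice mirrors the abstract's finding: the first prompt explicitly forbids step-by-step simulation on the concrete instance, redirecting the model toward algorithm design, while instance-level correctness is delegated to executed code.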
Related papers
- From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs [52.0811904328273]
Real-world reasoning requires integrating multiple premises and solving subproblems in parallel. Recent approaches rely on externally provided graphs and do not explore how LLMs can construct and use their own graph-structured reasoning. We propose Self-Graph Reasoning (SGR), a framework that enables LLMs to explicitly represent their reasoning process as a structured graph before producing the final answer.
arXiv Detail & Related papers (2026-01-07T05:27:41Z) - Graph-R1: Incentivizing the Zero-Shot Graph Learning Capability in LLMs via Explicit Reasoning [7.1931434571877375]
Graph Neural Networks (GNNs) are limited by fixed label spaces, while Large Language Models (LLMs) lack structural inductive biases. Recent advances in Large Reasoning Models (LRMs) provide a zero-shot alternative via explicit, long chain-of-thought reasoning. We propose a GNN-free approach that reformulates graph tasks (node classification, link prediction, and graph classification) as textual reasoning problems solved by LRMs.
arXiv Detail & Related papers (2025-08-24T14:49:02Z) - What Do LLMs Need to Understand Graphs: A Survey of Parametric Representation of Graphs [69.48708136448694]
Large language models (LLMs) are drawing attention across the AI community for their expected reasoning and inference abilities. We believe this kind of parametric representation of graphs, graph laws, can be a way to make LLMs understand graph data as input.
arXiv Detail & Related papers (2024-10-16T00:01:31Z) - Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [88.4320775961431]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. We propose the LLM4Graph datasets, which include crawled documents and auto-generated code based on 6 widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z) - Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs). We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance across these fundamental tasks.
arXiv Detail & Related papers (2024-08-18T16:26:39Z) - GraphArena: Evaluating and Exploring Large Language Models on Graph Computation [38.65000765032749]
GraphArena is a tool designed to evaluate Large Language Models (LLMs) on real-world graph problems. Evaluation of over 10 LLMs reveals that even top-performing LLMs struggle with larger, more complex graph problems. We explore four potential solutions to address this issue, including chain-of-thought prompting, instruction tuning, code writing, and scaling test-time compute.
arXiv Detail & Related papers (2024-06-29T09:19:23Z) - Can Graph Descriptive Order Affect Solving Graph Problems with LLMs? [38.1577036285387]
Large language models (LLMs) have achieved significant success in reasoning tasks, including mathematical reasoning and logical deduction. Previous studies have explored LLMs' graph reasoning abilities through various techniques. A critical factor has been mostly overlooked: the sequential order in which graph descriptions are presented to the models in the prompt.
arXiv Detail & Related papers (2024-02-11T09:46:24Z) - GraphLLM: Boosting Graph Reasoning Ability of Large Language Model [7.218768686958888]
GraphLLM is a pioneering end-to-end approach that integrates graph learning models with Large Language Models.
Our empirical evaluations across four fundamental graph reasoning tasks validate the effectiveness of GraphLLM.
The results exhibit a substantial average accuracy enhancement of 54.44%, alongside a noteworthy context reduction of 96.45%.
arXiv Detail & Related papers (2023-10-09T16:42:00Z) - Can Language Models Solve Graph Problems in Natural Language? [51.28850846990929]
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures.
We propose NLGraph, a benchmark of graph-based problem solving designed in natural language.
arXiv Detail & Related papers (2023-05-17T08:29:21Z)