Fugu-MT Paper Translation (Overview): Rethinking and Benchmarking Large Language Models for Graph Reasoning

Paper overview: Rethinking and Benchmarking Large Language Models for Graph Reasoning

arxiv url: http://arxiv.org/abs/2509.24260v2
Date: Thu, 02 Oct 2025 01:19:14 GMT
Status: Translation complete
Last updated in system: 2025-10-03 12:04:55.854159
Title: Rethinking and Benchmarking Large Language Models for Graph Reasoning
Authors: Yuwei Hu, Xinyi Huang, Zhewei Wei, Yongchao Liu, Chuntao Hong,
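Abstract summary: Large language models (LLMs) for graph reasoning have been extensively studied over the past two years. Recent studies show that LLMs have potential for handling graph reasoning tasks, but their performance is underwhelming.
Reference score (site-computed attention score): 36.30471027175558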
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) for Graph Reasoning have been extensively studied over the past two years. This line of work aims to enable LLMs to understand graph structures and reason over graphs to solve various graph problems, with graph algorithm problems being the most prevalent. Recent studies underscore the potential of LLMs in handling graph reasoning tasks, but their performance is underwhelming. In this work, we point out issues with existing methods and benchmarks, and rethink the direction toward which LLMs for graph reasoning should strive. We find that base models, e.g., GPT-4o-mini, are largely underestimated due to an improper reasoning focus. Base models whose reasoning focus is redirected from replicating graph algorithms to designing them can easily solve most graph reasoning tasks in existing benchmarks. To truly evaluate the graph reasoning capabilities of LLMs, we construct a more challenging GraphAlgorithm benchmark, comprising 239 different graph problems and 3,041 test instances collected from 4 competition platforms. Finally, we introduce a simple yet strong baseline, Simple-Reasoning-Then-Coding (Simple-RTC), which guides LLMs to first design graph algorithms and then write code to solve graph reasoning tasks. Simple-RTC achieves near-perfect accuracy on existing benchmarks and significantly outperforms GPT-4o-mini and all prior methods on the GraphAlgorithm benchmark. This strong baseline encourages further advances in LLMs for graph reasoning.
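The abstract describes Simple-RTC only at a high level: the model first designs a graph algorithm, then writes code for it, rather than simulating the algorithm step by step in text. The following is a minimal sketch of that two-stage idea under stated assumptions. The prompts, the `llm` callable, `simple_rtc_sketch`, `evaluate`, and the canned `fake_llm` stand-in are all hypothetical and are not the authors' prompts or code. Scoring a program as the fraction of test instances whose output matches the expected answer is likewise an assumption about how accuracy is measured.

```python
from collections import deque  # used by the canned program below
from typing import Callable, Dict, List, Tuple

# Hypothetical LLM interface: takes a prompt string, returns the model's text reply.
LLM = Callable[[str], str]


def simple_rtc_sketch(llm: LLM, problem: str) -> str:
    """Hypothetical two-stage 'reason, then code' pipeline (not the paper's prompts)."""
    # Stage 1 (reasoning): ask for an algorithm design, not a hand simulation.
    plan = llm(
        "Design an efficient algorithm for this graph problem. "
        f"Do not simulate it by hand.\n\n{problem}"
    )
    # Stage 2 (coding): ask for a program that implements the design.
    return llm(
        f"Problem:\n{problem}\n\nAlgorithm:\n{plan}\n\n"
        "Write a Python function solve(input_text) that returns the answer as a string."
    )


def evaluate(code: str, tests: List[Tuple[str, str]]) -> float:
    """Assumed metric: fraction of test instances answered exactly right."""
    if not tests:
        return 0.0
    namespace: Dict[str, object] = {}
    exec(code, namespace)  # in practice, run model-written code only in a sandbox
    solve = namespace["solve"]
    correct = sum(1 for inp, expected in tests if solve(inp).strip() == expected.strip())
    return correct / len(tests)


# Canned program a model might return for unweighted shortest path (BFS).
CANNED_CODE = '''
from collections import deque

def solve(input_text):
    data = input_text.split()
    n, m = int(data[0]), int(data[1])
    adj = [[] for _ in range(n)]
    idx = 2
    for _ in range(m):
        u, v = int(data[idx]), int(data[idx + 1])
        idx += 2
        adj[u].append(v)
        adj[v].append(u)
    s, t = int(data[idx]), int(data[idx + 1])
    dist = [-1] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if dist[y] == -1:
                dist[y] = dist[x] + 1
                queue.append(y)
    return str(dist[t])
'''


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: returns a fixed plan or the BFS program."""
    if "Write a Python function" in prompt:
        return CANNED_CODE
    return "Unweighted shortest path: run BFS from s and report dist[t], or -1 if unreachable."


problem = ("Given an undirected unweighted graph, output the shortest-path length "
           "from s to t (-1 if unreachable).")
# Input format (assumed): "n m", then m edges "u v", then "s t".
tests = [("4 3\n0 1\n1 2\n2 3\n0 3", "3"), ("3 1\n0 1\n0 2", "-1")]
code = simple_rtc_sketch(fake_llm, problem)
accuracy = evaluate(code, tests)
print(accuracy)  # 1.0
```

The point of the design is that the model's effort goes into choosing the right algorithm (here BFS), while execution of that algorithm is delegated to the interpreter, so correctness no longer depends on the model tracing every step of the graph by hand.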


Related papers