Fugu-MT Paper Translation (Summary): GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs

Paper summary: GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs

arxiv url: http://arxiv.org/abs/2606.11562v1
Date: Wed, 10 Jun 2026 01:41:53 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.241081
Title: GraphInfer-Bench: Benchmarking LLM's Inference Capability on Graphs
Authors: Zhuoyi Peng, Jingzhou Jiang, Hanlin Gu, Lixin Fan, Yi Yang,
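Abstract summary: GraphInfer-Bench is a benchmark for graph inference, that is, for whether LLMs can produce an open-ended answer that no single node supports and no path retrieves. Existing graph-QA protocols (algorithm simulation, node classification, single-node description, KG-QA, and GraphRAG) all admit answers retrievable from one node or along a path. The release contains 42,000 samples across six real-world graphs.
Reference score (attention score computed by the site): 21.80259191908136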
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graph analysis underlies many applications whose answers cannot be looked up in a single record or retrieved along a path: laundering rings, drug repurposing, user preference, and scientific theme are all inferred from a node together with its neighbourhood. We introduce GraphInfer-Bench, a benchmark for whether LLMs can perform this graph inference: producing an open-ended answer that no single node supports and no path retrieves. Existing graph-QA protocols cannot test this capability: algorithm simulation, node classification, single-node description, KG-QA, and GraphRAG all admit answers retrievable from one node or along a path. GraphInfer-Bench defines five tasks along Description (what a region is) and Comparison (how regions differ), each constructed so the ground truth lives in no single node. The release contains 42,000 samples across six real-world graphs, produced automatically and screened by a four-layer quality-control protocol. We evaluate four method families against the same tasks: graph-token alignment models, zero-shot frontier closed-source LLMs, Graph2Text supervised fine-tuning, and plain GNNs as a structural reference. No method family closes the gap. Graph-token alignment partially handles description tasks (relational, theme) but collapses on comparison tasks. Frontier LLMs lead on outlier detection and community partition among LLM-based methods but lag on masked-node prediction. Graph2Text SFT is the strongest LLM-based method on the description side yet falls behind frontier LLMs on comparison. Across every task, plain GNNs match or beat the strongest LLM-based row, with the largest margin on community detection. GraphInfer-Bench surfaces graph inference as an open capability gap rather than a property of any one architecture.
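The abstract's central idea is that the answer depends on a node together with its neighbourhood (a "region"), not on any single record or path. As a minimal sketch, assuming a plain adjacency-list graph with optional text attributes per node, the snippet below collects a node's k-hop region with breadth-first search and linearizes it into text, roughly the kind of input a Graph2Text-style method would hand to an LLM. The function names, the `k` parameter, and the text format are hypothetical illustrations, not GraphInfer-Bench's actual construction pipeline or serialization.

```python
from collections import deque


def k_hop_region(adj: dict[str, list[str]], center: str, k: int) -> set[str]:
    """Return every node within k hops of `center` (including `center`), via BFS."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nb in adj.get(node, []):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def region_to_text(adj: dict[str, list[str]], texts: dict[str, str], region: set[str]) -> str:
    """Hypothetical Graph2Text-style linearization: one line per node, then one per edge.

    Only edges with both endpoints inside the region are kept; undirected edges
    are deduplicated by sorting their endpoints.
    """
    lines = [f"Node {n}: {texts.get(n, '')}" for n in sorted(region)]
    edges = sorted(
        {tuple(sorted((u, v))) for u in region for v in adj.get(u, []) if v in region}
    )
    lines += [f"Edge: {u} -- {v}" for u, v in edges]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy path graph a - b - c - d (hypothetical data).
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    region = k_hop_region(adj, "a", 2)
    print(region_to_text(adj, {"a": "seed account"}, region))
```

Restricting the text to the k-hop region keeps the prompt bounded, but it also means the answer must be assembled from several nodes and edges at once, which is the capability the benchmark is designed to probe.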
