Graph-S3: Enhancing Agentic textual Graph Retrieval with Synthetic Stepwise Supervision
- URL: http://arxiv.org/abs/2510.03323v1
- Date: Wed, 01 Oct 2025 13:27:36 GMT
- Title: Graph-S3: Enhancing Agentic textual Graph Retrieval with Synthetic Stepwise Supervision
- Authors: Ge Chang, Jinbo Su, Jiacheng Liu, Pengfei Yang, Yuhao Shang, Huiwen Zheng, Hongli Ma, Yan Liang, Yuanchun Li, Yunxin Liu,
- Abstract summary: Graph-$S3$ is an agentic textual graph reasoning framework that employs an LLM-based retriever trained with synthetic stepwise supervision.<n>Our approach achieves an average improvement of 8.1% in accuracy and 9.7% in F$_1$ score.
- Score: 11.806233991934725
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A significant portion of real-world data is inherently represented as textual graphs, and integrating these graphs into large language models (LLMs) is promising to enable complex graph-based question answering. However, a key challenge in LLM-based textual graph QA systems lies in graph retrieval, i.e., how to retrieve relevant content from large graphs that is sufficiently informative while remaining compact for the LLM context. Existing retrievers suffer from poor performance since they either rely on shallow embedding similarity or employ interactive retrieving policies that demand excessive data labeling and training cost. To address these issues, we present Graph-$S^3$, an agentic textual graph reasoning framework that employs an LLM-based retriever trained with synthetic stepwise supervision. Instead of rewarding the agent based on the final answers, which may lead to sparse and unstable training signals, we propose to closely evaluate each step of the retriever based on offline-extracted golden subgraphs. Our main techniques include a data synthesis pipeline to extract the golden subgraphs for reward generation and a two-stage training scheme to learn the interactive graph exploration policy based on the synthesized rewards. Based on extensive experiments on three common datasets in comparison with seven strong baselines, our approach achieves an average improvement of 8.1\% in accuracy and 9.7\% in F$_1$ score. The advantage is even higher in more complicated multi-hop reasoning tasks. Our code will be open-sourced.
Related papers
- GRIL: Knowledge Graph Retrieval-Integrated Learning with Large Language Models [59.72897499248909]
We propose a novel graph retriever trained end-to-end with Large Language Models (LLMs)<n>Within the extracted subgraph, structural knowledge and semantic features are encoded via soft tokens and the verbalized graph, respectively, which are infused into the LLM together.<n>Our approach consistently achieves state-of-the-art performance, validating the strength of joint graph-LLM optimization for complex reasoning tasks.
arXiv Detail & Related papers (2025-09-20T02:38:00Z) - GRAIL:Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning [13.481673780508215]
GRAIL is a framework designed to interact with large-scale graphs for retrieval-augmented reasoning.<n>GRAIL achieves an average accuracy improvement of 21.01% and F1 improvement of 22.43% on knowledge graph question-answering datasets.
arXiv Detail & Related papers (2025-08-07T15:34:41Z) - DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs [55.13817504780764]
Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features, often rich in textual data, and graph structural information.<n>Graph-Enhanced LLMs emerge as a promising graph learning approach that converts graph information into prompts.<n>We propose Dual Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node.
arXiv Detail & Related papers (2025-07-29T10:10:47Z) - Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains.<n>Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning.<n>Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering.<n>We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z) - Generalizable LLM Learning of Graph Synthetic Data with Post-training Alignment [38.485929062532925]
We propose to unlock generalizable learning of graph with post-training alignment with synthetic data.<n>We employ post-training alignment algorithms such as GRPO and DPO, aligning both off-the-shelf LLMs and LLMs fine-tuned on synthetic graph data.<n>Extensive experiments demonstrate that our post-training alignment recipe leads to statistically significant improvement on 5 datasets.
arXiv Detail & Related papers (2025-06-01T05:39:56Z) - Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [79.75818239774952]
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information.<n>Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system.<n>We propose Align-GRAG, a novel reasoning-guided dual alignment framework in post-retrieval phrase.
arXiv Detail & Related papers (2025-05-22T05:15:27Z) - Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [60.71360240206726]
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks.
Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora.
We propose a framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
arXiv Detail & Related papers (2024-04-10T15:41:53Z) - G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z) - Talk like a Graph: Encoding Graphs for Large Language Models [15.652881653332194]
We study the first comprehensive study of encoding graph-structured data as text for consumption by large language models (LLMs)
We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered.
arXiv Detail & Related papers (2023-10-06T19:55:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.