GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2407.02936v1
- Date: Wed, 3 Jul 2024 09:12:38 GMT
- Title: GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models
- Authors: Zike Yuan, Ming Liu, Hui Wang, Bing Qin,
- Abstract summary: This paper presents GraCoRe, a benchmark for systematically assessing Large Language Models' graph comprehension and reasoning.
GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graph and heterogeneous graphs.
Key findings reveal that semantic enrichment enhances reasoning performance, node ordering impacts task success, and the ability to process longer texts does not necessarily improve graph comprehension or reasoning.
- Score: 22.705728671135834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs' graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graph and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluated three closed-source and seven open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that semantic enrichment enhances reasoning performance, node ordering impacts task success, and the ability to process longer texts does not necessarily improve graph comprehension or reasoning. GraCoRe is open-sourced at https://github.com/ZIKEYUAN/GraCoRe
Related papers
- GC-Bench: An Open and Unified Benchmark for Graph Condensation [54.70801435138878]
We develop a comprehensive Graph Condensation Benchmark (GC-Bench) to analyze the performance of graph condensation.
GC-Bench systematically investigates the characteristics of graph condensation in terms of the following dimensions: effectiveness, transferability, and complexity.
We have developed an easy-to-use library for training and evaluating different GC methods to facilitate reproducible research.
arXiv Detail & Related papers (2024-06-30T07:47:34Z) - Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [60.71360240206726]
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks.
Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora.
We propose a framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
arXiv Detail & Related papers (2024-04-10T15:41:53Z) - G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z) - Talk like a Graph: Encoding Graphs for Large Language Models [15.652881653332194]
We study the first comprehensive study of encoding graph-structured data as text for consumption by large language models (LLMs)
We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered.
arXiv Detail & Related papers (2023-10-06T19:55:21Z) - KGrEaT: A Framework to Evaluate Knowledge Graphs via Downstream Tasks [1.8722948221596285]
KGrEaT is a framework to estimate the quality of knowledge graphs via actual downstream tasks like classification, clustering, or recommendation.
The framework takes a knowledge graph as input, automatically maps it to the datasets to be evaluated on, and computes performance metrics for the defined tasks.
arXiv Detail & Related papers (2023-08-21T07:43:10Z) - GPT4Graph: Can Large Language Models Understand Graph Structured Data ?
An Empirical Evaluation and Benchmarking [17.7473474499538]
Large language models like ChatGPT have become indispensable to artificial general intelligence.
In this study, we conduct an investigation to assess the proficiency of LLMs in comprehending graph data.
Our findings contribute valuable insights towards bridging the gap between language models and graph understanding.
arXiv Detail & Related papers (2023-05-24T11:53:19Z) - Can Language Models Solve Graph Problems in Natural Language? [51.28850846990929]
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures.
We propose NLGraph, a benchmark of graph-based problem solving simulating in natural language.
arXiv Detail & Related papers (2023-05-17T08:29:21Z) - Node Feature Extraction by Self-Supervised Multi-scale Neighborhood
Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT)
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z) - Grale: Designing Networks for Graph Learning [68.23038997141381]
We present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes.
Grale operates by fusing together different measures of(potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes.
We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score.
arXiv Detail & Related papers (2020-07-23T13:25:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.