AmbiGraph-Eval: Can LLMs Effectively Handle Ambiguous Graph Queries?

URL: http://arxiv.org/abs/2508.09631v1
Date: Wed, 13 Aug 2025 09:06:59 GMT
Authors: Yuchen Tian, Kaixin Li, Hao Chen, Ziyang Luo, Hongzhan Lin, Sebastian Schelter, Lun Du, Jing Ma
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have recently demonstrated strong capabilities in translating natural language into database queries, especially when dealing with complex graph-structured data. However, real-world queries often contain inherent ambiguities, and the interconnected nature of graph structures can amplify these challenges, leading to unintended or incorrect query results. To systematically evaluate LLMs on this front, we propose a taxonomy of graph-query ambiguities, comprising three primary types: Attribute Ambiguity, Relationship Ambiguity, and Attribute-Relationship Ambiguity, each subdivided into Same-Entity and Cross-Entity scenarios. We introduce AmbiGraph-Eval, a novel benchmark of real-world ambiguous queries paired with expert-verified graph query answers. Evaluating 9 representative LLMs shows that even top models struggle with ambiguous graph queries. Our findings reveal a critical gap in ambiguity handling and motivate future work on specialized resolution techniques.
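The taxonomy described in the abstract crosses three ambiguity types with two scopes (Same-Entity and Cross-Entity), which yields six subcategories. The snippet below is a minimal sketch that enumerates them. It assumes a simple enum encoding, and the names `AmbiguityType`, `Scope`, and `taxonomy` are hypothetical. They do not come from the paper or any released benchmark code.

```python
from enum import Enum
from itertools import product


class AmbiguityType(Enum):
    # The three primary ambiguity types named in the abstract.
    # (Hypothetical encoding, not the paper's own code.)
    ATTRIBUTE = "Attribute Ambiguity"
    RELATIONSHIP = "Relationship Ambiguity"
    ATTRIBUTE_RELATIONSHIP = "Attribute-Relationship Ambiguity"


class Scope(Enum):
    # Each primary type is subdivided into these two scenarios.
    SAME_ENTITY = "Same-Entity"
    CROSS_ENTITY = "Cross-Entity"


def taxonomy():
    """Return every (type, scope) pair, in enum definition order."""
    return list(product(AmbiguityType, Scope))


if __name__ == "__main__":
    for amb_type, scope in taxonomy():
        print(f"{amb_type.value} / {scope.value}")
```

Running it prints the six type/scope combinations, one per line. Modelling the two axes as separate enums keeps them independent, so a subcategory is just a pair rather than a seventh hand-maintained label.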

Related papers

GraphSeek: Next-Generation Graph Analytics with LLMs
LLMs promise accessible natural language (NL) graph analytics, but they fail to process industry-scale property graphs effectively and efficiently. We devise a novel abstraction for complex multi-query analytics over such graphs. We use this abstraction as the basis of the first LLM-enhanced graph analytics framework, called GraphSeek.
arXiv (2026-02-11T17:20:06Z)
Colorful Talks with Graphs: Human-Interpretable Graph Encodings for Large Language Models
Graph problems are fundamentally challenging for large language models (LLMs). We introduce a human-interpretable structural encoding strategy for graph-to-text translation. Our method enhances LLM performance, especially on graph tasks that require reasoning over global graph structure.
arXiv (2026-02-11T00:15:29Z)
Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching
Large Language Models (LLMs) struggle with hallucinations and factual errors in knowledge-intensive scenarios like knowledge graph question answering (KGQA). We attribute this to the semantic gap between structured knowledge graphs (KGs) and unstructured queries, caused by inherent differences in their focuses and structures. Existing methods usually employ resource-intensive, non-scalable reasoning on vanilla KGs, but overlook this gap. We propose a flexible framework, Enrich-on-Graph (EoG), which leverages LLMs' prior knowledge to enrich KGs and bridge the semantic gap between graphs and queries.
arXiv (2025-09-25T06:48:52Z)
DAGR: Decomposition Augmented Graph Retrieval with LLMs
DAGR is a retrieval method that leverages both complex questions and their decomposition into subquestions to extract relevant, linked subgraphs. The resulting Graph-RAG pipeline is suited to handling complex multi-hop questions and reasoning effectively over graph-structured data. We evaluate DAGR on standard multi-hop QA benchmarks and show that it achieves comparable or superior performance to competitive existing methods.
arXiv (2025-06-16T11:44:28Z)
Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation
Large language models (LLMs) have demonstrated remarkable capabilities, but still struggle with issues like hallucinations and outdated information. Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system. We propose Align-GRAG, a novel reasoning-guided dual alignment framework in the post-retrieval phase.
arXiv (2025-05-22T05:15:27Z)
ZOGRASCOPE: A New Benchmark for Semantic Parsing over Property Graphs
Property graphs (PGs) have seen increased adoption as a means of representing complex structured information. Despite their growing popularity in industry, PGs remain relatively underrepresented in semantic parsing research. We introduce ZOGRASCOPE, a benchmark designed specifically for PGs and queries written in Cypher.
arXiv (2025-03-07T09:33:30Z)
Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts
We introduce a novel Chart Hypothetical Question Answering (HQA) task, which imposes assumptions on the same question to compel models to engage in counterfactual reasoning based on the chart content. Furthermore, we introduce HAI, a human-AI interactive data synthesis approach that leverages the efficient text-editing capabilities of MLLMs alongside human expert knowledge to generate diverse and high-quality HQA data at a low cost.
arXiv (2025-03-06T05:08:40Z)
LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models
Text-Attributed Graphs (TAGs) are ubiquitous in real-world scenarios. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures. We propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning.
arXiv (2025-03-05T09:45:22Z)
Schema-Guided Scene-Graph Reasoning based on Multi-Agent Large Language Model System
We propose an iterative Schema-Guided Scene-Graph reasoning framework based on multi-agent Large Language Models (LLMs). Two modules collaborate iteratively, enabling sequential reasoning and adaptive attention to graph information. Our framework surpasses existing LLM-based approaches and the baseline single-agent, tool-based Reason-while-Retrieve strategy in numerical Q&A and planning tasks.
arXiv (2025-02-05T18:50:38Z)
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
This work introduces a benchmark to assess large language models' capabilities in graph pattern tasks. The benchmark evaluates whether LLMs can understand graph patterns based on either terminological or topological descriptions. It encompasses both synthetic and real datasets, with a total of 11 tasks and 7 models.
arXiv (2024-10-04T04:48:33Z)
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs. Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy. We propose LLM4Graph datasets, which include crawled documents and auto-generated code based on 6 widely used graph libraries.
arXiv (2024-09-29T11:38:45Z)
Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models
Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. Knowledge Graph Question Answering (KGQA) serves as a critical touchstone for integrating LLMs with knowledge graphs. We propose an interactive KGQA framework that leverages the interactive learning capabilities of LLMs to perform reasoning and Debating over Graphs (DoG).
arXiv (2024-09-05T01:11:58Z)
SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph
We evaluate strategies for fine-tuning the OpenLLaMA LLM for question answering over life science knowledge graphs. We propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph. We also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments.
arXiv (2024-02-07T07:24:01Z)
Integrating Graphs with Large Language Models: Methods and Prospects
Large language models (LLMs) have emerged as frontrunners, showcasing unparalleled prowess in diverse applications. Merging the capabilities of LLMs with graph-structured data has been a topic of keen interest. This paper divides such integrations into two predominant categories.
arXiv (2023-10-09T07:59:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.