FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs
- URL: http://arxiv.org/abs/2508.10467v1
- Date: Thu, 14 Aug 2025 09:08:50 GMT
- Title: FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs
- Authors: Xueli Pan, Victor de Boer, Jacco van Ossenbruggen,
- Abstract summary: We propose a modular framework that supports fine-tuned LLMs as a core component, with optional context provided via RAG and a SPARQL query correction layer.<n>We measure query accuracy using BLEU and ROUGE metrics, and query result accuracy using relaxed exact match(RelaxedEM)<n> Experimental results demonstrate that fine-tuning achieves the highest overall performance, reaching 0.90 ROUGE-L for query accuracy and 0.85 RelaxedEM for result accuracy on the test set.
- Score: 0.5120567378386615
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Question answering over Scholarly Knowledge Graphs (SKGs) remains a challenging task due to the complexity of scholarly content and the intricate structure of these graphs. Large Language Model (LLM) approaches could be used to translate natural language questions (NLQs) into SPARQL queries; however, these LLM-based approaches struggle with SPARQL query generation due to limited exposure to SKG-specific content and the underlying schema. We identified two main types of errors in the LLM-generated SPARQL queries: (i) structural inconsistencies, such as missing or redundant triples in the queries, and (ii) semantic inaccuracies, where incorrect entities or properties are shown in the queries despite a correct query structure. To address these issues, we propose FIRESPARQL, a modular framework that supports fine-tuned LLMs as a core component, with optional context provided via retrieval-augmented generation (RAG) and a SPARQL query correction layer. We evaluate the framework on the SciQA Benchmark using various configurations (zero-shot, zero-shot with RAG, one-shot, fine-tuning, and fine-tuning with RAG) and compare the performance with baseline and state-of-the-art approaches. We measure query accuracy using BLEU and ROUGE metrics, and query result accuracy using relaxed exact match(RelaxedEM), with respect to the gold standards containing the NLQs, SPARQL queries, and the results of the queries. Experimental results demonstrate that fine-tuning achieves the highest overall performance, reaching 0.90 ROUGE-L for query accuracy and 0.85 RelaxedEM for result accuracy on the test set.
Related papers
- SPARQL-LLM: Real-Time SPARQL Query Generation from Natural Language Questions [1.3856736555085554]
SPARQL-LLM is an open-source and triplestore-agnostic approach, powered by lightweight metadata, that generates SPARQL queries from natural language text.<n>We show that SPARQL-LLM is up to 36x faster than other systems participating in the challenge, while costing a maximum of $0.01 per question.
arXiv Detail & Related papers (2025-12-16T10:39:46Z) - InteracSPARQL: An Interactive System for SPARQL Query Refinement Using Natural Language Explanations [8.464973032764446]
InteracSPARQL is an interactive SPARQL query generation and refinement system.<n>Users can interactively refine queries through direct feedback or LLM-driven self-refinement.<n>We evaluate InteracSPARQL on standard benchmarks, demonstrating significant improvements in query accuracy, explanation clarity, and overall user satisfaction compared to baseline approaches.
arXiv Detail & Related papers (2025-11-03T19:15:51Z) - SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection [81.78173888579941]
Large Language Models (LLMs) are considered a well-suited method to increase the quality of the question-answering functionality.<n>LLMs are trained on web data, where researchers have no control over whether the benchmark or the knowledge graph was already included in the training data.<n>This paper introduces a novel method that evaluates the quality of LLMs by generating a SPARQL query from a natural-language question.
arXiv Detail & Related papers (2025-07-18T12:28:08Z) - The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions.<n> Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers.<n>We explore multi-stage query-based framework for WikiData QA, proposing multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z) - Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling [69.84963245729826]
Large language models (LLMs) have shown compelling semantic understanding capabilities.<n>Dense retrieval is a crucial task in Information Retrieval (IR) and is the foundation for downstream tasks as re-ranking.<n>We introduce an auxiliary task of QL estimation to yield a better backbone for contrast learning a discriminative retriever.
arXiv Detail & Related papers (2025-04-07T16:03:59Z) - Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval [9.860751439256754]
Large language models (LLMs) are susceptible to hallucinations and out-of-distribution errors when producing KG elements.<n>This has led to increased research aimed at detecting and mitigating such errors.<n>In this paper, we introduce PGMR, a modular framework that incorporates a non-parametric memory module to retrieve KG elements.
arXiv Detail & Related papers (2025-02-19T02:08:13Z) - Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - Assessing SPARQL capabilities of Large Language Models [0.0]
We focus on measuring out-of-the box capabilities of Large Language Models to work with SPARQL.<n>We implement benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation.<n>Our findings indicate that working with SPARQL SELECT queries is still challenging for LLMs.
arXiv Detail & Related papers (2024-09-09T08:29:39Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - AutoQGS: Auto-Prompt for Low-Resource Knowledge-based Question
Generation from SPARQL [18.019353543946913]
This study investigates the task of knowledge-based question generation (KBQG)
Conventional KBQG works generated questions from fact triples in the knowledge graph, which could not express complex operations like aggregation and comparison in SPARQL.
We propose an auto-prompter trained on large-scale unsupervised data to rephrase SPARQL into NL description.
arXiv Detail & Related papers (2022-08-26T06:53:46Z) - Knowledge Base Question Answering by Case-based Reasoning over Subgraphs [81.22050011503933]
We show that our model answers queries requiring complex reasoning patterns more effectively than existing KG completion algorithms.
The proposed model outperforms or performs competitively with state-of-the-art models on several KBQA benchmarks.
arXiv Detail & Related papers (2022-02-22T01:34:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.