Assessing SPARQL capabilities of Large Language Models
- URL: http://arxiv.org/abs/2409.05925v1
- Date: Mon, 9 Sep 2024 08:29:39 GMT
- Title: Assessing SPARQL capabilities of Large Language Models
- Authors: Lars-Peter Meyer, Johannes Frey, Felix Brei, Natanael Arndt
- Abstract summary: We focus on measuring out-of-the-box capabilities of Large Language Models to work with SPARQL.
We implement benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation.
Our findings indicate that working with SPARQL SELECT queries is still challenging for LLMs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of Large Language Models (LLMs) with Knowledge Graphs (KGs) offers significant synergistic potential for knowledge-driven applications. One possible integration is the interpretation and generation of formal languages, such as those used in the Semantic Web, with SPARQL being a core technology for accessing KGs. In this paper, we focus on measuring out-of-the-box capabilities of LLMs to work with SPARQL, and more specifically with SPARQL SELECT queries, using a quantitative approach. We implemented various benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation with several LLMs. The tasks assess capabilities along the dimensions of syntax, semantic read, semantic create, and the role of knowledge graph prompt inclusion. With these new benchmarking tasks, we evaluated a selection of GPT, Gemini, and Claude models. Our findings indicate that working with SPARQL SELECT queries is still challenging for LLMs and heavily depends on the specific LLM as well as the complexity of the task. While fixing basic syntax errors seems to pose no problems for the best of the current LLMs evaluated, creating semantically correct SPARQL SELECT queries is difficult in several cases.
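To make the task dimensions concrete, here is a minimal sketch of what a "syntax" task of this kind could look like; the query, prefix, and vocabulary are hypothetical and not taken from the paper's benchmark. The LLM receives a malformed SPARQL SELECT query and must return a repaired version:

```sparql
# Hypothetical task input: a malformed SPARQL SELECT query
# (misspelled SELECT keyword, missing PREFIX declaration, unclosed brace):
#
#   SELEC ?person ?name WHERE {
#     ?person a ex:Researcher ;
#             ex:name ?name .
#
# A syntactically repaired version the LLM would be expected to return:
PREFIX ex: <http://example.org/>

SELECT ?person ?name
WHERE {
  ?person a ex:Researcher ;
          ex:name ?name .
}
```

The "semantic create" dimension goes further: instead of repairing a broken query, the model must produce such a query from a natural-language description of the information need.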
Related papers
- Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning [51.203811759364925]
mKGQAgent breaks down the task of converting natural language questions into SPARQL queries into modular, interpretable subtasks.
Evaluated on the DBpedia- and Corporate-based KGQA benchmarks within the Text2SPARQL challenge 2025, our approach took first place among the participants.
arXiv Detail & Related papers (2025-07-22T19:23:03Z)
- SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection [81.78173888579941]
Large Language Models (LLMs) are considered a well-suited method to increase the quality of question-answering functionality.
LLMs are trained on web data, where researchers have no control over whether the benchmark or the knowledge graph was already included in the training data.
This paper introduces a novel method that evaluates the quality of LLMs by generating a SPARQL query from a natural-language question.
arXiv Detail & Related papers (2025-07-18T12:28:08Z)
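As a hedged illustration of the evaluation setup described in the entry above (the question, prefix, and vocabulary below are invented for this sketch, not drawn from the paper's benchmark), such a method pairs a natural-language question with the SPARQL query the LLM is expected to generate:

```sparql
# Natural-language question: "Which films were directed by Alice Smith?"
# One plausible SPARQL query an LLM might be expected to produce,
# using a made-up film vocabulary:
PREFIX ex: <http://example.org/film/>

SELECT ?film
WHERE {
  ?film a ex:Film ;
        ex:director ex:Alice_Smith .
}
```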
- The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions.
Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers.
We explore a multi-stage query-based framework for Wikidata QA that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z)
- Decompositional Reasoning for Graph Retrieval with Large Language Models [1.034893617526558]
Large Language Models (LLMs) excel at many NLP tasks, but struggle with multi-hop reasoning and factual consistency.
We propose a novel retrieval approach that integrates textual knowledge graphs into the LLM reasoning process via query decomposition.
Our method decomposes complex questions into sub-questions, retrieves relevant textual subgraphs, and composes a question-specific knowledge graph to guide answer generation.
arXiv Detail & Related papers (2025-06-16T11:44:28Z)
- Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques.
We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds.
Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
- Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling [69.84963245729826]
Large language models (LLMs) have shown compelling semantic understanding capabilities.
Dense retrieval is a crucial task in Information Retrieval (IR) and is the foundation for downstream tasks such as re-ranking.
We introduce an auxiliary task of query likelihood (QL) estimation to yield a better backbone for contrastively learning a discriminative retriever.
arXiv Detail & Related papers (2025-04-07T16:03:59Z)
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when paired with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that our approach enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
- ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models [46.07900122810749]
Large language models (LLMs) have achieved unprecedented performances in various applications, yet evaluating them is still challenging.
We contend that utilizing existing relational databases is a promising approach for constructing benchmarks.
We propose ERBench, which uses database integrity constraints to convert any database into an LLM benchmark.
arXiv Detail & Related papers (2024-03-08T12:42:36Z)
- SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph [0.0]
We evaluate strategies for fine-tuning the OpenLLaMA LLM for question answering over life science knowledge graphs.
We propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph.
We also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments.
arXiv Detail & Related papers (2024-02-07T07:24:01Z)
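To illustrate the semantic "clues" mentioned in the entry above (the prefix and predicates are invented for this sketch, not taken from the paper's life-science knowledge graph), the same query can be written with descriptive variable names and inline comments, or stripped of them:

```sparql
PREFIX ex: <http://example.org/ls/>

# With clues: descriptive variable names plus an inline comment.
# Retrieve each protein together with the gene that encodes it.
SELECT ?protein ?gene
WHERE {
  ?protein a ex:Protein ;      # ?protein instead of an opaque ?x
           ex:encodedBy ?gene .
}
# Without clues, the identical query might read:
#   SELECT ?x ?y WHERE { ?x a ex:Protein ; ex:encodedBy ?y . }
```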
- SEED-Bench-2: Benchmarking Multimodal Large Language Models [67.28089415198338]
Multimodal large language models (MLLMs) have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs.
SEED-Bench-2 comprises 24K multiple-choice questions with accurate human annotations, spanning 27 dimensions.
We evaluate the performance of 23 prominent open-source MLLMs and summarize valuable observations.
arXiv Detail & Related papers (2023-11-28T05:53:55Z)
- An In-Context Schema Understanding Method for Knowledge Base Question Answering [70.87993081445127]
Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve knowledge base question answering.
Existing methods bypass the schema-understanding challenge by initially employing LLMs to generate drafts of logic forms without schema-specific details.
We propose a simple In-Context Schema Understanding (ICSU) method that enables LLMs to directly understand schemas by leveraging in-context learning.
arXiv Detail & Related papers (2023-10-22T04:19:17Z)
- LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models [21.95962189710859]
We propose a lightweight, end-to-end framework to execute the Spoken Question Answering (SQA) task on the LibriSQA dataset.
By recasting ASR into the SQA format, we further substantiate our framework's capability to handle ASR tasks.
Our empirical findings demonstrate the LLMs' aptitude for aligning and comprehending multimodal information, paving the way for the development of universal multimodal LLMs.
arXiv Detail & Related papers (2023-08-20T23:47:23Z)
- SPBERT: Pre-training BERT on SPARQL Queries for End-to-end Question Answering over Knowledge Graphs [1.1775939485654976]
SPBERT is a Transformer-based language model pre-trained on massive SPARQL query logs.
We investigate how SPBERT and the encoder-decoder architecture can be adapted for knowledge-based QA corpora.
arXiv Detail & Related papers (2021-06-18T08:39:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.