Automating SPARQL Query Translations between DBpedia and Wikidata
- URL: http://arxiv.org/abs/2507.10045v1
- Date: Mon, 14 Jul 2025 08:23:25 GMT
- Title: Automating SPARQL Query Translations between DBpedia and Wikidata
- Authors: Malte Christian Bartels, Debayan Banerjee, Ricardo Usbeck,
- Abstract summary: We focus on translations between the DBpedia and Wikidata KG, and later on DBLP and OpenAlex KG.<n>We find that the performance varies markedly across models and prompting strategies.
- Score: 8.105516788827453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates whether state-of-the-art Large Language Models (LLMs) can automatically translate SPARQL between popular Knowledge Graph (KG) schemas. We focus on translations between the DBpedia and Wikidata KG, and later on DBLP and OpenAlex KG. This study addresses a notable gap in KG interoperability research by rigorously evaluating LLM performance on SPARQL-to-SPARQL translation. Two benchmarks are assembled, where the first align 100 DBpedia-Wikidata queries from QALD-9-Plus; the second contains 100 DBLP queries aligned to OpenAlex, testing generalizability beyond encyclopaedic KGs. Three open LLMs: Llama-3-8B, DeepSeek-R1-Distill-Llama-70B, and Mistral-Large-Instruct-2407 are selected based on their sizes and architectures and tested with zero-shot, few-shot, and two chain-of-thought variants. Outputs were compared with gold answers, and resulting errors were categorized. We find that the performance varies markedly across models and prompting strategies, and that translations for Wikidata to DBpedia work far better than translations for DBpedia to Wikidata.
Related papers
- Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning [51.203811759364925]
mKGQAgent breaks down the task of converting natural language questions into SPARQL queries into modular, interpretable subtasks.<n> Evaluated on the DBpedia- and Corporate-based KGQA benchmarks within the Text2SPARQL challenge 2025, our approach took first place among the other participants.
arXiv Detail & Related papers (2025-07-22T19:23:03Z) - The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions.<n> Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers.<n>We explore multi-stage query-based framework for WikiData QA, proposing multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z) - Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - Integrating SPARQL and LLMs for Question Answering over Scholarly Data Sources [0.0]
This paper describes a methodology that combines SPARQL queries, divide and conquer algorithms, and a pre-trained extractive question answering model.<n>It starts with SPARQL queries to gather data, then applies divide and conquer to manage various question types and sources, and uses the model to handle personal author questions.<n>The approach, evaluated with Exact Match and F-score metrics, shows promise for improving QA accuracy and efficiency in scholarly contexts.
arXiv Detail & Related papers (2024-09-11T14:50:28Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - An In-Context Schema Understanding Method for Knowledge Base Question
Answering [70.87993081445127]
Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve this task.
Existing methods bypass this challenge by initially employing LLMs to generate drafts of logic forms without schema-specific details.
We propose a simple In-Context Understanding (ICSU) method that enables LLMs to directly understand schemas by leveraging in-context learning.
arXiv Detail & Related papers (2023-10-22T04:19:17Z) - Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot
Sequence-to-Sequence Semantic Parsing over Wikidata [6.716263690738313]
This paper presents WikiWebQuestions, a high-quality question answering benchmark for Wikidata.
It consists of real-world data with SPARQL.
We modify SPARQL to use the unique domain and property names instead of their IDs.
arXiv Detail & Related papers (2023-05-23T16:20:43Z) - Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z) - QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia
and Wikidata Translated by Native Speakers [68.9964449363406]
We extend one of the most popular KGQA benchmarks - QALD-9 by introducing high-quality questions' translations to 8 languages.
Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - to our best knowledge were never considered in KGQA research community before.
arXiv Detail & Related papers (2022-01-31T22:19:55Z) - Reducing the impact of out of vocabulary words in the translation of
natural language questions into SPARQL queries [5.97507595130844]
Automatic translation of questions posed in natural language in SPARQL has the potential of overcoming this problem.
Existing systems based on neural-machine translation are very effective but easily fail in recognizing words that are Out Of The Vocabulary (OOV) of the training set.
arXiv Detail & Related papers (2021-11-04T16:53:59Z) - SPARQLing Database Queries from Intermediate Question Decompositions [7.475027071883912]
To translate natural language questions into database queries, most approaches rely on a fully annotated training set.
We reduce this burden using grounded in databases intermediate question representations.
Our pipeline consists of two parts: a semantic that converts natural language questions into the intermediate representations and a non-trainable transpiler to the QLSPAR query language.
arXiv Detail & Related papers (2021-09-13T17:57:12Z) - Creating and Querying Personalized Versions of Wikidata on a Laptop [0.7449724123186383]
This paper introduces KGTK Kypher, a query language and processor that allows users to create personalized variants of Wikidata on a laptop.
We present several use cases that illustrate the types of analyses that Kypher enables users to run on the full Wikidata KG on a laptop.
arXiv Detail & Related papers (2021-08-06T00:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.