Related papers: Reducing the impact of out of vocabulary words in the translation of natural language questions into SPARQL queries

URL: http://arxiv.org/abs/2111.03000v1
Date: Thu, 4 Nov 2021 16:53:59 GMT
Title: Reducing the impact of out of vocabulary words in the translation of natural language questions into SPARQL queries
Authors: Manuel A. Borroto Santana, Francesco Ricca, Bernardo Cuteri
Abstract summary: Automatically translating natural language questions into SPARQL could make public knowledge bases accessible to users unfamiliar with the query language. Existing systems based on neural machine translation are very effective but easily fail to recognize words that are out of the vocabulary (OOV) of the training set.
Score: 5.97507595130844
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accessing the large volumes of information available in public knowledge bases might be complicated for those users unfamiliar with the SPARQL query language. Automatic translation of questions posed in natural language into SPARQL has the potential of overcoming this problem. Existing systems based on neural-machine translation are very effective but easily fail in recognizing words that are Out Of the Vocabulary (OOV) of the training set. This is a serious issue when querying large ontologies. In this paper, we combine Named Entity Linking, Named Entity Recognition, and Neural Machine Translation to perform automatic translation of natural language questions into SPARQL queries. We demonstrate empirically that our approach is more effective and resilient to OOV words than existing approaches by running the experiments on Monument, QALD-9, and LC-QuAD v1, which are well-known datasets for Question Answering over DBpedia.

Related papers

Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning [51.203811759364925]
mKGQAgent breaks down the task of converting natural language questions into SPARQL queries into modular, interpretable subtasks. Evaluated on the DBpedia- and Corporate-based KGQA benchmarks within the Text2SPARQL challenge 2025, our approach took first place among the other participants.
arXiv Detail & Related papers (2025-07-22T19:23:03Z)
SPARQL Query Generation with LLMs: Measuring the Impact of Training Data Memorization and Knowledge Injection [81.78173888579941]
Large Language Models (LLMs) are considered a well-suited method to increase the quality of the question-answering functionality. LLMs are trained on web data, where researchers have no control over whether the benchmark or the knowledge graph was already included in the training data. This paper introduces a novel method that evaluates the quality of LLMs by generating a SPARQL query from a natural-language question.
arXiv Detail & Related papers (2025-07-18T12:28:08Z)
MST5 -- Multilingual Question Answering over Knowledge Graphs [1.6470999044938401]
Knowledge Graph Question Answering (KGQA) simplifies querying vast amounts of knowledge stored in a graph-based model using natural language. Existing multilingual KGQA systems face challenges in achieving performance comparable to English systems. We propose a simplified approach to enhance multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model.
arXiv Detail & Related papers (2024-07-08T15:37:51Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
An In-Context Schema Understanding Method for Knowledge Base Question Answering [70.87993081445127]
Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve the knowledge base question answering task. Existing methods bypass this challenge by initially employing LLMs to generate drafts of logic forms without schema-specific details. We propose a simple In-Context Schema Understanding (ICSU) method that enables LLMs to directly understand schemas by leveraging in-context learning.
arXiv Detail & Related papers (2023-10-22T04:19:17Z)
The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing [20.734859343886843]
We analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We carry out carefully selected vocabulary substitutions on the queries and find absolute gains in the range of 17% on the GrailQA dataset.
arXiv Detail & Related papers (2023-05-24T12:55:04Z)
Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES. Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query. By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base. One of the major challenges facing xKBQA is the high cost of data annotation. We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to execution results thereof. We present two different semantic parsing approaches and highlight the challenges of the task. Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
AutoQGS: Auto-Prompt for Low-Resource Knowledge-based Question Generation from SPARQL [18.019353543946913]
This study investigates the task of knowledge-based question generation (KBQG). Conventional KBQG works generated questions from fact triples in the knowledge graph, which could not express complex operations like aggregation and comparison in SPARQL. We propose an auto-prompter trained on large-scale unsupervised data to rephrase SPARQL into NL description.
arXiv Detail & Related papers (2022-08-26T06:53:46Z)
QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers [68.9964449363406]
We extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages. Five of the languages (Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian) had, to the best of our knowledge, never been considered in the KGQA research community before.
arXiv Detail & Related papers (2022-01-31T22:19:55Z)
SPARQLing Database Queries from Intermediate Question Decompositions [7.475027071883912]
To translate natural language questions into database queries, most approaches rely on a fully annotated training set. We reduce this burden using intermediate question representations grounded in databases. Our pipeline consists of two parts: a semantic parser that converts natural language questions into the intermediate representations and a non-trainable transpiler to the SPARQL query language.
arXiv Detail & Related papers (2021-09-13T17:57:12Z)
SPBERT: Pre-training BERT on SPARQL Queries for End-to-end Question Answering over Knowledge Graphs [1.1775939485654976]
SPBERT is a Transformer-based language model pre-trained on massive SPARQL query logs. We investigate how SPBERT and an encoder-decoder architecture can be adapted for Knowledge-based QA corpora.
arXiv Detail & Related papers (2021-06-18T08:39:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.