The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing
- URL: http://arxiv.org/abs/2305.15108v1
- Date: Wed, 24 May 2023 12:55:04 GMT
- Title: The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing
- Authors: Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann
- Abstract summary: We analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing.
We carry out carefully selected vocabulary substitutions on the queries and find absolute gains in the range of 17% on the GrailQA dataset.
- Score: 20.734859343886843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we analyse the role of output vocabulary for text-to-text (T2T)
models on the task of SPARQL semantic parsing. We perform experiments within
the context of knowledge graph question answering (KGQA), where the task is
to convert questions in natural language to the SPARQL query language. We
observe that the query vocabulary is distinct from human vocabulary. Language
Models (LMs) are predominantly trained for human language tasks, and hence, if
the query vocabulary is replaced with a vocabulary more attuned to the LM
tokenizer, the performance of models may improve. We carry out carefully
selected vocabulary substitutions on the queries and find absolute gains in the
range of 17% on the GrailQA dataset.
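As a concrete illustration of the idea, the sketch below substitutes SPARQL keywords and punctuation with words a pretrained tokenizer handles natively, and inverts the mapping after generation. The specific mapping is a hypothetical example, not the paper's actual substitution scheme:

```python
# A minimal sketch of the vocabulary-substitution idea, assuming a simple
# string-level mapping; the paper's substitutions were chosen more
# carefully than this illustrative table.
SUBSTITUTIONS = {
    "SELECT": "select",
    "DISTINCT": "distinct",
    "WHERE": "where",
    "ORDER BY": "order by",
    "{": "open brace",   # T5's SentencePiece vocabulary has no "{" or "}"
    "}": "close brace",
    "?": "var ",         # "?x" becomes "var x" (hypothetical choice)
}

def encode_query(sparql: str) -> str:
    """Map query-language tokens to tokenizer-friendly English words."""
    for src, tgt in SUBSTITUTIONS.items():
        sparql = sparql.replace(src, tgt)
    return sparql

def decode_query(text: str) -> str:
    """Invert the mapping on model output. A real mapping must be chosen
    to be unambiguous; this naive inverse can misfire on literals."""
    for src, tgt in SUBSTITUTIONS.items():
        text = text.replace(tgt, src)
    return text

q = "SELECT DISTINCT ?x WHERE { ?x a dbo:City }"
masked = encode_query(q)
print(masked)  # select distinct var x where open brace var x a dbo:City close brace
assert decode_query(masked) == q
```

The intuition is that lowercase English words tend to map to single subword tokens in a pretrained vocabulary, whereas uppercase keywords and symbols such as braces fragment into several pieces or fall outside the vocabulary entirely.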
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when paired with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
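A minimal sketch of the "subgraph retrieval as conditional generation" idea from the entry above, using a ~220M-parameter seq2seq model; the prompt prefix, decoding settings, and path format are assumptions for illustration, not the paper's implementation:

```python
# Hypothetical sketch: cast subgraph retrieval as conditional generation
# with a small seq2seq model, decoding candidate relation paths.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")              # ~220M parameters
model = T5ForConditionalGeneration.from_pretrained("t5-base")

question = "Who directed the movie that won Best Picture in 1994?"
inputs = tok(f"retrieve subgraph: {question}", return_tensors="pt")

# Each decoded sequence is read as a path through the KG,
# e.g. "award_winner -> directed_by" (format assumed here).
paths = model.generate(**inputs, num_beams=5, num_return_sequences=5,
                       max_new_tokens=32)
for p in paths:
    print(tok.decode(p, skip_special_tokens=True))
```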
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to execution results thereof.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
- XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z)
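In the spirit of the XRICL entry above, the sketch below retrieves the English exemplars closest to a query embedding and appends translation exemplars before the final prompt; the retriever, exemplar store, and prompt template are assumptions for exposition:

```python
# Sketch of XRICL-style prompt assembly; data structures and the template
# are illustrative, not the paper's code.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(query: str, query_vec: np.ndarray,
                 exemplars: list[tuple[str, str, np.ndarray]],
                 translation_pairs: list[tuple[str, str]], k: int = 3) -> str:
    """exemplars: (english_question, sql, embedding) triples."""
    ranked = sorted(exemplars, key=lambda e: cosine(query_vec, e[2]),
                    reverse=True)[:k]
    parts = [f"Q: {q}\nSQL: {sql}" for q, sql, _ in ranked]
    # Global translation exemplars nudge the LLM to bridge the target
    # language and English before parsing.
    parts += [f"Translate: {src} => {tgt}" for src, tgt in translation_pairs]
    parts.append(f"Q: {query}\nSQL:")
    return "\n\n".join(parts)
```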
- Improving Text-to-SQL Semantic Parsing with Fine-grained Query Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding.
Our framework consists of three modules: a named entity recognizer (NER), a neural entity linker (NEL), and a neural semantic parser (NSP).
arXiv Detail & Related papers (2022-09-28T21:00:30Z)
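To make the modular flow of the entry above concrete, here is a toy pass through the three stages with rule-based stand-ins; the real modules are neural, and these interfaces are assumptions for illustration:

```python
# Toy end-to-end pass: NER -> NEL -> NSP (stand-ins, not the framework).

def ner(question: str, gazetteer: set[str]) -> list[str]:
    """Token-level NER stand-in: flag tokens found in a gazetteer."""
    return [t for t in question.split() if t.lower() in gazetteer]

def nel(mentions: list[str], schema: dict[str, str]) -> dict[str, str]:
    """Entity-linker stand-in: map mentions onto schema elements."""
    return {m: schema[m.lower()] for m in mentions if m.lower() in schema}

def nsp(question: str, links: dict[str, str]) -> str:
    """Semantic-parser stand-in: emit a trivial query over linked columns."""
    cols = ", ".join(links.values()) or "*"
    return f"SELECT {cols} FROM t  -- derived from: {question}"

schema = {"paris": "city.name"}
q = "How many people live in Paris ?"
print(nsp(q, nel(ner(q, {"paris"}), schema)))
```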
- Modern Baselines for SPARQL Semantic Parsing [28.088516108293653]
We focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs.
We show that T5 requires special input tokenisation, but produces state-of-the-art performance on the LC-QuAD 1.0 and LC-QuAD 2.0 datasets.
The methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing.
arXiv Detail & Related papers (2022-04-27T09:26:59Z)
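One plausible reading of the "special input tokenisation" in the entry above is sketched below: characters absent from T5's stock SentencePiece vocabulary (such as braces) are rewritten to tokens newly added to the tokenizer. The token names are illustrative, not the paper's scheme:

```python
# Sketch of special input tokenisation for SPARQL with T5; token names
# ("<obr>" etc.) are invented for illustration.
from transformers import T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
tok.add_tokens(["<obr>", "<cbr>", "<var>"])
# After this, model.resize_token_embeddings(len(tok)) must be called so
# the model has embeddings for the new tokens.

def preprocess(sparql: str) -> str:
    return (sparql.replace("{", " <obr> ")
                  .replace("}", " <cbr> ")
                  .replace("?", " <var> "))

print(tok.tokenize(preprocess("SELECT ?x WHERE { ?x a dbo:City }")))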
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
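A hedged sketch of how a training pair for the code-switching restore task in the entry above might be constructed: corrupt the source sentence with bilingual-dictionary substitutions, then train the model to restore the original. The dictionary and corruption rate are invented for illustration:

```python
# Illustrative construction of a code-switching restore pair.
import random

BILINGUAL = {"house": "maison", "red": "rouge", "cat": "chat"}

def code_switch(sentence: str, rate: float = 0.3) -> str:
    """Randomly swap words for their dictionary translations."""
    out = []
    for w in sentence.split():
        if w.lower() in BILINGUAL and random.random() < rate:
            out.append(BILINGUAL[w.lower()])
        else:
            out.append(w)
    return " ".join(out)

src = "the red cat sat near the house"
corrupted = code_switch(src)
# Training pair for the restore task: model sees `corrupted`, predicts `src`.
print(corrupted, "->", src)
```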
- Reducing the impact of out of vocabulary words in the translation of natural language questions into SPARQL queries [5.97507595130844]
Automatic translation of questions posed in natural language into SPARQL has the potential to overcome this problem.
Existing systems based on neural machine translation are very effective but easily fail to recognize words that are Out Of Vocabulary (OOV) with respect to the training set.
arXiv Detail & Related papers (2021-11-04T16:53:59Z)
- SPARQLing Database Queries from Intermediate Question Decompositions [7.475027071883912]
To translate natural language questions into database queries, most approaches rely on a fully annotated training set.
We reduce this burden using intermediate question representations grounded in databases.
Our pipeline consists of two parts: a semantic parser that converts natural language questions into the intermediate representations, and a non-trainable transpiler to the SPARQL query language.
arXiv Detail & Related papers (2021-09-13T17:57:12Z)
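To illustrate the second, non-trainable stage of the pipeline above, here is a toy transpiler from a small intermediate representation to SPARQL; the IR format is invented for illustration, and the paper's intermediate question representations are considerably richer:

```python
# Toy, deterministic transpiler: intermediate representation -> SPARQL.

def transpile(ir: list[dict]) -> str:
    """ir: ordered steps of the form {'op': ..., 'args': [...]}."""
    var, triples, filters = "*", [], []
    for step in ir:
        if step["op"] == "select":
            var = step["args"][0]
        elif step["op"] == "triple":
            triples.append(" ".join(step["args"]) + " .")
        elif step["op"] == "filter":
            filters.append(f"FILTER({step['args'][0]})")
    body = " ".join(triples + filters)
    return f"SELECT {var} WHERE {{ {body} }}"

ir = [
    {"op": "select", "args": ["?city"]},
    {"op": "triple", "args": ["?city", "a", "dbo:City"]},
    {"op": "filter", "args": ["?population > 1000000"]},
]
print(transpile(ir))
# SELECT ?city WHERE { ?city a dbo:City . FILTER(?population > 1000000) }
```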
- SPBERT: Pre-training BERT on SPARQL Queries for End-to-end Question Answering over Knowledge Graphs [1.1775939485654976]
SPBERT is a Transformer-based language model pre-trained on massive SPARQL query logs.
We investigate how SPBERT and encoder-decoder architecture can be adapted for Knowledge-based QA corpora.
arXiv Detail & Related papers (2021-06-18T08:39:26Z)
- Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions [87.33156149634392]
We critically examine RefCOCOg, a standard benchmark for visual referring expression recognition.
We show that 83.7% of test instances do not require reasoning on linguistic structure.
We propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT.
arXiv Detail & Related papers (2020-05-04T17:09:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.