VANiLLa : Verbalized Answers in Natural Language at Large Scale
- URL: http://arxiv.org/abs/2105.11407v1
- Date: Mon, 24 May 2021 16:57:54 GMT
- Title: VANiLLa : Verbalized Answers in Natural Language at Large Scale
- Authors: Debanjali Biswas, Mohnish Dubey, Md Rashad Al Hasan Rony and Jens
Lehmann
- Abstract summary: This dataset consists of over 100k simple questions adapted from the CSQA and SimpleQuestionsWikidata datasets.
The answer sentences in this dataset are syntactically and semantically closer to the question than to the triple fact.
- Score: 2.9098477555578333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last years, there have been significant developments in the area of
Question Answering over Knowledge Graphs (KGQA). Despite all the notable
advancements, current KGQA datasets only provide the answers as the direct
output result of the formal query, rather than full sentences incorporating
question context. For achieving coherent answers sentence with the question's
vocabulary, template-based verbalization so are usually employed for a better
representation of answers, which in turn require extensive expert intervention.
Thus, making way for machine learning approaches; however, there is a scarcity
of datasets that empower machine learning models in this area. Hence, we
provide the VANiLLa dataset which aims at reducing this gap by offering answers
in natural language sentences. The answer sentences in this dataset are
syntactically and semantically closer to the question than to the triple fact.
Our dataset consists of over 100k simple questions adapted from the CSQA and
SimpleQuestionsWikidata datasets and generated using a semi-automatic
framework. We also present results of training our dataset on multiple baseline
models adapted from current state-of-the-art Natural Language Generation (NLG)
architectures. We believe that this dataset will allow researchers to focus on
finding suitable methodologies and architectures for answer verbalization.
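As a rough illustration of the answer-verbalization task the abstract describes (turning a question plus its direct KG answer into a full sentence), the sketch below formats one hypothetical record for a pretrained sequence-to-sequence model. The field names, the "verbalize:" prompt, and the t5-small checkpoint are assumptions for illustration only, not the paper's actual dataset schema or baseline setup.

```python
# Minimal sketch (not the authors' code): feeding a hypothetical VANiLLa-style
# record to a pretrained seq2seq model for answer verbalization.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Assumed record layout: question, direct KG answer, and verbalized sentence.
record = {
    "question": "Who is the author of Pride and Prejudice?",
    "answer": "Jane Austen",
    "answer_sentence": "Jane Austen is the author of Pride and Prejudice.",
}

# Source side: question plus raw answer; target side: the verbalized sentence.
source = f"verbalize: question: {record['question']} answer: {record['answer']}"
inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(record["answer_sentence"], return_tensors="pt").input_ids

# A training step would minimize this seq2seq loss over many (source, target) pairs.
loss = model(**inputs, labels=labels).loss
print(float(loss))
```

In this framing, any encoder-decoder NLG architecture can serve as a baseline by swapping the checkpoint, which matches the paper's goal of letting researchers compare verbalization methodologies on a common dataset.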
Related papers
- Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering [58.17090503446995]
We focus on a conversational question answering task which combines the challenges of understanding questions in context and reasoning over evidence gathered from heterogeneous sources like text, knowledge graphs, tables, and infoboxes.
Our method utilizes a graph structured representation to aggregate information about a question and its context.
arXiv Detail & Related papers (2024-06-14T13:28:03Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Prompting-based Synthetic Data Generation for Few-Shot Question Answering [23.97949073816028]
We show that using large language models can improve Question Answering performance on various datasets in the few-shot setting.
We suggest that language models contain valuable task-agnostic knowledge that can be used beyond the common pre-training/fine-tuning scheme.
arXiv Detail & Related papers (2024-05-15T13:36:43Z) - PAXQA: Generating Cross-lingual Question Answering Examples at Training
Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z) - Semantic Parsing for Conversational Question Answering over Knowledge
Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to their execution results.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z) - An Answer Verbalization Dataset for Conversational Question Answerings
over Knowledge Graphs [9.979689965471428]
This paper contributes to the state-of-the-art by extending an existing ConvQA dataset with verbalized answers.
We perform experiments with five sequence-to-sequence models on generating answer responses while maintaining grammatical correctness.
arXiv Detail & Related papers (2022-08-13T21:21:28Z) - Would You Ask it that Way? Measuring and Improving Question Naturalness
for Knowledge Graph Question Answering [20.779777536841493]
Knowledge graph question answering (KGQA) facilitates information access by leveraging structured data without requiring formal query language expertise from the user.
We create the IQN-KGQA test collection by sampling questions from existing KGQA datasets and evaluating them with regard to five different aspects of naturalness.
We find that some KGQA systems fare worse when presented with more realistic formulations of NL questions.
arXiv Detail & Related papers (2022-05-25T13:32:27Z) - ListReader: Extracting List-form Answers for Opinion Questions [18.50111430378249]
ListReader is a neural extractive QA model for list-form answers.
In addition to learning the alignment between the question and content, we introduce a heterogeneous graph neural network.
Our model adopts a co-extraction setting that can extract either span- or sentence-level answers.
arXiv Detail & Related papers (2021-10-22T10:33:08Z) - PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge
Graph [0.0]
This paper introduces PeCoQ, a dataset for Persian question answering.
This dataset contains 10,000 complex questions and answers extracted from the Persian knowledge graph, FarsBase.
There are different types of complexities in the dataset, such as multi-relation, multi-entity, ordinal, and temporal constraints.
arXiv Detail & Related papers (2021-06-27T08:21:23Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z) - ClarQ: A large-scale and diverse dataset for Clarification Question
Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from StackExchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)