Less is More: Data-Efficient Complex Question Answering over Knowledge
Bases
- URL: http://arxiv.org/abs/2010.15881v1
- Date: Thu, 29 Oct 2020 18:42:44 GMT
- Title: Less is More: Data-Efficient Complex Question Answering over Knowledge
Bases
- Authors: Yuncheng Hua, Yuan-Fang Li, Guilin Qi, Wei Wu, Jingyao Zhang, Daiqing
Qi
- Abstract summary: We propose the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering.
Our framework consists of a neural generator and a symbolic executor that transforms a natural-language question into a sequence of primitive actions.
Our model is evaluated on two datasets: CQA, a recent large-scale complex question answering dataset, and WebQuestionsSP, a multi-hop question answering dataset.
- Score: 26.026065844896465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question answering is an effective method for obtaining information from
knowledge bases (KB). In this paper, we propose the Neural-Symbolic Complex
Question Answering (NS-CQA) model, a data-efficient reinforcement learning
framework for complex question answering by using only a modest number of
training samples. Our framework consists of a neural generator and a symbolic
executor that, respectively, transform a natural-language question into a
sequence of primitive actions and execute them over the knowledge base to
compute the answer. We carefully formulate a set of primitive symbolic actions
that allows us to not only simplify our neural network design but also
accelerate model convergence. To reduce search space, we employ the copy and
masking mechanisms in our encoder-decoder architecture to drastically reduce
the decoder output vocabulary and improve model generalizability. We equip our
model with a memory buffer that stores high-reward promising programs. Besides,
we propose an adaptive reward function. By comparing the generated trial with
the trials stored in the memory buffer, we derive the curriculum-guided reward
bonus, i.e., the proximity and the novelty. To mitigate the sparse reward
problem, we combine the adaptive reward and the reward bonus, reshaping the
sparse reward into dense feedback. Also, we encourage the model to generate new
trials to avoid imitating the spurious trials while making the model remember
the past high-reward trials to improve data efficiency. Our NS-CQA model is
evaluated on two datasets: CQA, a recent large-scale complex question answering
dataset, and WebQuestionsSP, a multi-hop question answering dataset. On both
datasets, our model outperforms the state-of-the-art models. Notably, on CQA,
NS-CQA performs well on questions with higher complexity, while only using
approximately 1% of the total training samples.
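The abstract's reward-shaping idea — comparing a generated trial against high-reward programs in a memory buffer to derive proximity and novelty bonuses, then adding them to the sparse answer-match reward — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of `difflib.SequenceMatcher` as the similarity measure, and the `alpha`/`beta` weights are all assumptions.

```python
# Hedged sketch of NS-CQA-style reward shaping. All names and the
# similarity measure are illustrative, not the paper's actual code.
from difflib import SequenceMatcher


def proximity(trial, buffer):
    """Similarity of a generated action sequence to the closest
    high-reward program stored in the memory buffer (0.0 if empty)."""
    if not buffer:
        return 0.0
    return max(SequenceMatcher(None, trial, past).ratio() for past in buffer)


def novelty(trial, buffer):
    """Bonus for trials unlike anything already remembered,
    discouraging imitation of spurious high-reward programs."""
    return 1.0 - proximity(trial, buffer)


def shaped_reward(base_reward, trial, buffer, alpha=0.1, beta=0.1):
    """Combine the sparse answer-match reward with the curriculum-guided
    proximity and novelty bonuses to obtain dense feedback."""
    return (base_reward
            + alpha * proximity(trial, buffer)
            + beta * novelty(trial, buffer))


# Usage: a buffer holding one remembered program of primitive actions.
buffer = [["select", "filter", "count"]]
r = shaped_reward(0.0, ["select", "filter", "count"], buffer)
```

Even when the base reward is zero (a wrong answer), the bonuses give the policy a non-zero gradient signal, which is how the sketch mirrors the paper's move from sparse reward to dense feedback.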
Related papers
- Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering [2.98667511228225]
ReRe is an encoder-decoder architecture model using a pre-trained clip vision encoder and a pre-trained GPT-2 language model as a decoder.
ReRe outperforms previous methods in VQA accuracy and explanation score, and improves NLE by producing more persuasive and reliable explanations.
arXiv Detail & Related papers (2024-08-30T04:39:43Z) - Learning Better Representations From Less Data For Propositional Satisfiability [7.449724123186386]
We present NeuRes, a neuro-symbolic approach to address both challenges for propositional satisfiability.
Our model learns better representations than models trained for classification only, with a much higher data efficiency.
We show that our model achieves far better performance than NeuroSAT in terms of both correctly classified and proven instances.
arXiv Detail & Related papers (2024-02-13T10:50:54Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - QASnowball: An Iterative Bootstrapping Framework for High-Quality
Question-Answering Data Generation [67.27999343730224]
We introduce an iterative bootstrapping framework for QA data augmentation (named QASnowball)
QASnowball can iteratively generate large-scale high-quality QA data based on a seed set of supervised examples.
We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the experimental results show that the data generated by QASnowball can facilitate QA models.
arXiv Detail & Related papers (2023-09-19T05:20:36Z) - Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs)
We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions, with an
arXiv Detail & Related papers (2023-03-09T06:58:29Z) - Adapting Neural Link Predictors for Data-Efficient Complex Query
Answering [45.961111441411084]
We propose a parameter-efficient score adaptation model optimised to re-calibrate neural link prediction scores for the complex query answering task.
CQD^A produces significantly more accurate results than current state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T00:17:16Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA)
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named as RefQA)
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z) - Template-Based Question Generation from Retrieved Sentences for Improved
Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z) - ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.