What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
- URL: http://arxiv.org/abs/1912.13337v2
- Date: Tue, 1 Sep 2020 22:24:44 GMT
- Title: What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge
- Authors: Kyle Richardson and Ashish Sabharwal
- Abstract summary: We investigate whether state-of-the-art QA models have general knowledge about word definitions and general taxonomic reasoning.
We use a methodology for automatically building datasets from various types of expert knowledge.
Our evaluation confirms that transformer-based QA models are already predisposed to recognize certain types of structural lexical knowledge.
- Score: 36.13528043657398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-domain question answering (QA) is known to involve several underlying
knowledge and reasoning challenges, but are models actually learning such
knowledge when trained on benchmark tasks? To investigate this, we introduce
several new challenge tasks that probe whether state-of-the-art QA models have
general knowledge about word definitions and general taxonomic reasoning, both
of which are fundamental to more complex forms of reasoning and are widespread
in benchmark datasets. As an alternative to expensive crowd-sourcing, we
introduce a methodology for automatically building datasets from various types
of expert knowledge (e.g., knowledge graphs and lexical taxonomies), allowing
for systematic control over the resulting probes and for a more comprehensive
evaluation. We find automatically constructing probes to be vulnerable to
annotation artifacts, which we carefully control for. Our evaluation confirms
that transformer-based QA models are already predisposed to recognize certain
types of structural lexical knowledge. However, it also reveals a more nuanced
picture: their performance degrades substantially with even a slight increase
in the number of hops in the underlying taxonomic hierarchy, or as more
challenging distractor candidate answers are introduced. Further, even when
these models succeed at the standard instance-level evaluation, they leave much
room for improvement when assessed at the level of clusters of semantically
connected probes (e.g., all Isa questions about a concept).
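Since the probes are built programmatically from expert resources such as WordNet, the recipe lends itself to a compact illustration. Below is a minimal sketch assuming NLTK's WordNet interface; the question template, the first-parent hop rule, and the random-distractor scheme are illustrative simplifications, not the authors' exact pipeline.

```python
# Minimal sketch of taxonomy-derived probe construction, in the spirit of the
# paper: multiple-choice Isa questions are generated from WordNet hypernym
# paths, with the hop count controlled explicitly and distractors sampled
# from non-ancestor synsets. Helper names and the template are illustrative.
import random
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def hypernym_at_hops(synset, hops):
    """Follow the first hypernym edge `hops` times; None if the path ends."""
    current = synset
    for _ in range(hops):
        parents = current.hypernyms()
        if not parents:
            return None
        current = parents[0]
    return current

def make_isa_probe(synset, hops=1, num_distractors=3, rng=random):
    """Build one multiple-choice Isa probe with a controlled hop distance."""
    answer = hypernym_at_hops(synset, hops)
    if answer is None:
        return None
    # Distractors must not be ancestors of the query concept, or the gold
    # label would no longer be unique. Swapping in harder distractors (e.g.,
    # siblings of the answer) is what makes the probes more challenging.
    ancestors = {s for path in synset.hypernym_paths() for s in path}
    pool = [s for s in wn.all_synsets("n") if s not in ancestors]
    choices = rng.sample(pool, num_distractors) + [answer]
    rng.shuffle(choices)
    return {
        "question": f"A {synset.lemma_names()[0]} is a kind of what?",
        "choices": [c.lemma_names()[0] for c in choices],
        "label": choices.index(answer),
        "hops": hops,
    }

def cluster_score(correct_by_concept):
    """Strict cluster-level accuracy: a concept counts as solved only if
    every probe about it is answered correctly."""
    solved = sum(all(flags) for flags in correct_by_concept.values())
    return solved / len(correct_by_concept)

print(make_isa_probe(wn.synset("dog.n.01"), hops=2))
```

The `hops` parameter makes the difficulty dial concrete: the gold answer sits exactly `hops` edges up the hypernym hierarchy, and substituting harder distractors for the random ones drives the degradation the abstract reports. `cluster_score` implements a strict reading of the cluster-level evaluation the abstract mentions.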
Related papers
- Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models [0.0]
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora.
High-quality datasets are used to train models on realistic scenarios.
Standardized metrics facilitate comparisons between different ODQA systems.
arXiv Detail & Related papers (2024-06-19T05:43:02Z)
- Towards Better Generalization in Open-Domain Question Answering by Mitigating Context Memorization [67.92796510359595]
Open-domain Question Answering (OpenQA) aims at answering factual questions with an external large-scale knowledge corpus.
It is still unclear how well an OpenQA model can transfer to completely new knowledge domains.
We introduce Corpus-Invariant Tuning (CIT), a simple but effective training strategy, to mitigate knowledge over-memorization.
arXiv Detail & Related papers (2024-04-02T05:44:50Z)
- R-Tuning: Instructing Large Language Models to Say 'I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance, but they still face challenges.
Previous instruction tuning methods force the model to complete a sentence whether or not it actually knows the answer.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate that R-Tuning effectively improves a model's ability to answer known questions and to refrain from answering unknown ones (a sketch of the data construction follows below).
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
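Since this entry turns on the gap between what a model is trained to say and what it actually knows, a small data-construction sketch may help. This is a hedged approximation in the spirit of R-Tuning, not the paper's exact recipe: `model_answer` is a hypothetical helper (e.g., greedy decoding with the un-tuned base model), and the padding phrases and exact-match test are simplifications.

```python
# Split training questions by whether the base model already answers them
# correctly, then attach certainty/uncertainty expressions to the targets so
# the tuned model learns when to refuse rather than fabricate an answer.
SURE, UNSURE = " I am sure.", " I am unsure."

def build_refusal_aware_data(examples, model_answer):
    certain, uncertain = [], []
    for ex in examples:
        prediction = model_answer(ex["question"])
        knows_it = prediction.strip().lower() == ex["answer"].strip().lower()
        suffix = SURE if knows_it else UNSURE
        record = {"prompt": ex["question"], "target": ex["answer"] + suffix}
        (certain if knows_it else uncertain).append(record)
    return certain + uncertain  # fine-tune on the union
```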
- QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering [48.25449258017601]
State-of-the-art approaches fine-tune language models on QA pairs constructed from CommonSense Knowledge Bases.
We propose QADYNAMICS, a training dynamics-driven framework for QA diagnostics and refinement (see the sketch below).
arXiv Detail & Related papers (2023-10-17T14:27:34Z)
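The phrase "training dynamics-driven" can be made concrete with a dataset-cartography-style sketch: track the probability the model assigns to each gold answer across training checkpoints, then drop QA pairs whose dynamics mark them as likely mislabeled. The thresholds and the keep/drop rule below are illustrative assumptions, not the authors' exact criterion.

```python
import statistics

def filter_by_dynamics(gold_probs_per_example, min_confidence=0.3, max_variability=0.25):
    """gold_probs_per_example maps example ids to per-checkpoint gold-answer probabilities."""
    keep = []
    for ex_id, probs in gold_probs_per_example.items():
        confidence = statistics.mean(probs)     # average certainty about the gold answer
        variability = statistics.pstdev(probs)  # how much that certainty fluctuates
        # Low-confidence, low-variability examples are the classic
        # "likely mislabeled" region; everything else is kept.
        if confidence >= min_confidence or variability > max_variability:
            keep.append(ex_id)
    return keep

dynamics = {"q1": [0.90, 0.92, 0.95], "q2": [0.05, 0.08, 0.06]}
print(filter_by_dynamics(dynamics))  # ['q1'] -- q2 looks mislabeled
```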
- Attentive Q-Matrix Learning for Knowledge Tracing [4.863310073296471]
We propose Q-matrix-based Attentive Knowledge Tracing (QAKT), an end-to-end model.
QAKT is capable of modeling problems hierarchically and learning the q-matrix efficiently based on students' sequences.
Results of further experiments suggest that the q-matrix learned by QAKT is highly model-agnostic and carries more information than the one labeled by human experts.
arXiv Detail & Related papers (2023-04-06T12:31:34Z)
- ArT: All-round Thinker for Unsupervised Commonsense Question-Answering [54.068032948300655]
We propose the All-round Thinker (ArT) approach, which fully exploits association during knowledge generation.
We evaluate it on three commonsense QA benchmarks: COPA, SocialIQA and SCT.
arXiv Detail & Related papers (2021-12-26T18:06:44Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model, the Knowledge Augmented Transformer (KAT), which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while jointly reasoning over both knowledge sources during answer generation (see the sketch below).
Our analysis also shows that integrating explicit knowledge improves the interpretability of model predictions.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
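The KAT entry describes an architecture rather than an algorithm, so a toy PyTorch sketch of the stated design, a decoder jointly attending to separately encoded explicit and implicit knowledge, may be clearer than prose. Module sizes, the shared encoder, and the input format are illustrative assumptions, not the released model.

```python
import torch
import torch.nn as nn

class KnowledgeAugmentedSeq2Seq(nn.Module):
    """Toy encoder-decoder in the spirit of KAT's description: the decoder
    reasons over explicit knowledge (e.g., retrieved KB entries) and implicit
    knowledge (e.g., statements elicited from a large LM) at the same time."""

    def __init__(self, vocab_size=32000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, explicit_ids, implicit_ids, target_ids):
        # Encode the two knowledge sources separately, then let the decoder
        # cross-attend to their concatenation while generating the answer.
        explicit = self.encoder(self.embed(explicit_ids))
        implicit = self.encoder(self.embed(implicit_ids))
        memory = torch.cat([explicit, implicit], dim=1)
        hidden = self.decoder(self.embed(target_ids), memory)
        return self.lm_head(hidden)

model = KnowledgeAugmentedSeq2Seq()
logits = model(torch.randint(0, 32000, (2, 16)),  # explicit knowledge tokens
               torch.randint(0, 32000, (2, 16)),  # implicit knowledge tokens
               torch.randint(0, 32000, (2, 8)))   # answer tokens so far
print(logits.shape)  # torch.Size([2, 8, 32000])
```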
- Self-supervised Knowledge Triplet Learning for Zero-shot Question Answering [33.920269584939334]
We propose Knowledge Triplet Learning (KTL), a self-supervised task over knowledge graphs.
We propose methods for using KTL to perform zero-shot QA, and our experiments show considerable improvements over large pre-trained transformer models (a sketch of the objective follows below).
arXiv Detail & Related papers (2020-05-01T11:24:18Z)
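The KTL objective as summarized, a self-supervised task over knowledge graphs, has a natural text-to-text reading: from each (head, relation, tail) triple, ask the model to recover one element given the other two. The serialization format below is an illustrative assumption.

```python
# From one knowledge-graph triple, derive three self-supervised examples,
# each masking a different element; a seq2seq model is then trained to fill
# the gap. The "?"-based format is illustrative, not the paper's exact one.
def triplet_examples(head, relation, tail):
    return [
        {"input": f"head: {head} relation: {relation} tail: ?", "target": tail},
        {"input": f"head: {head} relation: ? tail: {tail}", "target": relation},
        {"input": f"head: ? relation: {relation} tail: {tail}", "target": head},
    ]

# Example: one ConceptNet-style triple yields three training instances.
for ex in triplet_examples("bird", "CapableOf", "fly"):
    print(ex["input"], "->", ex["target"])
```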