The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources
in Natural Language Understanding Systems
- URL: http://arxiv.org/abs/2212.08192v2
- Date: Mon, 22 May 2023 21:17:52 GMT
- Title: The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources
in Natural Language Understanding Systems
- Authors: Akshatha Arodi, Martin Pömsl, Kaheer Suleman, Adam Trischler,
Alexandra Olteanu, Jackie Chi Kit Cheung
- Abstract summary: We evaluate state-of-the-art coreference resolution models on our dataset.
Several models struggle to reason on-the-fly over knowledge observed both at pretrain time and at inference time.
Still, even the best performing models seem to have difficulties with reliably integrating knowledge presented only at inference time.
- Score: 87.3207729953778
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many state-of-the-art natural language understanding (NLU) models are based
on pretrained neural language models. These models often make inferences using
information from multiple sources. An important class of such inferences are
those that require both background knowledge, presumably contained in a model's
pretrained parameters, and instance-specific information that is supplied at
inference time. However, the integration and reasoning abilities of NLU models
in the presence of multiple knowledge sources have been largely understudied.
In this work, we propose a test suite of coreference resolution subtasks that
require reasoning over multiple facts. These subtasks differ in terms of which
knowledge sources contain the relevant facts. We also introduce subtasks where
knowledge is present only at inference time using fictional knowledge. We
evaluate state-of-the-art coreference resolution models on our dataset. Our
results indicate that several models struggle to reason on-the-fly over
knowledge observed both at pretrain time and at inference time. However, with
task-specific training, a subset of models demonstrates the ability to
integrate certain knowledge types from multiple sources. Still, even the best
performing models seem to have difficulties with reliably integrating knowledge
presented only at inference time.
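To make the task format concrete, here is a minimal, hypothetical sketch of a two-fact coreference instance in which all of the relevant knowledge is fictional and therefore available only at inference time. The entity names, the fictional occupation, and the CorefInstance/build_instance helpers are illustrative assumptions for this sketch, not the dataset's actual schema.

    # Illustrative sketch only (not the actual KITMUS format): a coreference
    # instance that requires chaining two facts, both stated in the passage
    # itself, to resolve a pronoun to the correct antecedent.
    from dataclasses import dataclass


    @dataclass
    class CorefInstance:
        text: str              # passage containing the pronoun to resolve
        pronoun: str           # the ambiguous mention
        candidates: list[str]  # possible antecedents
        answer: str            # gold antecedent


    def build_instance() -> CorefInstance:
        # Fictional knowledge: a made-up occupation and its made-up workplace,
        # so neither fact can have been memorized during pretraining.
        fact_1 = "A velmer is a person who studies glimmerstones."
        fact_2 = "Velmers work at the stonarium."
        narrative = (
            "Alice is a velmer. Mary is a teacher. "
            "Alice met Mary at the stonarium because she works there."
        )
        return CorefInstance(
            text=f"{fact_1} {fact_2} {narrative}",
            pronoun="she",
            candidates=["Alice", "Mary"],
            answer="Alice",  # requires combining fact_1 and fact_2 with the narrative
        )


    if __name__ == "__main__":
        instance = build_instance()
        print(instance.text)
        print(f"Resolve '{instance.pronoun}' -> {instance.answer}")

In the paper's other subtasks, one of the two facts (for example, where an occupation's members work) would instead be real-world knowledge expected to reside in the model's pretrained parameters, so the source of each fact is the controlled variable.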
Related papers
- Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071] (arXiv 2024-06-11)
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning.
This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which is to combine multiple knowledge to infer new knowledge.
- Context versus Prior Knowledge in Language Models [49.17879668110546] (arXiv 2024-04-06)
Language models often need to integrate prior knowledge learned during pretraining and new information presented in context.
We propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity.
- A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation [51.31429493814664] (arXiv 2024-03-06)
We present a benchmark named multi-source Wizard of Wikipedia for evaluating multi-source dialogue knowledge selection and response generation.
We propose a new challenge, dialogue knowledge plug-and-play, which aims to test an already trained dialogue model on using new support knowledge from previously unseen sources.
- DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering [34.70206857546496] (arXiv 2022-11-10)
Question answering models commonly have access to two sources of "knowledge" during inference time: parametric knowledge encoded in the model weights and non-parametric (contextual) knowledge supplied with the input.
It is often unclear whether a generated answer stems from the given non-parametric knowledge or from the model's parametric knowledge (a minimal probe of this distinction is sketched after this list).
We propose a new paradigm in which QA models are trained to disentangle the two sources of knowledge.
- Large Language Models with Controllable Working Memory [64.71038763708161] (arXiv 2022-11-09)
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
- Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence [37.18100697469402] (arXiv 2022-10-25)
We simulate knowledge conflicts where parametric knowledge suggests one answer and different passages suggest different answers.
We find retrieval performance heavily impacts which sources models rely on, and current models mostly rely on non-parametric knowledge.
We present a new calibration study, where models are discouraged from presenting any single answer when presented with multiple conflicting answer candidates.
- KgPLM: Knowledge-guided Language Model Pre-training via Generative and Discriminative Learning [45.067001062192844] (arXiv 2020-12-07)
We present a language model pre-training framework guided by factual knowledge completion and verification.
Experimental results on LAMA, a set of zero-shot cloze-style question answering tasks, show that our model contains richer factual knowledge than the conventional pre-trained language models.
- Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering [80.60605604261416] (arXiv 2020-11-07)
We propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks.
We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks.
We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.
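Several of the related papers above (DisentQA, the controllable working memory work, and the knowledge-conflict recalibration study) probe the same underlying question: when the context contradicts what a model memorized during pretraining, which source wins? Below is a minimal, hypothetical sketch of such a contextual-override probe; answer_with_context is a stand-in for any model call and is not an API from any of these papers, and the capital-of-France counterfactual is an illustrative assumption.

    # Minimal sketch of a parametric-vs-contextual knowledge probe.
    # `answer_with_context` is a placeholder for a real QA/LM call; the
    # labels below simply record which knowledge source the model followed.
    from typing import Callable


    def probe_context_override(
        answer_with_context: Callable[[str, str], str],
        question: str,
        counterfactual_context: str,
        contextual_answer: str,
        parametric_answer: str,
    ) -> str:
        """Classify whether the model follows the context or its prior."""
        prediction = answer_with_context(counterfactual_context, question).strip()
        if prediction == contextual_answer:
            return "contextual"  # model integrated the in-context (counterfactual) fact
        if prediction == parametric_answer:
            return "parametric"  # model fell back on pretrained knowledge
        return "other"


    if __name__ == "__main__":
        # Toy model that ignores its context and always answers from "memory".
        def toy_model(context: str, question: str) -> str:
            return "Paris"

        verdict = probe_context_override(
            toy_model,
            question="What is the capital of France?",
            counterfactual_context="The capital of France is Lyon.",
            contextual_answer="Lyon",
            parametric_answer="Paris",
        )
        print(verdict)  # -> "parametric"

A context-faithful model would return "contextual" here; aggregated over many such counterfactual instances, the two labels give a rough measure of how strongly the context can override the model's pretrained prior.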