From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question Answering
- URL: http://arxiv.org/abs/2412.17701v2
- Date: Tue, 24 Dec 2024 03:23:24 GMT
- Title: From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question Answering
- Authors: Nathaniel Weir, Bhavana Dalvi Mishra, Orion Weller, Oyvind Tafjord, Sam Hornstein, Alexander Sabol, Peter Jansen, Benjamin Van Durme, Peter Clark
- Abstract summary: Microtheories are sets of sentences encapsulating an LM's core knowledge about a topic.
We show that, when added to a general corpus (e.g., Wikipedia), microtheories can supply critical, topical information not necessarily present in the corpus.
We also show that, in a human evaluation in the medical domain, our distilled microtheories contain a significantly higher concentration of topically critical facts than the non-distilled knowledge store.
- Abstract: Recent reasoning methods (e.g., chain-of-thought, entailment reasoning) help users understand how language models (LMs) answer a single question, but they do little to reveal the LM's overall understanding, or "theory," about the question's topic, making it still hard to trust the model. Our goal is to materialize such theories - here called microtheories (a linguistic analog of logical microtheories) - as a set of sentences encapsulating an LM's core knowledge about a topic. These statements systematically work together to entail answers to a set of questions to both engender trust and improve performance. Our approach is to first populate a knowledge store with (model-generated) sentences that entail answers to training questions and then distill those down to a core microtheory that is concise, general, and non-redundant. We show that, when added to a general corpus (e.g., Wikipedia), microtheories can supply critical, topical information not necessarily present in the corpus, improving both a model's ability to ground its answers to verifiable knowledge (i.e., show how answers are systematically entailed by documents in the corpus, fully grounding up to +8% more answers), and the accuracy of those grounded answers (up to +8% absolute). We also show that, in a human evaluation in the medical domain, our distilled microtheories contain a significantly higher concentration of topically critical facts than the non-distilled knowledge store. Finally, we show we can quantify the coverage of a microtheory for a topic (characterized by a dataset) using a notion of $p$-relevance. Together, these suggest that microtheories are an efficient distillation of an LM's topic-relevant knowledge, that they can usefully augment existing corpora, and can provide both performance gains and an interpretable, verifiable window into the model's knowledge of a topic.
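To make the two-stage recipe in the abstract concrete, here is a minimal sketch that treats the distillation step as greedy set cover over the knowledge store: each candidate sentence is mapped to the training questions whose answers it helps entail, and sentences are selected until the questions are covered. The function name, the toy store, and the pure set-cover objective are illustrative assumptions; the paper's actual procedure also optimizes for generality and conciseness and verifies entailment with a model.

```python
from typing import Dict, List, Set

def distill_microtheory(knowledge_store: Dict[str, Set[int]],
                        coverage: float = 1.0) -> List[str]:
    """Greedy set-cover sketch: repeatedly pick the sentence that entails
    answers to the most still-uncovered training questions, stopping once
    the requested fraction of questions is covered."""
    all_qs: Set[int] = set().union(*knowledge_store.values())
    covered: Set[int] = set()
    core: List[str] = []
    while len(covered) < coverage * len(all_qs):
        # Most new coverage first; ties broken in favor of shorter sentences.
        best = max(knowledge_store,
                   key=lambda s: (len(knowledge_store[s] - covered), -len(s)))
        gain = knowledge_store[best] - covered
        if not gain:
            break  # every remaining sentence is redundant
        core.append(best)
        covered |= gain
    return core

# Toy store: sentence -> ids of training questions whose answers it helps
# entail (in the paper this link is established by model-based entailment).
store = {
    "Evaporation increases with temperature.":  {1, 2, 5},
    "Water evaporates faster when heated.":     {1, 2},   # redundant
    "Salt lowers the freezing point of water.": {3},
    "Condensation releases heat into the air.": {4, 5},
}
print(distill_microtheory(store))
```

On the toy store, the redundant paraphrase is never selected, which is the "non-redundant" property the abstract describes.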
Related papers
- What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
Large language models (LLMs) can store a massive amount of knowledge, yet their potential to acquire new knowledge remains unknown.
We propose an evaluation framework to assess this capability.
We find that while large models like GPT-4 and Mistral 8x7b are adept at generating coherent and relevant questions, the smaller Phi-2 model is equally effective, or more so.
arXiv Detail & Related papers (2024-09-19T22:12:16Z)
- RECKONING: Reasoning through Dynamic Knowledge Encoding
Language models can answer questions by reasoning over knowledge provided as part of the context.
However, when that context also contains irrelevant information, the model can fail to distinguish the knowledge that is necessary to answer the question.
We propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters.
arXiv Detail & Related papers (2023-05-10T17:54:51Z)
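A heavily simplified sketch of the knowledge-folding idea summarized above: take gradient steps on the context facts so they are encoded in the parameters, then answer without re-supplying the context. RECKONING itself is a bi-level learning algorithm over a real LM; the tiny subject-object memorizer below (assumed names, PyTorch) only illustrates the inner knowledge-encoding step.

```python
import torch
import torch.nn.functional as F

VOCAB, DIM = 10, 32
torch.manual_seed(0)

class TinyMemorizer(torch.nn.Module):
    """Maps a subject id to a distribution over object ids."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, DIM)
        self.out = torch.nn.Linear(DIM, VOCAB)

    def forward(self, subj):
        return self.out(self.emb(subj))

def fold_knowledge(model, facts, steps=300, lr=0.1):
    """Inner loop: gradient steps on the context facts, so the knowledge
    ends up in the parameters rather than in the prompt."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    subj = torch.tensor([s for s, _ in facts])
    obj = torch.tensor([o for _, o in facts])
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(subj), obj).backward()
        opt.step()

model = TinyMemorizer()
fold_knowledge(model, [(1, 7), (2, 5), (3, 9)])   # context facts
# Query without the context: the answer now comes from the weights.
print(model(torch.tensor([2])).argmax(dim=-1).item())  # expect 5
```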
- WikiWhy: Answering and Explaining Cause-and-Effect Questions
We introduce WikiWhy, a QA dataset built around explaining why an answer is true in natural language.
WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics.
GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer & explain condition.
arXiv Detail & Related papers (2022-10-21T17:59:03Z)
- Unpacking Large Language Models with Conceptual Consistency
We propose conceptual consistency to measure a Large Language Model's understanding of relevant concepts.
This metric characterizes a model by how consistent its responses are to queries about conceptually relevant background knowledge.
arXiv Detail & Related papers (2022-09-29T20:55:57Z)
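As a toy illustration of the metric summarized above, one can probe a model with background-knowledge queries and score how often its answers agree with what its own top-level prediction implies. The `ask` callable and the probe format are assumptions for illustration, not the paper's protocol.

```python
from typing import Callable, List, Tuple

def conceptual_consistency(ask: Callable[[str], str],
                           probes: List[Tuple[str, str]]) -> float:
    """Fraction of background-knowledge probes the model answers in line
    with the answer implied by its own prediction (toy formulation)."""
    agree = sum(1 for q, expected in probes
                if ask(q).strip().lower() == expected.strip().lower())
    return agree / len(probes)

# Stand-in LM for demonstration; a real evaluation would call a model API.
canned = {"Do mammals breathe air?": "yes", "Are whales mammals?": "yes"}
probes = [("Do mammals breathe air?", "yes"), ("Are whales mammals?", "yes")]
print(conceptual_consistency(lambda q: canned.get(q, "unknown"), probes))  # 1.0
```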
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that generating lectures and explanations as a chain of thought improves question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning
This paper proposes a new take on Prolog-based inference engines.
We replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval.
Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA.
arXiv Detail & Related papers (2022-09-16T00:54:44Z)
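The pipeline summarized above resembles backward chaining, with neural components proposing goal decompositions and retrieval grounding the leaves in a corpus. The sketch below stubs both with hand-written tables; all names are illustrative, not NELLIE's API.

```python
# Toy backward chaining where decomposition stands in for NELLIE's
# neural guided generation, and FACTS stands in for corpus retrieval.
FACTS = {"metal conducts electricity", "a nail is made of metal"}

def decompose(goal):
    """Stand-in for guided generation: propose subgoal pairs whose
    conjunction would entail the goal."""
    rules = {
        "a nail conducts electricity": [
            ("a nail is made of metal", "metal conducts electricity"),
        ],
    }
    return rules.get(goal, [])

def prove(goal, depth=3):
    """Return a proof tree grounding `goal` in FACTS, or None."""
    if goal in FACTS:                       # grounded in the corpus
        return goal
    if depth == 0:
        return None
    for left, right in decompose(goal):     # entailment-style decomposition
        lp, rp = prove(left, depth - 1), prove(right, depth - 1)
        if lp is not None and rp is not None:
            return (goal, lp, rp)
    return None

print(prove("a nail conducts electricity"))
```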
- GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
GreaseLM is a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations.
We show that GreaseLM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8x larger.
arXiv Detail & Related papers (2022-01-21T19:00:05Z)
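A minimal sketch of the modality-interaction idea described above, under the assumption that each fusion layer mixes a special interaction vector from the text stream with one from the graph stream and splits the result back; this is a simplified reading, not GreaseLM's exact architecture.

```python
import torch

class FusionLayer(torch.nn.Module):
    """One modality-interaction step in the GreaseLM spirit: concatenate
    the text-side and graph-side interaction vectors, mix them with an
    MLP, and split the result back into the two streams."""
    def __init__(self, dim: int):
        super().__init__()
        self.mix = torch.nn.Linear(2 * dim, 2 * dim)

    def forward(self, text_vec, graph_vec):
        fused = self.mix(torch.cat([text_vec, graph_vec], dim=-1))
        return fused.chunk(2, dim=-1)   # back to (text, graph) streams

dim = 8
layer = FusionLayer(dim)
t, g = torch.randn(1, dim), torch.randn(1, dim)
for _ in range(3):   # real GreaseLM stacks distinct interaction layers
    t, g = layer(t, g)
print(t.shape, g.shape)  # torch.Size([1, 8]) torch.Size([1, 8])
```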
- Grow-and-Clip: Informative-yet-Concise Evidence Distillation for Answer Explanation
We argue that the evidence for an answer is critical to enhancing the interpretability of QA models.
We are the first to explicitly define the concept of evidence as the supporting facts in a context that are informative, concise, and readable.
We propose the Grow-and-Clip Evidence Distillation (GCED) algorithm to extract evidence from contexts by trading off informativeness, conciseness, and readability.
arXiv Detail & Related papers (2022-01-13T17:18:17Z)
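A toy rendering of the grow-and-clip pattern named above: greedily grow the evidence set by marginal informativeness per word, then clip anything whose removal leaves coverage unchanged. The token-overlap scoring is a stand-in assumption; the actual GCED uses learned informativeness, conciseness, and readability criteria.

```python
def grow_and_clip(sentences, answer_tokens, budget=2):
    """Toy grow-and-clip evidence selection (illustrative only)."""
    def info(evidence):
        covered = set()
        for s in evidence:
            covered |= set(s.lower().rstrip(".").split()) & answer_tokens
        return len(covered)

    # Grow: greedily add the sentence with the best marginal info per word.
    evidence = []
    while len(evidence) < budget:
        remaining = [s for s in sentences if s not in evidence]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda s: (info(evidence + [s]) - info(evidence))
                                 / len(s.split()))
        if info(evidence + [best]) == info(evidence):
            break  # nothing informative left to add
        evidence.append(best)

    # Clip: drop any sentence whose removal leaves coverage unchanged.
    for s in list(evidence):
        trimmed = [t for t in evidence if t != s]
        if info(trimmed) == info(evidence):
            evidence = trimmed
    return evidence

context = ["The heart pumps blood.", "It has four chambers.",
           "The weather was nice."]
print(grow_and_clip(context, {"blood", "chambers"}))
```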
- MMGET: A Markov model for generalized evidence theory
Dempster-Shafer evidence theory is a useful tool for managing uncertain information.
In real applications, events occur in sequence and have underlying relationships with one another.
A Markov model is introduced into generalized evidence theory to help extract the complete information volume.
arXiv Detail & Related papers (2021-05-12T12:41:57Z)
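The Markov-model extension is specific to this paper, but the Dempster-Shafer combination rule it builds on is standard. Below is a minimal implementation of Dempster's rule for two mass functions whose focal elements are frozensets; mass assigned to conflicting (disjoint) pairs is discarded and the rest renormalized.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets. Returns the conflict-normalized combination."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

A, B = frozenset({"a"}), frozenset({"b"})
m1 = {A: 0.6, A | B: 0.4}
m2 = {B: 0.5, A | B: 0.5}
print(dempster_combine(m1, m2))  # masses renormalized by 1 - 0.3 conflict
```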
- Explaining Question Answering Models through Text Generation
Large pre-trained language models (LMs) have been shown to perform surprisingly well when fine-tuned on tasks that require commonsense and world knowledge.
However, in end-to-end architectures it is difficult to explain what knowledge in the LM allows it to make a correct prediction.
We show on several tasks that our model, which generates a textual explanation that is then used to predict the answer, reaches performance comparable to end-to-end architectures.
arXiv Detail & Related papers (2020-04-12T09:06:46Z)