KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education?
- URL: http://arxiv.org/abs/2412.08985v4
- Date: Mon, 21 Jul 2025 16:16:56 GMT
- Title: KnowShiftQA: How Robust are RAG Systems when Textbook Knowledge Shifts in K-12 Education?
- Authors: Tianshi Zheng, Weihan Li, Jiaxin Bai, Weiqi Wang, Yangqiu Song,
- Abstract summary: Retrieval-Augmented Generation (RAG) systems show remarkable potential as question answering tools in the K-12 Education domain.<n> discrepancies between these textbooks and the parametric knowledge inherent in Large Language Models (LLMs) can undermine the effectiveness of RAG systems.<n>KnowShiftQA simulates these discrepancies by applying deliberate hypothetical knowledge updates to both answers and source documents.
- Score: 41.49674849980441
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) systems show remarkable potential as question answering tools in the K-12 Education domain, where knowledge is typically queried within the restricted scope of authoritative textbooks. However, discrepancies between these textbooks and the parametric knowledge inherent in Large Language Models (LLMs) can undermine the effectiveness of RAG systems. To systematically investigate RAG system robustness against such knowledge discrepancies, we introduce KnowShiftQA. This novel question answering dataset simulates these discrepancies by applying deliberate hypothetical knowledge updates to both answers and source documents, reflecting how textbook knowledge can shift. KnowShiftQA comprises 3,005 questions across five subjects, designed with a comprehensive question typology focusing on context utilization and knowledge integration. Our extensive experiments on retrieval and question answering performance reveal that most RAG systems suffer a substantial performance drop when faced with these knowledge discrepancies. Furthermore, questions requiring the integration of contextual (textbook) knowledge with parametric (LLM) knowledge pose a significant challenge to current LLMs.
Related papers
- Prompting Large Language Models with Partial Knowledge for Answering Questions with Unseen Entities [43.88784275673178]
Retrieval-Augmented Generation (RAG) shows impressive performance by supplementing and substituting parametric knowledge in Large Language Models (LLMs)<n>We show how triplets located in the gold reasoning path and their variants are used to construct partially relevant knowledge by removing the path that contains the answer.<n>Our awakening-based approach demonstrates greater efficacy in practical applications, outperforms traditional methods that rely on embedding-based similarity.
arXiv Detail & Related papers (2025-08-02T09:54:46Z) - A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task [15.932332484902103]
Knowledge-based Vision Question Answering (KB-VQA) extends general Vision Question Answering (VQA)
No comprehensive survey currently exists that systematically organizes and reviews the existing KB-VQA methods.
arXiv Detail & Related papers (2025-04-24T13:37:25Z) - Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation [19.543102037001134]
Language models (LMs) are known to suffer from hallucinations and misinformation.
Retrieval augmented generation (RAG) that retrieves verifiable information from an external knowledge corpus provides a tangible solution to these problems.
RAG generation quality is highly dependent on the relevance between a user's query and the retrieved documents.
arXiv Detail & Related papers (2024-10-10T19:14:55Z) - Mindful-RAG: A Study of Points of Failure in Retrieval Augmented Generation [11.471919529192048]
Large Language Models (LLMs) are proficient at generating coherent and contextually relevant text.
Retrieval-augmented generation (RAG) systems mitigate this by incorporating external knowledge sources, such as structured knowledge graphs (KGs)
Our study investigates this dilemma by analyzing error patterns in existing KG-based RAG methods and identifying eight critical failure points.
arXiv Detail & Related papers (2024-07-16T23:50:07Z) - Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show the strong performance of zero- and few-shot results over math questions knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
arXiv Detail & Related papers (2024-06-19T23:30:01Z) - Beyond Factuality: A Comprehensive Evaluation of Large Language Models
as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z) - Rainier: Reinforced Knowledge Introspector for Commonsense Question
Answering [74.90418840431425]
We present Rainier, or Reinforced Knowledge Introspector, that learns to generate contextually relevant knowledge in response to given questions.
Our approach starts by imitating knowledge generated by GPT-3, then learns to generate its own knowledge via reinforcement learning.
Our work is the first to report that knowledge generated by models that are orders of magnitude smaller than GPT-3, even without direct supervision on the knowledge itself, can exceed the quality of knowledge elicited from GPT-3 for commonsense QA.
arXiv Detail & Related papers (2022-10-06T17:34:06Z) - A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA [67.75989848202343]
This paper presents a unified end-to-end retriever-reader framework towards knowledge-based VQA.
We shed light on the multi-modal implicit knowledge from vision-language pre-training models to mine its potential in knowledge reasoning.
Our scheme is able to not only provide guidance for knowledge retrieval, but also drop these instances potentially error-prone towards question answering.
arXiv Detail & Related papers (2022-06-30T02:35:04Z) - Contextualized Knowledge-aware Attentive Neural Network: Enhancing
Answer Selection with Knowledge [77.77684299758494]
We extensively investigate approaches to enhancing the answer selection model with external knowledge from knowledge graph (KG)
First, we present a context-knowledge interaction learning framework, Knowledge-aware Neural Network (KNN), which learns the QA sentence representations by considering a tight interaction with the external knowledge from KG and the textual information.
To handle the diversity and complexity of KG information, we propose a Contextualized Knowledge-aware Attentive Neural Network (CKANN), which improves the knowledge representation learning with structure information via a customized Graph Convolutional Network (GCN) and comprehensively learns context-based and knowledge-based sentence representation via
arXiv Detail & Related papers (2021-04-12T05:52:20Z) - KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain
Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting when the knowledge required to answer a question is not given/annotated, neither at training nor test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.