From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM
- URL: http://arxiv.org/abs/2507.03868v1
- Date: Sat, 05 Jul 2025 02:44:38 GMT
- Title: From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM
- Authors: Xinyi Wu, Yanhao Jia, Luwei Xiao, Shuai Zhao, Fengkuang Chiang, Erik Cambria,
- Abstract summary: We develop a lightweight, efficient multi-modal retrieval module named Uni-Retrieval.<n>It extracts query-style prototypes and dynamically matches them with tokens from a continually updated Prompt Bank.<n>This Prompt Bank encodes and stores domain-specific knowledge by leveraging a Mixture-of-Expert Low-Rank Adaptation (MoE-LoRA) module.<n>We integrate the original Uni-Retrieval with a compact instruction-tuned language model, forming a complete retrieval-augmented generation pipeline named Uni-RAG.
- Score: 35.20687923222239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambiguity inherent in real-world educational scenarios. To address this limitation, we develop a lightweight and efficient multi-modal retrieval module, named Uni-Retrieval, which extracts query-style prototypes and dynamically matches them with tokens from a continually updated Prompt Bank. This Prompt Bank encodes and stores domain-specific knowledge by leveraging a Mixture-of-Expert Low-Rank Adaptation (MoE-LoRA) module and can be adapted to enhance Uni-Retrieval's capability to accommodate unseen query types at test time. To enable natural language educational content generation, we integrate the original Uni-Retrieval with a compact instruction-tuned language model, forming a complete retrieval-augmented generation pipeline named Uni-RAG. Given a style-conditioned query, Uni-RAG first retrieves relevant educational materials and then generates human-readable explanations, feedback, or instructional content aligned with the learning objective. Experimental results on SER and other multi-modal benchmarks show that Uni-RAG outperforms baseline retrieval and RAG systems in both retrieval accuracy and generation quality, while maintaining low computational cost. Our framework provides a scalable, pedagogically grounded solution for intelligent educational systems, bridging retrieval and generation to support personalized, explainable, and efficient learning assistance across diverse STEM scenarios.
Related papers
- Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering [3.6799953119508735]
We propose a novel approach to multimodal textbook question answering by introducing a mechanism for enhancing semantic representations.<n>Our model, Joint Embedding Training With Ranking Supervision for Textbook Question Answering (JETRTQA), is a multimodal learning framework built on a retriever-generator architecture.<n>We evaluate our method on the CK12-QA dataset and demonstrate that it significantly improves the discrimination between informative and irrelevant documents.
arXiv Detail & Related papers (2025-05-17T13:23:54Z) - UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities [53.76854299076118]
UniversalRAG is a novel RAG framework designed to retrieve and integrate knowledge from heterogeneous sources with diverse modalities and granularities.<n>We propose a modality-aware routing mechanism that dynamically identifies the most appropriate modality-specific corpus and performs targeted retrieval within it.<n>We validate UniversalRAG on 8 benchmarks spanning multiple modalities, showing its superiority over various modality-specific and unified baselines.
arXiv Detail & Related papers (2025-04-29T13:18:58Z) - Uni-Retrieval: A Multi-Style Retrieval Framework for STEM's Education [30.071212702797016]
In AI-facilitated teaching, leveraging various query styles to interpret abstract text descriptions is crucial for ensuring high-quality teaching.<n>In this paper, we propose a diverse expression retrieval task tailored to educational scenarios, supporting retrieval based on multiple query styles and expressions.<n>We introduce the STEM Education Retrieval dataset (SER), which contains over 24,000 query pairs of different styles, and the Uni-Retrieval, an efficient and style-diversified retrieval vision-language model based on prompt tuning.
arXiv Detail & Related papers (2025-02-09T11:46:05Z) - Enhancing Retrieval-Augmented Generation: A Study of Best Practices [16.246719783032436]
We develop advanced RAG system designs that incorporate query expansion, various novel retrieval strategies, and a novel Contrastive In-Context Learning RAG.<n>Our study systematically investigates key factors, including language model size, prompt design, document chunk size, knowledge base size, retrieval stride, query expansion techniques, and Focus Mode retrieving relevant context at sentence-level.<n>Our findings offer actionable insights for developing RAG systems, striking a balance between contextual richness and retrieval-generation efficiency.
arXiv Detail & Related papers (2025-01-13T15:07:55Z) - A Proposed Large Language Model-Based Smart Search for Archive System [0.0]
This study presents a novel framework for smart search in digital archival systems.<n>By employing a Retrieval-Augmented Generation (RAG) approach, the framework enables the processing of natural language queries.<n>We present the architecture and implementation of the system and evaluate its performance in four experiments.
arXiv Detail & Related papers (2025-01-13T02:53:07Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML)
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning [49.3242278912771]
We introduce a novel multimodal RAG framework named RMR (Retrieval Meets Reasoning)
The RMR framework employs a bi-modal retrieval module to identify the most relevant question-answer pairs.
It significantly boosts the performance of various vision-language models across a spectrum of benchmark datasets.
arXiv Detail & Related papers (2024-05-31T14:23:49Z) - STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Knowledge Bases.
Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z) - Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM)
All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts.
This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
arXiv Detail & Related papers (2023-10-23T05:52:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.