URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding
- URL: http://arxiv.org/abs/2511.10552v1
- Date: Fri, 14 Nov 2025 01:57:33 GMT
- Title: URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding
- Authors: Yongxin Shi, Jiapeng Wang, Zeyu Shan, Dezhi Peng, Zening Lin, Lianwen Jin,
- Abstract summary: We present URaG, a framework that Unifies Retrieval and Generation within a single MLLM.<n>We show that URaG achieves state-of-the-art performance while reducing computational overhead by 44-56%.
- Score: 55.45331924836242
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent multimodal large language models (MLLMs) still struggle with long document understanding due to two fundamental challenges: information interference from abundant irrelevant content, and the quadratic computational cost of Transformer-based architectures. Existing approaches primarily fall into two categories: token compression, which sacrifices fine-grained details; and introducing external retrievers, which increase system complexity and prevent end-to-end optimization. To address these issues, we conduct an in-depth analysis and observe that MLLMs exhibit a human-like coarse-to-fine reasoning pattern: early Transformer layers attend broadly across the document, while deeper layers focus on relevant evidence pages. Motivated by this insight, we posit that the inherent evidence localization capabilities of MLLMs can be explicitly leveraged to perform retrieval during the reasoning process, facilitating efficient long document understanding. To this end, we propose URaG, a simple-yet-effective framework that Unifies Retrieval and Generation within a single MLLM. URaG introduces a lightweight cross-modal retrieval module that converts the early Transformer layers into an efficient evidence selector, identifying and preserving the most relevant pages while discarding irrelevant content. This design enables the deeper layers to concentrate computational resources on pertinent information, improving both accuracy and efficiency. Extensive experiments demonstrate that URaG achieves state-of-the-art performance while reducing computational overhead by 44-56%. The code is available at https://github.com/shi-yx/URaG.
Related papers
- FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning [2.9863062339019817]
We present textbfFABLE, a textbfForest-based textbfAdaptive textbfBi-path textbfLLM-textbfEnhanced retrieval framework.
arXiv Detail & Related papers (2026-01-26T04:00:56Z) - A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models [85.30893355216486]
We study how visual token redundancy evolves with different dMLLM architectures and tasks.<n>Our study reveals that visual redundancy emerges only in from-scratch dMLLMs while handling long-answer tasks.<n>Layer-skipping is promising for accelerating AR-to-diffusion dMLLMs, whereas progressive or late-step pruning is more effective for from-scratch dMLLMs.
arXiv Detail & Related papers (2025-11-19T04:13:36Z) - Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks.<n>We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z) - REFRAG: Rethinking RAG based Decoding [67.4862300145604]
REFRAG is an efficient decoding framework that compresses, senses, and expands to improve latency in RAG applications.<n>We provide rigorous validation of REFRAG across diverse long-context tasks, including RAG, multi-turn conversations, and long document summarization.
arXiv Detail & Related papers (2025-09-01T03:31:44Z) - Parametric Retrieval Augmented Generation [32.29608109539912]
Parametric RAG is a new RAG paradigm that integrates external knowledge directly into the parameters of feed-forward networks.<n>It substantially enhances both the effectiveness and efficiency of knowledge augmentation in large language models.
arXiv Detail & Related papers (2025-01-27T10:04:49Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.<n>This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.<n>Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z) - PECAN: LLM-Guided Dynamic Progress Control with Attention-Guided Hierarchical Weighted Graph for Long-Document QA [16.945257645760428]
Long-document QA presents challenges with large-scale text and long-distance dependencies.<n>Recent advances in Large Language Models (LLMs) enable entire documents to be processed in a single pass.<n>We propose a novel retrieval method that constructs a hierarchical graph where edges are derived by LLM attention weights.
arXiv Detail & Related papers (2024-10-07T07:02:09Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves comparable performance to the source model securing up to 85% model performance while obtaining over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - Multi-Head RAG: Solving Multi-Aspect Problems with LLMs [18.48202014877111]
Multi-Head RAG (MRAG) is a novel scheme for fetching multi-aspect documents.<n>We show MRAG's design advantages over 18 RAG baselines, empirical improvements of up to 20% in retrieval success ratios.
arXiv Detail & Related papers (2024-06-07T16:59:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.