Related papers: Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval

Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval

URL: http://arxiv.org/abs/2510.02745v2
Date: Sat, 25 Oct 2025 11:34:54 GMT
Title: Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval
Authors: Lanyun Zhu, Deyi Ji, Tianrun Chen, Haiyang Wu, Shiqi Wang,
Abstract summary: This paper introduces Retrv-R1, the first R1-style MLLM specifically designed for multimodal universal retrieval.<n>We find that directly applying the methods of DeepSeek-R1 to retrieval tasks is not feasible.<n>To address these issues, Retrv-R1 introduces an information compression module with a details inspection mechanism.
Score: 28.986143611245836
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The success of DeepSeek-R1 demonstrates the immense potential of using reinforcement learning (RL) to enhance LLMs' reasoning capabilities. This paper introduces Retrv-R1, the first R1-style MLLM specifically designed for multimodal universal retrieval, achieving higher performance by employing step-by-step reasoning to produce more accurate retrieval results. We find that directly applying the methods of DeepSeek-R1 to retrieval tasks is not feasible, mainly due to (1) the high computational cost caused by the large token consumption required for multiple candidates with reasoning processes, and (2) the instability and suboptimal results when directly applying RL to train for retrieval tasks. To address these issues, Retrv-R1 introduces an information compression module with a details inspection mechanism, which enhances computational efficiency by reducing the number of tokens while ensuring that critical information for challenging candidates is preserved. Furthermore, a new training paradigm is proposed, including an activation stage using a retrieval-tailored synthetic CoT dataset for more effective optimization, followed by RL with a novel curriculum reward to improve both performance and efficiency. Incorporating these novel designs, Retrv-R1 achieves SOTA performance, high efficiency, and strong generalization ability, as demonstrated by experiments across multiple benchmarks and tasks.

Related papers

Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks.<n>We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z)
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems.<n>We build a benchmark consisting 1,260 samples of 42 challenging synthetic tasks.<n>We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS [62.22644307952087]
We introduce AIRL-S, the first natural unification of RL-based and search-based TTS.<n>We leverage adversarial inverse reinforcement learning (AIRL) combined with group relative policy optimization (GRPO) to learn a dense, dynamic PRM directly from correct reasoning traces.<n>Our results show that our unified approach improves performance by 9 % on average over the base model, matching GPT-4o.
arXiv Detail & Related papers (2025-08-19T23:41:15Z)
Sample-efficient LLM Optimization with Reset Replay [13.739451157239756]
We introduce Reset Replay (LoRR), a plugin designed to enhance sample efficiency in any preference-based optimization framework.<n>LoRR incorporates a periodic reset strategy with reusing initial data, which preserves network plasticity.<n>Our experiments demonstrate that LoRR significantly boosts the performance of various preference optimization methods on both mathematical and general reasoning benchmarks.
arXiv Detail & Related papers (2025-08-08T15:56:49Z)
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle [53.239242017802056]
Reinforcement learning (RL) has emerged as an effective post-training paradigm for enhancing the reasoning capabilities of multimodal large language model (MLLM)<n>However, current RL pipelines often suffer from training inefficiencies caused by two underexplored issues: Advantage Collapsing and Rollout Silencing.<n>We propose Shuffle-R1, a simple yet principled framework that improves RL fine-tuning efficiency by dynamically restructuring trajectory sampling and batch composition.
arXiv Detail & Related papers (2025-08-07T17:53:47Z)
Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning [45.10424242207931]
Retrieval-augmented generation (RAG) enhances the text generation capabilities of large language models (LLMs)<n>We introduce a novel method ReasonRAG that automatically constructs RAG-ProGuide, a high-quality dataset providing process-level rewards for query generation, evidence extraction, and answer generation.<n>With the process-level policy optimization, the proposed framework empowers LLMs to autonomously invoke search, generate queries, extract relevant evidence, and produce final answers.
arXiv Detail & Related papers (2025-05-20T08:21:00Z)
Effective Inference-Free Retrieval for Learned Sparse Representations [19.54810957623511]
Learned Sparse Retrieval (LSR) is an effective IR approach that exploits pre-trained language models for encoding text into a learned bag of words.<n>Recently, new efficient -- inverted index-based -- retrieval engines have been proposed, leading to a natural question: has the role of regularization changed in training LSR models?<n>We show that regularization can be relaxed to produce more effective LSR encoders.
arXiv Detail & Related papers (2025-04-30T09:10:46Z)
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval [49.669503570350166]
Generative information retrieval (GenIR) is a promising neural retrieval paradigm that formulates document retrieval as a document identifier (docid) generation task.<n>Existing GenIR models suffer from token-level misalignment, where models trained to predict the next token often fail to capture document-level relevance effectively.<n>We propose direct document relevance optimization (DDRO), which aligns token-level docid generation with document-level relevance estimation through direct optimization via pairwise ranking.
arXiv Detail & Related papers (2025-04-07T15:27:37Z)
Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning [13.642017219922238]
Rec-R1 bridges large language models (LLMs) with recommendation systems through closed-loop optimization.<n>Unlike prompting and supervised fine-tuning (SFT), Rec-R1 directly optimize LLM generation using feedback from a fixed black-box recommendation model.
arXiv Detail & Related papers (2025-03-31T16:36:00Z)
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models [24.45348222168512]
We propose the reasoning MLLM, Vision-R1, to improve multimodal reasoning capability.<n>Our model achieves an average improvement of $sim$6% across various multimodal math reasoning benchmarks.<n>Vision-R1-7B achieves a 73.5% accuracy on the widely used MathVista benchmark, which is only 0.4% lower than the leading reasoning model, OpenAI O1.
arXiv Detail & Related papers (2025-03-09T20:06:45Z)
An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leveraging the logits of only the first generated token. We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains. Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
arXiv Detail & Related papers (2024-11-08T12:08:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.