KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models
- URL: http://arxiv.org/abs/2506.19466v2
- Date: Fri, 27 Jun 2025 08:11:14 GMT
- Title: KunLunBaizeRAG: Reinforcement Learning Driven Inference Performance Leap for Large Language Models
- Authors: Cheng Li, Jiexiong Liu, Yixuan Chen, Qihang Zhou, KunLun Meta
- Abstract summary: KunLunBaizeRAG is a reinforcement learning-driven reasoning framework designed to enhance the reasoning capabilities of large language models (LLMs) in complex multi-hop question-answering tasks. Key innovations include the RAG-driven Reasoning Alignment (RDRA) mechanism, the Search-Think Iterative Enhancement (STIE) mechanism, the Network-Local Intelligent Routing (NLR) mechanism, and a progressive hybrid training strategy.
- Score: 4.637288682081713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces KunLunBaizeRAG, a reinforcement learning-driven reasoning framework designed to enhance the reasoning capabilities of large language models (LLMs) in complex multi-hop question-answering tasks. The framework addresses key limitations of traditional RAG, such as retrieval drift, information redundancy, and strategy rigidity. Key innovations include the RAG-driven Reasoning Alignment (RDRA) mechanism, the Search-Think Iterative Enhancement (STIE) mechanism, the Network-Local Intelligent Routing (NLR) mechanism, and a progressive hybrid training strategy. Experimental results demonstrate significant improvements in exact match (EM) and LLM-judged score (LJ) across four benchmarks, highlighting the framework's robustness and effectiveness in complex reasoning scenarios.
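Since the abstract names the mechanisms without detailing them, the following minimal Python sketch illustrates how a search-think loop with retrieval routing of this kind could be wired together. Every identifier here (`ReasoningState`, `route_query`, `llm.propose_subquery`, the `max_hops` budget) is a hypothetical illustration, not the paper's actual API; the reinforcement-learning training the paper centers on sits outside this inference-time loop.

```python
# Hypothetical sketch of a RAG reasoning loop in the spirit of KunLunBaizeRAG.
# None of these functions come from the paper; they illustrate the three
# mechanisms named in the abstract: RDRA (align retrieval with the current
# reasoning state), STIE (iterate search/think while filtering redundant
# evidence), and NLR (route each sub-query to web search or a local index).

from dataclasses import dataclass, field

@dataclass
class ReasoningState:
    question: str
    thoughts: list[str] = field(default_factory=list)
    evidence: set[str] = field(default_factory=set)  # de-duplicated passages

def route_query(query: str, local_index) -> str:
    """NLR-style routing (assumed): prefer the local index when it likely
    covers the query; otherwise fall back to web search."""
    return "local" if local_index.covers(query) else "web"

def answer_with_rag(llm, local_index, web, question: str, max_hops: int = 4) -> str:
    state = ReasoningState(question)
    for _ in range(max_hops):
        # RDRA-style alignment (assumed): derive the next sub-query from the
        # question plus the reasoning so far, not from the raw question alone.
        sub_query = llm.propose_subquery(state.question, state.thoughts)

        # NLR: pick a retrieval source per sub-query.
        source = local_index if route_query(sub_query, local_index) == "local" else web
        passages = source.search(sub_query, top_k=5)

        # STIE-style filtering (assumed): keep only unseen evidence,
        # curbing the information redundancy the abstract mentions.
        new_evidence = [p for p in passages if p not in state.evidence]
        state.evidence.update(new_evidence)

        # Think over the fresh evidence, then stop once the question is answerable.
        state.thoughts.append(llm.think(state.question, state.thoughts, new_evidence))
        if llm.is_answerable(state.question, state.thoughts):
            break
    return llm.final_answer(state.question, state.thoughts, sorted(state.evidence))
```

In the paper's setup, trajectories produced by a loop like this would presumably be scored (e.g., by exact match and an LLM judge) to provide reward signals for the progressive hybrid training the abstract mentions.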
Related papers
- MeRF: Motivation-enhanced Reinforcement Finetuning for Large Reasoning Models [95.6332110724999]
Motivation-enhanced Reinforcement Finetuning (MeRF) is an intuitive yet effective method for enhancing reinforcement learning of Large Language Models (LLMs). MeRF directly injects the reward specification into the prompt, which serves as in-context motivation for the model to improve its responses with awareness of the optimization objective; see the sketch after this list. Empirical evaluations on the Knights and Knaves (K&K) logic-puzzle reasoning benchmark demonstrate that MeRF achieves substantial performance gains over baselines.
arXiv Detail & Related papers (2025-06-23T10:37:57Z)
- Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger [51.01841635655944]
Recent advancements in Large Vision-Language Models (LVLMs) have significantly improved performance on Visual Question Answering (VQA) tasks. Existing methods still face challenges, such as the scarcity of knowledge paired with reasoning examples and erratic responses from retrieved knowledge. We propose a multimodal RAG framework, termed RCTS, which enhances LVLMs by constructing a Reasoning-Context-enriched knowledge base and a Tree Search re-ranking method.
arXiv Detail & Related papers (2025-06-09T14:00:57Z)
- SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning [19.457621121430464]
Training large reasoning models (LRMs) with reinforcement learning in STEM domains is hindered by the scarcity of high-quality, diverse, and verifiable problem sets. We introduce SHARP, a unified approach to Synthesizing High-quality Aligned Reasoning Problems for LRM reinforcement learning with verifiable rewards (RLVR). We implement SHARP by leveraging a state-of-the-art LRM to infer and verify challenging STEM questions, then employ a reinforcement learning loop to refine the model's reasoning through verifiable reward signals.
arXiv Detail & Related papers (2025-05-20T09:54:42Z)
- RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward [7.9399136525335585]
RAIDEN-R1 is a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). We construct a high-quality, role-aware Chain-of-Thought dataset through multi-LLM collaboration. Experiments on the RAIDEN benchmark demonstrate RAIDEN-R1's superiority.
arXiv Detail & Related papers (2025-05-15T12:22:10Z)
- Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations [65.11348389219887]
We introduce Dialectic-RAG (DRAG), a modular approach that evaluates retrieved information by comparing, contrasting, and resolving conflicting perspectives. We show the impact of our framework both as an in-context learning strategy and for constructing demonstrations to instruct smaller models.
arXiv Detail & Related papers (2025-04-07T06:55:15Z)
- RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration [4.604003661048267]
RAG-KG-IL is a novel multi-agent hybrid framework designed to enhance the reasoning capabilities of Large Language Models. It integrates Retrieval-Augmented Generation (RAG) and Knowledge Graphs (KGs) with an Incremental Learning (IL) approach. We evaluate the framework using real-world case studies involving health-related queries.
arXiv Detail & Related papers (2025-03-14T11:50:16Z)
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed execution. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
- AirRAG: Activating Intrinsic Reasoning for Retrieval Augmented Generation using Tree-based Search [4.4907551923591695]
We propose a novel thinking pattern in RAG that integrates system analysis with efficient reasoning actions. Specifically, our approach designs five fundamental reasoning actions, which are expanded to a broad tree-based reasoning space. Experimental results demonstrate the effectiveness of AirRAG, showing significant performance gains on complex question-answering datasets.
arXiv Detail & Related papers (2025-01-17T09:16:13Z)
- Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs). We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
- RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner [2.5903660653548366]
Self-taught reasoner (STaR) uses reinforcement learning to automatically generate reasoning steps. STaR and its variants have demonstrated empirical success, but a theoretical foundation explaining these improvements is lacking. This work provides a theoretical framework for understanding the effectiveness of reinforcement learning on chain-of-thought (CoT) reasoning and STaR.
arXiv Detail & Related papers (2024-10-31T13:17:53Z)
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs). We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure. Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z)
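As flagged in the MeRF entry above, that summary describes a concrete, easily illustrated mechanism: writing the reward specification into the prompt itself. The snippet below is a hedged sketch of that idea; the scoring wording and function name are invented for illustration and do not come from the MeRF paper.

```python
# Hypothetical sketch of MeRF-style "in-context motivation": the same reward
# specification the RL trainer optimizes is injected into the prompt, so the
# model generates with awareness of the optimization objective.
# The reward wording below is invented, not taken from the paper.

REWARD_SPEC = (
    "You earn +1 if the final answer correctly identifies every character as "
    "a knight or a knave, and -1 otherwise."
)

def build_merf_prompt(puzzle: str) -> str:
    """Build a Knights-and-Knaves prompt that exposes the reward rule."""
    return (
        "Solve the following Knights and Knaves puzzle.\n"
        f"Scoring rule for your answer: {REWARD_SPEC}\n\n"
        f"Puzzle: {puzzle}\n"
        "Reason step by step, then state the final answer."
    )
```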
This list is automatically generated from the titles and abstracts of the papers on this site.