Related papers: Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search

Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search

URL: http://arxiv.org/abs/2512.00968v1
Date: Sun, 30 Nov 2025 16:31:16 GMT
Title: Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Authors: Ziyang Zeng, Heming Jing, Jindong Chen, Xiangli Li, Hongyu Liu, Yixuan He, Zhengyu Li, Yige Sun, Zheyong Xie, Yuqing Yang, Shaosheng Cao, Jun Fan, Yi Wu, Yao Hu,
Abstract summary: We investigate whether explicit reasoning can enhance both interpretability and performance in relevance modeling.<n>In this work, we formulate relevance modeling in Xiaohongshu search as a reasoning task.<n>We introduce a Reinforcement Learning (RL)-based training framework to enhance the grounded reasoning capabilities of GRMs.
Score: 32.56725829132154
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ranking relevance is a fundamental task in search engines, aiming to identify the items most relevant to a given user query. Traditional relevance models typically produce scalar scores or directly predict relevance labels, limiting both interpretability and the modeling of complex relevance signals. Inspired by recent advances in Chain-of-Thought (CoT) reasoning for complex tasks, we investigate whether explicit reasoning can enhance both interpretability and performance in relevance modeling. However, existing reasoning-based Generative Relevance Models (GRMs) primarily rely on supervised fine-tuning on large amounts of human-annotated or synthetic CoT data, which often leads to limited generalization. Moreover, domain-agnostic, free-form reasoning tends to be overly generic and insufficiently grounded, limiting its potential to handle the diverse and ambiguous cases prevalent in open-domain search. In this work, we formulate relevance modeling in Xiaohongshu search as a reasoning task and introduce a Reinforcement Learning (RL)-based training framework to enhance the grounded reasoning capabilities of GRMs. Specifically, we incorporate practical business-specific relevance criteria into the multi-step reasoning prompt design and propose Stepwise Advantage Masking (SAM), a lightweight process-supervision strategy which facilitates effective learning of these criteria through improved credit assignment. To enable industrial deployment, we further distill the large-scale RL-tuned model to a lightweight version suitable for real-world search systems. Extensive experiments on industrial datasets, along with online A/B tests, demonstrate the effectiveness of our approach.

Related papers

A Comprehensive Evaluation of LLM Reasoning: From Single-Model to Multi-Agent Paradigms [20.241519889633285]
Large Language Models (LLMs) are increasingly deployed as reasoning systems, where reasoning paradigms play a critical role.<n>We conduct a comprehensive and unified evaluation of reasoning paradigms, spanning direct single-model generation, CoT-augmented single-model reasoning, and representative MAS.<n>We introduce MIMeBench, a new open-ended benchmark that targets two foundational yet underexplored semantic capabilities.
arXiv Detail & Related papers (2026-01-19T17:23:45Z)
Multi-hop Reasoning via Early Knowledge Alignment [68.28168992785896]
Early Knowledge Alignment (EKA) aims to align Large Language Models with contextually relevant retrieved knowledge.<n>EKA significantly improves retrieval precision, reduces cascading errors, and enhances both performance and efficiency.<n>EKA proves effective as a versatile, training-free inference strategy that scales seamlessly to large models.
arXiv Detail & Related papers (2025-12-23T08:14:44Z)
ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management [9.138155308817215]
"Pretrain-then-Reinforce" approach reconciles AI's adaptive perception with Operations Research's structural rigor.<n>We show that a lightweight, domain-informed model can deliver state-of-the-art performance and robust transferability when guided by structured OR logic.
arXiv Detail & Related papers (2025-12-22T03:39:43Z)
VAR: Visual Attention Reasoning via Structured Search and Backtracking [49.427842994857635]
We introduce Visual Attention Reasoning, a framework that recasts grounded reasoning as a structured search.<n> VAR decomposes the reasoning process into two key stages: traceable evidence grounding and search-based chain-of-thought.<n>We show that our 7B model, VAR-7B, sets a new state-of-the-art on a comprehensive suite of hallucination and safety benchmarks.
arXiv Detail & Related papers (2025-10-21T13:18:44Z)
Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce [61.03081096959132]
We propose a context-aware reasoning-enhanced generative search framework for better textbfunderstanding the complicated context.<n>Our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.
arXiv Detail & Related papers (2025-10-19T16:46:11Z)
Demystifying Reinforcement Learning in Agentic Reasoning [90.3737088727791]
We conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning.<n>We highlight our key insights: (i) replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT.<n> Exploration-friendly techniques are crucial for agentic RL, such as clip higher, overlong reward shaping, and maintaining adequate policy entropy could improve the training efficiency.
arXiv Detail & Related papers (2025-10-13T17:57:15Z)
The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging [8.930191971732649]
We present a large-scale empirical study evaluating a range of model merging techniques across multiple reasoning benchmarks.<n>Our findings reveal that model merging offers an effective and controllable method for calibrating the trade-off between reasoning accuracy and token efficiency.<n>Our study provides the first comprehensive analysis of this tunable space, offering practical guidelines for creating LLMs with specific reasoning profiles.
arXiv Detail & Related papers (2025-09-26T08:12:13Z)
DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning [5.280613615397194]
DynaSearcher is an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL)<n>We employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality.<n> Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets.
arXiv Detail & Related papers (2025-07-23T09:58:31Z)
Reasoning Meets Personalization: Unleashing the Potential of Large Reasoning Model for Personalized Generation [21.89080753903469]
We present the first systematic evaluation of large reasoning models (LRMs) for personalization tasks.<n>Our analysis identifies three key limitations: divergent thinking, misalignment of response formats, and ineffective use of retrieved information.<n>We propose Reinforced Reasoning for Personalization (model), a novel framework that incorporates a hierarchical reasoning thought template to guide LRMs in generating structured outputs.
arXiv Detail & Related papers (2025-05-23T07:30:13Z)
LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation.<n>Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity.<n>Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z)
Boosting LLM-based Relevance Modeling with Distribution-Aware Robust Learning [14.224921308101624]
We propose a novel Distribution-Aware Robust Learning framework (DaRL) for relevance modeling.<n>DaRL has been deployed online to serve the Alipay's insurance product search.
arXiv Detail & Related papers (2024-12-17T03:10:47Z)
INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL) We integrate a term inspired by variational empowerment into a state-space model based on mutual information. We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.