CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search
- URL: http://arxiv.org/abs/2511.15443v1
- Date: Wed, 19 Nov 2025 13:57:40 GMT
- Title: CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search
- Authors: Ao Xie, Jiahui Chen, Quanzhi Zhu, Xiaoze Jiang, Zhiheng Qin, Enyun Yu, Han Li,
- Abstract summary: CroPS (Cross-Perspective Positive Samples) is a novel retrieval data engine. It enhances training with positive signals derived from user query reformulation behavior. CroPS is now fully deployed in Kuaishou Search, serving hundreds of millions of users daily.
- Score: 10.310885252492925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense retrieval has become a foundational paradigm in modern search systems, especially on short-video platforms. However, most industrial systems adopt a self-reinforcing training pipeline that relies on historically exposed user interactions for supervision. This paradigm inevitably leads to a filter bubble effect, where potentially relevant but previously unseen content is excluded from the training signal, biasing the model toward narrow and conservative retrieval. In this paper, we present CroPS (Cross-Perspective Positive Samples), a novel retrieval data engine designed to alleviate this problem by introducing diverse and semantically meaningful positive examples from multiple perspectives. CroPS enhances training with positive signals derived from user query reformulation behavior (query-level), engagement data in recommendation streams (system-level), and world knowledge synthesized by large language models (knowledge-level). To effectively utilize these heterogeneous signals, we introduce a Hierarchical Label Assignment (HLA) strategy and a corresponding H-InfoNCE loss that together enable fine-grained, relevance-aware optimization. Extensive experiments conducted on Kuaishou Search, a large-scale commercial short-video search platform, demonstrate that CroPS significantly outperforms strong baselines both offline and in live A/B tests, achieving superior retrieval performance and reducing query reformulation rates. CroPS is now fully deployed in Kuaishou Search, serving hundreds of millions of users daily.
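The abstract names an H-InfoNCE loss built on hierarchical labels but does not define it. As a rough illustration only, the sketch below shows the standard InfoNCE loss for a single query, followed by one hypothetical way a hierarchy of relevance levels could be folded in. The function names, the integer `levels` encoding, the "contrast only against strictly lower levels" rule, and the temperature value are all assumptions made for this example, not the paper's formulation.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.1):
    """Standard InfoNCE for one query: negative log-softmax of the positive's score."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def hierarchical_info_nce(scores, levels, temperature=0.1):
    """Hypothetical hierarchical variant (an assumption, not the paper's definition).

    scores[i] is a query-item similarity and levels[i] an integer relevance
    level (0 = negative, higher = more relevant). Each item with level > 0 is
    contrasted only against items of strictly lower level, and the per-item
    losses are averaged.
    """
    losses = []
    for score, level in zip(scores, levels):
        if level == 0:
            continue  # pure negatives never act as the anchor positive
        lower = [s for s, l in zip(scores, levels) if l < level]
        if lower:
            losses.append(info_nce(score, lower, temperature))
    return sum(losses) / len(losses) if losses else 0.0
```

Under this assumed rule, a ranking whose scores follow the label hierarchy yields a smaller loss than one that inverts it, which is the kind of relevance-aware ordering the abstract describes.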
Related papers
- Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation [7.070021001906444]
News recommendation plays a critical role in online news platforms by helping users discover relevant content. Cross-domain news recommendation further requires inferring user's underlying information needs from heterogeneous signals. We present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries.
arXiv Detail & Related papers (2026-02-16T18:45:40Z)
- TaoSearchEmb: A Multi-Objective Reinforcement Learning Framework for Dense Retrieval in Taobao Search [11.893855231479717]
Retrieval-GRPO is a reinforcement learning-based dense retrieval framework. It has been deployed on China's largest e-commerce platform.
arXiv Detail & Related papers (2025-11-17T20:16:52Z)
- Towards High Data Efficiency in Reinforcement Learning with Verifiable Reward [54.708851958671794]
We propose a Data-Efficient Policy Optimization pipeline that combines optimized strategies for both offline and online data selection. In the offline phase, we curate a high-quality subset of training samples based on diversity, influence, and appropriate difficulty. During online RLVR training, we introduce a sample-level explorability metric to dynamically filter samples with low exploration potential.
arXiv Detail & Related papers (2025-09-01T10:04:20Z)
- Buffer-free Class-Incremental Learning with Out-of-Distribution Detection [17.67144692440415]
Class-incremental learning (CIL) poses significant challenges in open-world scenarios. We present an in-depth analysis of post-hoc OOD detection methods and investigate their potential to eliminate the need for a memory buffer. We show that this buffer-free approach achieves comparable or superior performance to buffer-based methods both in terms of class-incremental learning and the rejection of unknown samples.
arXiv Detail & Related papers (2025-05-29T13:01:00Z)
- Unveiling Contrastive Learning's Capability of Neighborhood Aggregation for Collaborative Filtering [16.02820746003461]
Graph contrastive learning (GCL) has gradually become a dominant approach in recommender systems. In this paper, we reveal via theoretical derivation that the gradient descent process of the CL objective is formally equivalent to graph convolution. We propose a novel neighborhood aggregation objective to bring users closer to all interacted items while pushing them away from other positive pairs.
arXiv Detail & Related papers (2025-04-14T11:22:41Z)
- Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
Generative retrieval reformulates retrieval as an autoregressive generation task, where large language models generate target documents directly from a query. We systematically investigate training and inference scaling laws in generative retrieval, exploring how model size, training data scale, and inference-time compute jointly influence performance.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
- Chain-of-Retrieval Augmented Generation [91.02950964802454]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
- Enhancing Retrieval Performance: An Ensemble Approach For Hard Negative Mining [0.0]
This study focuses on explaining the crucial role of hard negatives in the training process of cross-encoder models.
We have developed a robust hard negative mining technique for efficient training of cross-encoder re-rank models on an enterprise dataset.
arXiv Detail & Related papers (2024-10-18T05:23:39Z)
- Retrieval-Oriented Knowledge for Click-Through Rate Prediction [29.55757862617378]
Click-through rate (CTR) prediction is crucial for personalized online services.
The retrieval-oriented knowledge (ROK) framework bypasses the real retrieval process.
ROK features a knowledge base that preserves and imitates the retrieved & aggregated representations.
arXiv Detail & Related papers (2024-04-28T20:21:03Z)
- Continual Learning with Pre-Trained Models: A Survey [61.97613090666247]
Continual Learning aims to overcome the catastrophic forgetting of former knowledge when learning new ones.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.