MERGE: Next-Generation Item Indexing Paradigm for Large-Scale Streaming Recommendation
- URL: http://arxiv.org/abs/2601.20199v1
- Date: Wed, 28 Jan 2026 02:56:30 GMT
- Title: MERGE: Next-Generation Item Indexing Paradigm for Large-Scale Streaming Recommendation
- Authors: Jing Yan, Yimeng Bai, Zongyu Liu, Yahui Liu, Junwei Wang, Jingze Huang, Haoda Li, Sihao Ding, Shaohui Ruan, Yang Zhang,
- Abstract summary: We propose MERGE, a next-generation item indexing paradigm that adaptively constructs clusters from scratch, dynamically monitors cluster occupancy, and forms hierarchical index structures via fine-to-coarse merging.<n>Extensive experiments demonstrate that MERGE significantly improves assignment accuracy, cluster uniformity, and cluster separation compared with existing indexing methods.<n>Online A/B tests show substantial gains in key business metrics, highlighting its potential as a foundational indexing approach for large-scale recommendation.
- Score: 15.1614576262293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Item indexing, which maps a large corpus of items into compact discrete representations, is critical for both discriminative and generative recommender systems, yet existing Vector Quantization (VQ)-based approaches struggle with the highly skewed and non-stationary item distributions common in streaming industry recommenders, leading to poor assignment accuracy, imbalanced cluster occupancy, and insufficient cluster separation. To address these challenges, we propose MERGE, a next-generation item indexing paradigm that adaptively constructs clusters from scratch, dynamically monitors cluster occupancy, and forms hierarchical index structures via fine-to-coarse merging. Extensive experiments demonstrate that MERGE significantly improves assignment accuracy, cluster uniformity, and cluster separation compared with existing indexing methods, while online A/B tests show substantial gains in key business metrics, highlighting its potential as a foundational indexing approach for large-scale recommendation.
Related papers
- You Can Trust Your Clustering Model: A Parameter-free Self-Boosting Plug-in for Deep Clustering [73.48306836608124]
DCBoost is a parameter-free plug-in designed to enhance the global feature structures of current deep clustering models.<n>By harnessing reliable local structural cues, our method aims to elevate clustering performance effectively.
arXiv Detail & Related papers (2025-11-26T09:16:36Z) - Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version) [50.41628860536753]
We propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax.<n>Our framework performs hierarchical agglomerative clustering and cluster evaluation in a single, integrated process.
arXiv Detail & Related papers (2025-11-12T11:17:17Z) - Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations [22.48125906976824]
We introduce the Cascaded Organized Bi-Represented generAtive retrieval framework, which integrates sparse semantic IDs and dense vectors through a cascading process.<n>Our method alternates between generating these representations by first generating sparse IDs, which serve as conditions to aid in the generation of dense vectors.<n>During inference, COBRA employs a coarse-to-fine strategy, starting with sparse ID generation and refining them into dense vectors via the generative model.
arXiv Detail & Related papers (2025-03-04T10:00:05Z) - Towards Scalable Semantic Representation for Recommendation [65.06144407288127]
Mixture-of-Codes is proposed to construct semantic IDs based on large language models (LLMs)
Our method achieves superior discriminability and dimension robustness scalability, leading to the best scale-up performance in recommendations.
arXiv Detail & Related papers (2024-10-12T15:10:56Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - GCC: Generative Calibration Clustering [55.44944397168619]
We propose a novel Generative Clustering (GCC) method to incorporate feature learning and augmentation into clustering procedure.
First, we develop a discrimirative feature alignment mechanism to discover intrinsic relationship across real and generated samples.
Second, we design a self-supervised metric learning to generate more reliable cluster assignment.
arXiv Detail & Related papers (2024-04-14T01:51:11Z) - Unfolding ADMM for Enhanced Subspace Clustering of Hyperspectral Images [43.152314090830174]
We introduce an innovative clustering architecture for hyperspectral images (HSI) by unfolding an iterative solver based on the Alternating Direction Method of Multipliers (ADMM) for sparse subspace clustering.
Our approach captures well the structural characteristics of HSI data by employing the K nearest neighbors algorithm as part of a structure preservation module.
arXiv Detail & Related papers (2024-04-10T15:51:46Z) - Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS)
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
arXiv Detail & Related papers (2024-01-29T09:19:49Z) - End-to-end Learnable Clustering for Intent Learning in Recommendation [54.157784572994316]
We propose a novel intent learning method termed underlineELCRec.
It unifies behavior representation learning into an underlineEnd-to-end underlineLearnable underlineClustering framework.
We deploy this method on the industrial recommendation system with 130 million page views and achieve promising results.
arXiv Detail & Related papers (2024-01-11T15:22:55Z) - Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers [24.88026399458157]
Byzantine machine learning has garnered considerable attention in light of the unpredictable faults that can occur.
The key to secure machines in distributed learning is resilient aggregation mechanisms.
arXiv Detail & Related papers (2023-12-20T08:36:55Z) - Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data.
We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z) - Channel DropBlock: An Improved Regularization Method for Fine-Grained
Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.