Representation Quantization for Collaborative Filtering Augmentation
- URL: http://arxiv.org/abs/2508.11194v1
- Date: Fri, 15 Aug 2025 04:00:50 GMT
- Title: Representation Quantization for Collaborative Filtering Augmentation
- Authors: Yunze Luo, Yinjie Jiang, Gaode Chen, Jingchi Wang, Shicheng Wang, Ruina Sun, Jiang Yuezihan, Jun Zhang, Jian Liang, Han Li, Kun Gai, Kaigui Bian
- Abstract summary: We propose a novel two-stage collaborative recommendation algorithm, DQRec. It augments features and homogeneous linkages by extracting behavior characteristics jointly from interaction sequences and attributes. By integrating these semantic ID patterns into the recommendation process through feature and linkage augmentation, the system enriches both latent and explicit user and item features.
- Score: 49.14087936092634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the core algorithm in recommendation systems, collaborative filtering (CF) inevitably faces the problem of data sparsity. Since CF relies on finding similar users and items for recommendations, augmenting the missing user-user and item-item homogeneous linkages is effective. However, existing methods are typically limited to connecting through overlapping interacted neighbors or through similar attributes and contents. These approaches are constrained by coarse-grained, sparse attributes and fail to effectively extract behavioral characteristics jointly from interaction sequences and attributes. To address these challenges, we propose a novel two-stage collaborative recommendation algorithm, DQRec: Decomposition-based Quantized Variational AutoEncoder (DQ-VAE) for Recommendation. DQRec augments features and homogeneous linkages by extracting behavioral characteristics, which we call patterns (for example, users' multi-aspect interests), jointly from interaction sequences and attributes. Inspired by vector quantization (VQ) technology, we propose a new VQ algorithm, DQ-VAE, which decomposes the pre-trained representation embeddings into distinct dimensions and quantizes them to generate semantic IDs. We use the generated semantic IDs as the extracted patterns mentioned above. By integrating these semantic ID patterns into the recommendation process through feature and linkage augmentation, the system enriches both latent and explicit user and item features, identifies pattern-similar neighbors, and thereby improves the efficiency of information diffusion. Experimental comparisons with baselines across multiple datasets demonstrate the superior performance of the proposed DQRec method.
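The decompose-then-quantize idea in the abstract, and the way shared semantic IDs can link pattern-similar users or items, can be illustrated with a small sketch. This is not the authors' DQ-VAE: it assumes a fixed equal-width split of each embedding and codebooks that are already given, which stand in for the learned decomposition and learned codebooks. The function names `semantic_ids` and `pattern_neighbors` and the `min_shared` threshold are hypothetical and used only for illustration.

```python
import numpy as np


def semantic_ids(embeddings, codebooks):
    """Assign one code index per sub-space to every embedding.

    embeddings: (n, d) array of pre-trained representations.
    codebooks:  list of K arrays, each of shape (num_codes_k, d // K).
    Returns an (n, K) integer array; row i is the semantic ID of row i.
    Assumption: a fixed equal-width split replaces any learned decomposition.
    """
    parts = np.split(embeddings, len(codebooks), axis=1)
    ids = []
    for part, codebook in zip(parts, codebooks):
        # Squared Euclidean distance from every sub-vector to every code.
        dists = ((part[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        ids.append(dists.argmin(axis=1))
    return np.stack(ids, axis=1)


def pattern_neighbors(ids, min_shared=1):
    """Return pairs (i, j) with i < j whose IDs agree in >= min_shared positions."""
    shared = (ids[:, None, :] == ids[None, :, :]).sum(axis=-1)
    rows, cols = np.where(np.triu(shared >= min_shared, k=1))
    return list(zip(rows.tolist(), cols.tolist()))


# Toy data: four 4-dimensional embeddings, two sub-spaces, two codes each.
embeddings = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [0.1, 0.0, 0.9, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.1],
])
codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]])] * 2

ids = semantic_ids(embeddings, codebooks)
links = pattern_neighbors(ids, min_shared=2)
```

In this toy setup the first two rows receive the same semantic ID, so they are the only pair linked when `min_shared=2`. Lowering `min_shared` links rows that agree in only some positions, which is one simple way to read "pattern-similar neighbors".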
Related papers
- GLASS: A Generative Recommender for Long-sequence Modeling via SID-Tier and Semantic Search [51.44490997013772]
GLASS is a novel framework that integrates long-term user interests into the generative process via SID-Tier and Semantic Search. We show that GLASS outperforms state-of-the-art baselines in experiments on two large-scale real-world datasets.
arXiv Detail & Related papers (2026-02-05T13:48:33Z)
- Closing the Performance Gap in Generative Recommenders with Collaborative Tokenization and Efficient Modeling [10.757287948514604]
We introduce a contrastive tokenization method that integrates collaborative information directly into the learned item representations. We also propose MARIUS, a lightweight, audio-inspired generative model that decouples timeline modeling from item decoding.
arXiv Detail & Related papers (2025-08-12T17:06:55Z)
- Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations [22.48125906976824]
We introduce the Cascaded Organized Bi-Represented generAtive retrieval framework, which integrates sparse semantic IDs and dense vectors through a cascading process. Our method alternates between generating these representations by first generating sparse IDs, which serve as conditions to aid in the generation of dense vectors. During inference, COBRA employs a coarse-to-fine strategy, starting with sparse ID generation and refining them into dense vectors via the generative model.
arXiv Detail & Related papers (2025-03-04T10:00:05Z)
- Order-agnostic Identifier for Large Language Model-based Generative Recommendation [94.37662915542603]
Items are assigned identifiers for Large Language Models (LLMs) to encode user history and generate the next item. Existing approaches leverage either token-sequence identifiers, representing items as discrete token sequences, or single-token identifiers, using ID or semantic embeddings. We propose SETRec, which leverages semantic tokenizers to obtain order-agnostic multi-dimensional tokens.
arXiv Detail & Related papers (2025-02-15T15:25:38Z)
- Unifying Generative and Dense Retrieval for Sequential Recommendation [37.402860622707244]
We propose LIGER, a hybrid model that combines the strengths of sequential dense retrieval and generative retrieval. LIGER integrates sequential dense retrieval into generative retrieval, mitigating performance differences and enhancing cold-start item recommendation. This hybrid approach provides insights into the trade-offs between these approaches and demonstrates improvements in efficiency and effectiveness for recommendation systems in small-scale benchmarks.
arXiv Detail & Related papers (2024-11-27T23:36:59Z)
- Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization. Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette": a collection of independent and semantically parallel embeddings. Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z)
- Retrieval with Learned Similarities [2.729516456192901]
State-of-the-art retrieval algorithms have migrated to learned similarities. We show that Mixture-of-Logits (MoL) can be realized empirically to achieve superior performance on diverse retrieval scenarios.
arXiv Detail & Related papers (2024-07-22T08:19:34Z)
- CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances, which are semantically related to the query through the interaction of different modal data. Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates. We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder structure network of a reconstruction task to unsupervisedly distill high-level attribute-specific vectors.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- A Hybrid Approach to Enhance Pure Collaborative Filtering based on Content Feature Relationship [0.17188280334580192]
We introduce a novel method to extract the implicit relationship between content features using Word2Vec, a well-known method from the natural language processing domain.
Next, we propose a novel content-based recommendation system that employs the relationship to determine vector representations for items.
Our evaluation results demonstrate that it can predict the preference a user would have for a set of items as well as pure collaborative filtering.
arXiv Detail & Related papers (2020-05-17T02:20:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.