Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer
- URL: http://arxiv.org/abs/2602.23978v1
- Date: Fri, 27 Feb 2026 12:39:38 GMT
- Title: Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer
- Authors: Huimu Wang, Xingzhi Yao, Yiming Qiu, Qinghong Zhang, Haotian Wang, Yufan Cui, Songlin Wang, Sulong Xu, Mingming Li,
- Abstract summary: We propose the Anchored Curriculum with Sequential Adaptive Quantization (SA2CRQ) framework.<n>The framework allocates code lengths based on item path entropy, assigning longer, discriminative IDs to head items and shorter, generalizable IDs to tail items.<n>We show that SA2CRQ yields consistent improvements over existing baselines, particularly in cold-start retrieval scenarios.
- Score: 11.95059276298165
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While semantic ID-based generative retrieval enables efficient end-to-end modeling in industrial applications, these methods face a persistent trade-off: head items are susceptible to ID collisions that negatively impact downstream tasks, whereas data-sparse tail items, including cold-start items, exhibit limited generalization. To address this issue, we propose the Anchored Curriculum with Sequential Adaptive Quantization (SA^2CRQ) framework. The framework introduces Sequential Adaptive Residual Quantization (SARQ) to dynamically allocate code lengths based on item path entropy, assigning longer, discriminative IDs to head items and shorter, generalizable IDs to tail items. To mitigate data sparsity, the Anchored Curriculum Residual Quantization (ACRQ) component utilizes a frozen semantic manifold learned from head items to regularize and accelerate the representation learning of tail items. Experimental results from a large-scale industrial search system and multiple public datasets indicate that SA^2CRQ yields consistent improvements over existing baselines, particularly in cold-start retrieval scenarios.
Related papers
- GLASS: A Generative Recommender for Long-sequence Modeling via SID-Tier and Semantic Search [51.44490997013772]
GLASS is a novel framework that integrates long-term user interests into the generative process via SID-Tier and Semantic Search.<n>We show that GLASS outperforms state-of-the-art baselines in experiments on two large-scale real-world datasets.
arXiv Detail & Related papers (2026-02-05T13:48:33Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.<n>Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.<n>We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation [51.62815306481903]
We propose textbfname, a novel framework that harmonizes the SID and HID. Specifically, we devise a dual-branch modeling architecture that enables the model to capture both the multi-granular semantics within SID while preserving the unique collaborative identity of HID.<n>Experiments on three real-world datasets show that name balances recommendation quality for both head and tail items while surpassing the existing baselines.
arXiv Detail & Related papers (2025-12-11T07:50:53Z) - Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation has emerged as a powerful approach to mitigate large language model hallucinations.<n>Existing RAG frameworks often apply retrieval indiscriminately,leading to inefficiencies-over-retrieving.<n>We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z) - Order-agnostic Identifier for Large Language Model-based Generative Recommendation [94.37662915542603]
Items are assigned identifiers for Large Language Models (LLMs) to encode user history and generate the next item.<n>Existing approaches leverage either token-sequence identifiers, representing items as discrete token sequences, or single-token identifiers, using ID or semantic embeddings.<n>We propose SETRec, which leverages semantic tokenizers to obtain order-agnostic multi-dimensional tokens.
arXiv Detail & Related papers (2025-02-15T15:25:38Z) - Re-ranking the Context for Multimodal Retrieval Augmented Generation [28.63893944806149]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate a response within a context.<n>RAG systems face unique challenges: (i) the retrieval process may select irrelevant entries to user query (e.g., images, documents), and (ii) vision-language models or multi-modal language models like GPT-4o may hallucinate when processing these entries to generate RAG output.<n>We show that by using a more advanced relevancy measure, one can enhance the retrieval process by selecting more relevant pieces from the knowledge-base and eliminate
arXiv Detail & Related papers (2025-01-08T18:58:22Z) - DeepNote: Note-Centric Deep Retrieval-Augmented Generation [72.70046559930555]
Retrieval-Augmented Generation (RAG) mitigates factual errors and hallucinations in Large Language Models (LLMs) for question-answering (QA)<n>We develop DeepNote, an adaptive RAG framework that achieves in-depth and robust exploration of knowledge sources through note-centric adaptive retrieval.
arXiv Detail & Related papers (2024-10-11T14:03:29Z) - Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization.<n>Unlike RQ-VAE, which uses a single embedding, LAMIA learns an item palette''--a collection of independent and semantically parallel embeddings.<n>Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z) - Breaking the Hourglass Phenomenon of Residual Quantization: Enhancing the Upper Bound of Generative Retrieval [16.953923822238455]
Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems.
"Hourglass" phenomenon substantially impacts the performance of RQ-SID in generative retrieval.
We propose effective solutions to mitigate this issue, thereby enhancing the effectiveness of generative retrieval in real-world E-commerce applications.
arXiv Detail & Related papers (2024-07-31T09:52:53Z) - ASI++: Towards Distributionally Balanced End-to-End Generative Retrieval [29.65717446547002]
ASI++ is a novel fully end-to-end generative retrieval method.
It aims to simultaneously learn balanced ID assignments and improve retrieval performance.
arXiv Detail & Related papers (2024-05-23T07:54:57Z) - Corrective Retrieval Augmented Generation [36.04062963574603]
Retrieval-augmented generation (RAG) relies heavily on relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.
We propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation.
CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches.
arXiv Detail & Related papers (2024-01-29T04:36:39Z) - Augmented Embeddings for Custom Retrievals [13.773007276544913]
We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval.
Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding.
arXiv Detail & Related papers (2023-10-09T03:29:35Z) - Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations [24.952222114424146]
We propose using content-derived features as a replacement for random ids.
We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability.
Similar to content embeddings, the compactness of Semantic IDs poses a problem of easy adaption in recommendation models.
arXiv Detail & Related papers (2023-06-13T20:34:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.