Learning to Hash for Recommendation: A Survey
- URL: http://arxiv.org/abs/2412.03875v2
- Date: Thu, 23 Oct 2025 06:08:41 GMT
- Title: Learning to Hash for Recommendation: A Survey
- Authors: Fangyuan Luo, Yankai Chen, Jun Wu, Tong Li, Philip S. Yu, Xue Liu
- Abstract summary: This survey provides a comprehensive overview of state-of-the-art HashRec algorithms. We categorize existing works into a three-tier taxonomy based on: (i) learning objectives, (ii) optimization strategies, and (iii) recommendation scenarios.
- Score: 49.943390288789494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the explosive growth of users and items, Recommender Systems are facing unprecedented challenges in terms of retrieval efficiency and storage overhead. Learning to Hash techniques have emerged as a promising solution to these issues by encoding high-dimensional data into compact hash codes. As a result, hashing-based recommendation methods (HashRec) have garnered growing attention for enabling large-scale and efficient recommendation services. This survey provides a comprehensive overview of state-of-the-art HashRec algorithms. Specifically, we begin by introducing the common two-tower architecture used in the recall stage and by detailing two predominant hash search strategies. Then, we categorize existing works into a three-tier taxonomy based on: (i) learning objectives, (ii) optimization strategies, and (iii) recommendation scenarios. Additionally, we summarize widely adopted evaluation metrics for assessing both the effectiveness and efficiency of HashRec algorithms. Finally, we discuss current limitations in the field and outline promising directions for future research. We index these HashRec methods at the repository https://github.com/Luo-Fangyuan/HashRec.
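The idea of encoding high-dimensional data into compact hash codes and searching over them can be illustrated with a minimal sketch. It assumes sign-based binarization and a brute-force Hamming ranking; neither is taken from the survey itself, and the function names `binarize`, `hamming_distance`, and `hamming_rank` are hypothetical, as are the randomly generated embeddings standing in for the outputs of a two-tower model.

```python
import random


def binarize(embedding):
    """Pack the signs of a real-valued vector into an integer hash code.

    Assumption: a positive component maps to bit 1, anything else to bit 0.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a, b):
    """Count the bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def hamming_rank(user_code, item_codes, k):
    """Return the indices of the k items nearest to the user code.

    sorted() is stable, so ties keep their original item order.
    """
    order = sorted(
        range(len(item_codes)),
        key=lambda i: hamming_distance(user_code, item_codes[i]),
    )
    return order[:k]


# Illustrative data only: random vectors stand in for learned embeddings.
random.seed(0)
dim = 32
user_embedding = [random.gauss(0.0, 1.0) for _ in range(dim)]
item_embeddings = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(1000)]

user_code = binarize(user_embedding)
item_codes = [binarize(e) for e in item_embeddings]
top_items = hamming_rank(user_code, item_codes, k=10)
```

Each 32-dimensional float vector is reduced here to a single 32-bit code, and comparison becomes an XOR followed by a bit count, which is where the storage and retrieval savings described in the abstract come from.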
Related papers
- Collaborative Group-Aware Hashing for Fast Recommender Systems [66.92426381995695]
Hash technique has shown its superiority for speeding up the online recommendation by bit operations on Hamming distance computations. Existing hashing-based recommendations suffer from low accuracy, especially with sparse settings. This paper lodges a Collaborative Group-Aware Hashing (CGAH) method for both collaborative filtering and content-aware recommendations.
arXiv Detail & Related papers (2025-12-23T09:07:28Z)
- Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models [4.531902882476647]
We introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pretrained encoders that produce rich pretrained embeddings. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning.
arXiv Detail & Related papers (2025-09-17T20:58:43Z)
- HASH-RAG: Bridging Deep Hashing with Retriever for Efficient, Fine Retrieval and Augmented Generation [16.147618749631103]
Hash-RAG is a framework that integrates deep hashing techniques with systematic optimizations. Building upon this hash-based efficient retrieval framework, we establish the foundation for fine-grained chunking.
arXiv Detail & Related papers (2025-05-22T02:22:11Z)
- SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing [83.35231185111464]
Deep learning has shifted the retrieval paradigm from lexical matching to encoding source code and queries as vector representations.
Previous research proposes deep hashing-based methods, which generate hash codes for queries and code snippets and use Hamming distance for rapid recall of code candidates.
We propose a novel approach, which converts long hash codes calculated by existing deep hashing approaches into several short hash code segments through an iterative training strategy.
arXiv Detail & Related papers (2024-12-16T12:51:35Z)
- Prototypical Hash Encoding for On-the-Fly Fine-Grained Category Discovery [65.16724941038052]
Category-aware Prototype Generation (CPG) and Discriminative Category Encoding (DCE) are proposed.
CPG enables the model to fully capture the intra-category diversity by representing each category with multiple prototypes.
DCE boosts the discrimination ability of hash code with the guidance of the generated category prototypes.
arXiv Detail & Related papers (2024-10-24T23:51:40Z)
- A Lower Bound of Hash Codes' Performance [122.88252443695492]
In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance.
We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it, which results in a low-bias optimization.
By testing on a series of hash-models, we obtain performance improvements among all of them, with an up to 26.5% increase in mean Average Precision and an up to 20.5% increase in accuracy.
arXiv Detail & Related papers (2022-10-12T03:30:56Z)
- HCFRec: Hash Collaborative Filtering via Normalized Flow with Structural Consensus for Efficient Recommendation [23.73674947905047]
Hash-based collaborative filtering (Hash-CF) approaches employ efficient Hamming distance of learned binary representations of users and items to accelerate recommendations.
We propose HCFRec, a novel Hash-CF approach for effective and efficient recommendations.
arXiv Detail & Related papers (2022-05-24T12:51:52Z)
- MOON: Multi-Hash Codes Joint Learning for Cross-Media Retrieval [30.77157852327981]
Cross-media hashing technique has attracted increasing attention for its high computation efficiency and low storage cost.
We develop a novel Multiple hash cOdes jOint learNing method (MOON) for cross-media retrieval.
arXiv Detail & Related papers (2021-08-17T14:47:47Z)
- Unsupervised Multi-Index Semantic Hashing [23.169142004594434]
We propose an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing.
We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search.
We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH.
arXiv Detail & Related papers (2021-03-26T13:33:48Z)
- CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
- Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods have poor performance in retrieval using an extremely short-length hash code.
In this study, we propose a novel reinforcing short-length hashing (RSLH) method.
In this proposed RSLH, mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z)
- A Survey on Deep Hashing Methods [52.326472103233854]
Nearest neighbor search aims to retrieve the samples in the database with the smallest distances to the queries.
With the development of deep learning, deep hashing methods show more advantages than traditional methods.
Deep supervised hashing is categorized into pairwise methods, ranking-based methods, pointwise methods and quantization.
Deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods and prediction-free self-supervised learning-based methods.
arXiv Detail & Related papers (2020-03-04T08:25:15Z)
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
- A Novel Incremental Cross-Modal Hashing Approach [21.99741793652628]
We propose a novel incremental cross-modal hashing algorithm termed "iCMH".
The proposed approach consists of two sequential stages, namely, learning the hash codes and training the hash functions.
Experiments across a variety of cross-modal datasets and comparisons with state-of-the-art cross-modal algorithms show the usefulness of our approach.
arXiv Detail & Related papers (2020-02-03T12:34:56Z)