Related papers: Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models

URL: http://arxiv.org/abs/2509.14427v1
Date: Wed, 17 Sep 2025 20:58:43 GMT
Title: Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models
Authors: Ilyass Moummad, Kawtar Zaher, Lukas Rauch, Alexis Joly
Abstract summary: We introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pretrained encoders that produce rich pretrained embeddings. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning.
Score: 4.531902882476647
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Information retrieval with compact binary embeddings, also referred to as hashing, is crucial for scalable fast search applications, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pretrained encoders that produce rich pretrained embeddings. We revisit classical, training-free hashing techniques: principal component analysis, random orthogonal projection, and threshold binarization, to produce a strong baseline for hashing. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning. To demonstrate the generality and effectiveness of this approach, we evaluate it on standard image retrieval benchmarks as well as a newly introduced benchmark for audio hashing.
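The three classical steps named in the abstract (principal component analysis, random orthogonal projection, and threshold binarization) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the zero threshold applied after mean-centering, the QR-based construction of the rotation, and the random stand-in for encoder embeddings are all assumptions made for this example.

```python
import numpy as np


def fit_hasher(train, n_bits, seed=0):
    """Fit PCA on frozen embeddings, then draw a random orthogonal rotation.

    Hypothetical helper: assumes n_bits <= min(n_samples, dim).
    """
    rng = np.random.default_rng(seed)
    mean = train.mean(axis=0)
    # PCA via SVD of the centered data; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_bits].T  # shape (dim, n_bits)
    # Random orthogonal matrix from the QR decomposition of a Gaussian matrix.
    rotation, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    return mean, components @ rotation


def hash_codes(x, mean, projection):
    """Project, then binarize by thresholding each coordinate at zero (assumed)."""
    return ((x - mean) @ projection > 0).astype(np.uint8)


def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance, plus the distances."""
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable"), distances


# Random vectors stand in for embeddings from a frozen pretrained encoder.
rng = np.random.default_rng(42)
database = rng.standard_normal((200, 64))
mean, projection = fit_hasher(database, n_bits=16)
codes = hash_codes(database, mean, projection)
order, distances = hamming_rank(codes[0], codes)
```

Nothing here is trained by gradient descent: the only data-dependent quantities are the mean and the principal directions, which is what makes a pipeline of this kind training-free in the sense the abstract describes.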

Related papers

A Survey on Deep Text Hashing: Efficient Semantic Text Retrieval with Binary Representation [69.50397417361351]
Text hashing projects original texts into compact binary hash codes. Deep text hashing has demonstrated significant advantages over traditional, data-independent hashing techniques. This survey investigates current deep text hashing methods by categorizing them based on their core components.
arXiv Detail & Related papers (2025-10-31T06:51:37Z)
HASH-RAG: Bridging Deep Hashing with Retriever for Efficient, Fine Retrieval and Augmented Generation [16.147618749631103]
Hash-RAG is a framework that integrates deep hashing techniques with systematic optimizations. Building upon this hash-based efficient retrieval framework, we establish the foundation for fine-grained chunking.
arXiv Detail & Related papers (2025-05-22T02:22:11Z)
KALAHash: Knowledge-Anchored Low-Resource Adaptation for Deep Hashing [19.667480064079083]
Existing deep hashing methods rely on abundant training data, leaving the more challenging scenario of low-resource adaptation relatively underexplored. We introduce Class-Calibration LoRA, a novel plug-and-play approach that dynamically constructs low-rank adaptation by leveraging class-level textual knowledge embeddings. Our proposed method, Knowledge-Anchored Low-Resource Adaptation Hashing (KALAHash), significantly boosts retrieval performance and achieves a 4x data efficiency in low-resource scenarios.
arXiv Detail & Related papers (2024-12-27T03:04:54Z)
Learning to Hash for Recommendation: A Survey [49.943390288789494]
This survey provides a comprehensive overview of state-of-the-art HashRec algorithms. We categorize existing works into a three-tier taxonomy based on: (i) learning objectives, (ii) optimization strategies, and (iii) recommendation scenarios.
arXiv Detail & Related papers (2024-12-05T05:07:19Z)
A Lower Bound of Hash Codes' Performance [122.88252443695492]
In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance. We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it, which results in a low-bias optimization. By testing on a series of hash-models, we obtain performance improvements among all of them, with an up to 26.5% increase in mean Average Precision and an up to 20.5% increase in accuracy.
arXiv Detail & Related papers (2022-10-12T03:30:56Z)
CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing [42.67510119856105]
Generative models, such as Generative Adversarial Networks (GANs), can generate synthetic data in an image hashing model. GANs are difficult to train, which prevents hashing approaches from jointly training the generative models and the hash functions. We propose a novel framework, the generative cooperative hashing network, which is based on energy-based cooperative learning.
arXiv Detail & Related papers (2022-10-09T15:42:36Z)
Unsupervised Hashing with Contrastive Information Bottleneck [39.607741586731336]
We propose to adapt a framework to learn binary hashing codes. Specifically, we first propose to modify the objective function to meet the specific requirement of hashing. We then introduce a probabilistic binary representation layer into the model to facilitate end-to-end training.
arXiv Detail & Related papers (2021-05-13T08:30:16Z)
CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
Deep Reinforcement Learning with Label Embedding Reward for Supervised Image Hashing [85.84690941656528]
We introduce a novel decision-making approach for deep supervised hashing. We learn a deep Q-network with a novel label embedding reward defined by Bose-Chaudhuri-Hocquenghem codes. Our approach outperforms state-of-the-art supervised hashing methods under various code lengths.
arXiv Detail & Related papers (2020-08-10T09:17:20Z)
Unsupervised Deep Cross-modality Spectral Hashing [65.3842441716661]
The framework is a two-step hashing approach which decouples the optimization into binary optimization and hashing function learning. We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. We leverage the powerful CNN for images and propose a CNN-based deep architecture to learn text modality.
arXiv Detail & Related papers (2020-08-01T09:20:11Z)
Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods have poor performance in retrieval using an extremely short-length hash code. In this study, we propose a novel reinforcing short-length hashing (RSLH). In the proposed RSLH, mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information. Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z)
A Survey on Deep Hashing Methods [52.326472103233854]
Nearest neighbor search aims to obtain the samples in the database with the smallest distances to the queries. With the development of deep learning, deep hashing methods show more advantages than traditional methods. Deep supervised hashing is categorized into pairwise methods, ranking-based methods, pointwise methods and quantization. Deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods and prediction-free self-supervised learning-based methods.
arXiv Detail & Related papers (2020-03-04T08:25:15Z)