Image Hashing via Cross-View Code Alignment in the Age of Foundation Models
- URL: http://arxiv.org/abs/2510.27584v2
- Date: Mon, 03 Nov 2025 10:21:43 GMT
- Title: Image Hashing via Cross-View Code Alignment in the Age of Foundation Models
- Authors: Ilyass Moummad, Kawtar Zaher, Hervé Goëau, Alexis Joly
- Abstract summary: CroVCA (Cross-View Code Alignment) is a simple and unified principle for learning binary codes that remain consistent across semantically aligned views. HashCoder is a lightweight hashing network with a final batch normalization layer to enforce balanced codes. CroVCA achieves state-of-the-art results in just 5 training epochs.
- Score: 3.33876524834826
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Efficient large-scale retrieval requires representations that are both compact and discriminative. Foundation models provide powerful visual and multimodal embeddings, but nearest neighbor search in these high-dimensional spaces is computationally expensive. Hashing offers an efficient alternative by enabling fast Hamming distance search with binary codes, yet existing approaches often rely on complex pipelines, multi-term objectives, designs specialized for a single learning paradigm, and long training times. We introduce CroVCA (Cross-View Code Alignment), a simple and unified principle for learning binary codes that remain consistent across semantically aligned views. A single binary cross-entropy loss enforces alignment, while coding-rate maximization serves as an anti-collapse regularizer to promote balanced and diverse codes. To implement this, we design HashCoder, a lightweight MLP hashing network with a final batch normalization layer to enforce balanced codes. HashCoder can be used as a probing head on frozen embeddings or to adapt encoders efficiently via LoRA fine-tuning. Across benchmarks, CroVCA achieves state-of-the-art results in just 5 training epochs. At 16 bits, it performs particularly well; for instance, unsupervised hashing on COCO completes in under 2 minutes and supervised hashing on ImageNet100 in about 3 minutes on a single GPU. These results highlight CroVCA's efficiency, adaptability, and broad applicability.
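The abstract names three ingredients without giving their exact form: a cross-view binary cross-entropy term, a coding-rate anti-collapse regularizer, and an MLP head ending in batch normalization. The NumPy sketch below illustrates them under stated assumptions and is not the authors' implementation. The following are all hypothetical:

* the head layout;
* the symmetric BCE, in which each view's logits predict the other view's binarized code;
* the MCR²-style coding rate R(Z) = 1/2 logdet(I + d/(n eps^2) Z^T Z) on L2-normalized rows;
* the weight `lam`;
* the function names `hash_head`, `cross_view_alignment`, `coding_rate` and `hamming_rank`.

The sketch only evaluates the objective, without computing gradients. It then ranks codes by Hamming distance, which is the fast search the abstract motivates.

```python
import numpy as np

rng = np.random.default_rng(0)


def hash_head(x, w1, w2, eps=1e-5):
    """Hypothetical HashCoder-style head: linear -> ReLU -> linear -> batch norm."""
    h = np.maximum(x @ w1, 0.0)            # hidden layer with ReLU
    z = h @ w2                             # one logit per bit
    mu = z.mean(axis=0, keepdims=True)     # batch statistics per bit
    var = z.var(axis=0, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)   # zero mean per bit over the batch


def bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy between sigmoid(logits) and 0/1 targets."""
    p = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))


def cross_view_alignment(z1, z2):
    """Assumed form: each view's logits predict the other view's binarized code."""
    b1 = (z1 > 0).astype(float)
    b2 = (z2 > 0).astype(float)
    return 0.5 * (bce(z1, b2) + bce(z2, b1))


def coding_rate(z, eps=0.5):
    """MCR^2-style coding rate of the L2-normalized rows of z (n samples x d bits)."""
    n, d = z.shape
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * (zn.T @ zn))
    return 0.5 * logdet


def hamming_rank(query_bits, db_bits):
    """Rank 0/1 database codes by Hamming distance to one 0/1 query code."""
    q = np.packbits(query_bits.astype(np.uint8))            # pack bits into bytes
    db = np.packbits(db_bits.astype(np.uint8), axis=1)
    dist = np.unpackbits(np.bitwise_xor(db, q), axis=1).sum(axis=1)  # popcount of XOR
    return np.argsort(dist, kind="stable"), dist


# Toy run: random vectors stand in for frozen foundation-model embeddings,
# and two noisy copies stand in for two semantically aligned views.
d_in, d_hid, n_bits, n = 32, 64, 16, 128
w1 = rng.normal(size=(d_in, d_hid)) / np.sqrt(d_in)
w2 = rng.normal(size=(d_hid, n_bits)) / np.sqrt(d_hid)
x = rng.normal(size=(n, d_in))
z1 = hash_head(x + 0.05 * rng.normal(size=x.shape), w1, w2)
z2 = hash_head(x + 0.05 * rng.normal(size=x.shape), w1, w2)

lam = 0.1  # hypothetical regularizer weight
loss = cross_view_alignment(z1, z2) - lam * coding_rate(z1)  # minimize: align, keep rate high

codes = (z1 > 0).astype(np.uint8)
order, dist = hamming_rank(codes[0], codes)
print(f"loss={loss:.3f}, nearest codes to item 0: {order[:5]}")
```

**Why batch normalization?** Without an affine transform, batch normalization centres every bit at zero over the batch. This is one plausible reading of how a final BN layer "enforces balanced codes": each bit tends to be positive for roughly half of the batch.

**Role of the coding-rate term.** The coding rate is low when all codes collapse onto one direction and high when they spread out. Subtracting it from the loss therefore discourages collapse.

**In a real setup.** The views would come from augmentations or paired modalities, and the objective would be minimized with an autodiff framework.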
Related papers
- Nested Hash Layer: A Plug-and-play Module for Multiple-length Hash Code Learning [61.095479786194836]
Nested Hash Layer (NHL) is a plug-and-play module for deep supervised hashing models.
NHL generates hash codes of multiple lengths simultaneously in a nested structure.
NHL achieves an overall training speed improvement of approximately 5 to 8 times across various deep supervised hashing models.
arXiv Detail & Related papers (2024-12-12T04:13:09Z)
- CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing [42.67510119856105]
Generative models, such as Generative Adversarial Networks (GANs), can generate synthetic data within an image hashing model.
GANs are difficult to train, which prevents hashing approaches from jointly training the generative models and the hash functions.
We propose a novel framework, the generative cooperative hashing network, which is based on energy-based cooperative learning.
arXiv Detail & Related papers (2022-10-09T15:42:36Z)
- DVHN: A Deep Hashing Framework for Large-scale Vehicle Re-identification [5.407157027628579]
We propose a deep hash-based vehicle re-identification framework, dubbed DVHN, which substantially reduces memory usage and promotes retrieval efficiency.
DVHN directly learns discrete compact binary hash codes for each image by jointly optimizing the feature learning network and the hash code generating module.
DVHN with 2048 bits achieves accuracy improvements of 13.94% in mAP and 10.21% in Rank@1 on the VehicleID (800) dataset.
arXiv Detail & Related papers (2021-12-09T14:11:27Z)
- One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective [86.48094395282546]
A deep hashing model typically has two main learning objectives: to make the learned binary hash codes discriminative and to minimize a quantization error.
We propose a novel deep hashing model with only a single learning objective.
Our model is highly effective, outperforming the state-of-the-art multi-loss hashing models on three large-scale instance retrieval benchmarks.
arXiv Detail & Related papers (2021-09-29T14:27:51Z)
- MOON: Multi-Hash Codes Joint Learning for Cross-Media Retrieval [30.77157852327981]
Cross-media hashing has attracted increasing attention for its high computational efficiency and low storage cost.
We develop a novel Multiple hash cOdes jOint learNing method (MOON) for cross-media retrieval.
arXiv Detail & Related papers (2021-08-17T14:47:47Z)
- Deep Reinforcement Learning with Label Embedding Reward for Supervised Image Hashing [85.84690941656528]
We introduce a novel decision-making approach for deep supervised hashing.
We learn a deep Q-network with a novel label embedding reward defined by Bose-Chaudhuri-Hocquenghem codes.
Our approach outperforms state-of-the-art supervised hashing methods under various code lengths.
arXiv Detail & Related papers (2020-08-10T09:17:20Z)
- Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings [65.36757931982469]
Image hash codes are produced by binarizing embeddings of convolutional neural networks (CNN) trained for either classification or retrieval.
The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity.
The resulting hash-consistent large margin (HCLM) proxies are shown to encourage saturation of hashing units, thus guaranteeing a small binarization error.
arXiv Detail & Related papers (2020-07-27T23:47:43Z)
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph.
It is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
arXiv Detail & Related papers (2020-02-27T05:58:12Z)