Related papers: Learning-Based Hashing for ANN Search: Foundations and Early Advances

URL: http://arxiv.org/abs/2510.04127v1
Date: Sun, 05 Oct 2025 09:59:56 GMT
Title: Learning-Based Hashing for ANN Search: Foundations and Early Advances
Authors: Sean Moran
Abstract summary: Hashing-based methods provide an efficient solution by mapping high-dimensional data into compact binary codes. Over the past two decades, a substantial body of work has explored learning to hash, where projection and quantisation functions are optimised from data. This article offers a foundational survey of early learning-based hashing methods, with an emphasis on the core ideas that shaped the field.
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Approximate Nearest Neighbour (ANN) search is a fundamental problem in information retrieval, underpinning large-scale applications in computer vision, natural language processing, and cross-modal search. Hashing-based methods provide an efficient solution by mapping high-dimensional data into compact binary codes that enable fast similarity computations in Hamming space. Over the past two decades, a substantial body of work has explored learning to hash, where projection and quantisation functions are optimised from data rather than chosen at random. This article offers a foundational survey of early learning-based hashing methods, with an emphasis on the core ideas that shaped the field. We review supervised, unsupervised, and semi-supervised approaches, highlighting how projection functions are designed to generate meaningful embeddings and how quantisation strategies convert these embeddings into binary codes. We also examine extensions to multi-bit and multi-threshold models, as well as early advances in cross-modal retrieval. Rather than providing an exhaustive account of the most recent methods, our goal is to introduce the conceptual foundations of learning-based hashing for ANN search. By situating these early models in their historical context, we aim to equip readers with a structured understanding of the principles, trade-offs, and open challenges that continue to inform current research in this area.

Related papers

A Survey on Deep Text Hashing: Efficient Semantic Text Retrieval with Binary Representation
Text hashing projects original texts into compact binary hash codes. Deep text hashing has demonstrated significant advantages over traditional, data-independent hashing techniques. This survey investigates current deep text hashing methods by categorizing them based on their core components.
arXiv, 2025-10-31
Enhancing LLM Reasoning with Reward-guided Tree Search
Developing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv, 2024-11-18
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
This survey focuses on the benefits of scaling compute during inference. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv, 2024-06-24
CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks
A single-step generative model can dramatically simplify the search process and be optimized end-to-end. We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index.
arXiv, 2022-08-16
Efficient Cross-Modal Retrieval via Deep Binary Hashing and Quantization
Cross-modal retrieval aims to search for data with similar semantic meanings across different content modalities. We propose a jointly learned deep hashing and quantization network (HQ) for cross-modal retrieval. Experimental results on the NUS-WIDE, MIR-Flickr, and Amazon datasets demonstrate that HQ achieves boosts of more than 7% in precision.
arXiv, 2022-02-15
A Survey of Deep Meta-Learning
Deep neural networks can achieve great success when presented with large data sets and sufficient computational resources. However, their ability to learn new concepts quickly is limited. Deep Meta-Learning is one approach to address this issue, by enabling the network to learn how to learn.
arXiv, 2020-10-07
Unsupervised Deep Cross-modality Spectral Hashing
The framework is a two-step hashing approach that decouples the optimization into binary optimization and hashing function learning. We propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. We leverage a powerful CNN for images and propose a CNN-based deep architecture to learn the text modality.
arXiv, 2020-08-01
A Survey on Deep Hashing Methods
Nearest neighbor search aims to retrieve the database samples that are closest to a given query. With the development of deep learning, deep hashing methods have shown clear advantages over traditional methods. Deep supervised hashing is categorized into pairwise methods, ranking-based methods, pointwise methods, and quantization. Deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods, and prediction-free self-supervised learning-based methods.
arXiv, 2020-03-04

This list is automatically generated from the titles and abstracts of the papers on this site.
