Unsupervised Semantic Hashing with Pairwise Reconstruction
- URL: http://arxiv.org/abs/2007.00380v1
- Date: Wed, 1 Jul 2020 10:54:27 GMT
- Title: Unsupervised Semantic Hashing with Pairwise Reconstruction
- Authors: Casper Hansen and Christian Hansen and Jakob Grue Simonsen and Stephen Alstrup and Christina Lioma
- Abstract summary: We present Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model.
We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
- Score: 22.641786533525245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic Hashing is a popular family of methods for efficient similarity
search in large-scale datasets. In Semantic Hashing, documents are encoded as
short binary vectors (i.e., hash codes), such that semantic similarity can be
efficiently computed using the Hamming distance. Recent state-of-the-art
approaches have utilized weak supervision to train better performing hashing
models. Inspired by this, we present Semantic Hashing with Pairwise
Reconstruction (PairRec), which is a discrete variational autoencoder based
hashing model. PairRec first encodes weakly supervised training pairs (a query
document and a semantically similar document) into two hash codes, and then
learns to reconstruct the same query document from both of these hash codes
(i.e., pairwise reconstruction). This pairwise reconstruction enables our model
to encode local neighbourhood structures within the hash code directly through
the decoder. We experimentally compare PairRec to traditional and
state-of-the-art approaches, and obtain significant performance improvements in
the task of document similarity search.
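For intuition, here is a minimal PyTorch-style sketch of the pairwise-reconstruction idea described in the abstract: both documents in a weakly supervised pair are encoded into Bernoulli hash codes, and the query document is reconstructed from each of the two codes. All names (PairEncoder, pairwise_loss), the bag-of-words input, the straight-through estimator, and the loss weighting are illustrative assumptions, not the authors' implementation.
```python
# Illustrative sketch only, assuming bag-of-words document vectors and a
# straight-through Bernoulli encoder; not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairEncoder(nn.Module):
    def __init__(self, vocab_size: int, n_bits: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(vocab_size, 512), nn.ReLU(),
                                 nn.Linear(512, n_bits))
        self.dec = nn.Linear(n_bits, vocab_size)  # decodes a hash code back to word logits

    def encode(self, bow):
        probs = torch.sigmoid(self.enc(bow))        # per-bit Bernoulli probabilities
        hard = torch.bernoulli(probs)               # sampled binary hash code
        # straight-through estimator: forward uses the sampled bits,
        # gradients flow through the probabilities
        return hard + probs - probs.detach(), probs

    def reconstruct_logits(self, code):
        return self.dec(code)                       # logits over the vocabulary

def pairwise_loss(model, query_bow, similar_bow):
    """Reconstruct the *query* document from both hash codes in the pair."""
    z_q, p_q = model.encode(query_bow)
    z_s, p_s = model.encode(similar_bow)
    target = (query_bow > 0).float()                # binary word-presence target
    recon = (F.binary_cross_entropy_with_logits(model.reconstruct_logits(z_q), target)
             + F.binary_cross_entropy_with_logits(model.reconstruct_logits(z_s), target))
    # KL of each Bernoulli posterior against a Bernoulli(0.5) prior
    kl = sum((p * torch.log(p / 0.5 + 1e-8)
              + (1 - p) * torch.log((1 - p) / 0.5 + 1e-8)).sum(dim=-1).mean()
             for p in (p_q, p_s))
    return recon + 0.1 * kl                         # 0.1 is an illustrative KL weight
```
At retrieval time the codes would be binarized (e.g., by thresholding the bit probabilities at 0.5) and documents ranked by Hamming distance to the query's code, as described in the abstract.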
Related papers
- ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery [128.30514851911218]
ConceptHash is a novel method that achieves sub-code level interpretability.
In ConceptHash, each sub-code corresponds to a human-understandable concept, such as an object part.
We incorporate language guidance to ensure that the learned hash codes are distinguishable within fine-grained object classes.
arXiv Detail & Related papers (2024-06-12T17:49:26Z)
- RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval [32.06421737874828]
Reconstruction Relations Embedded Hashing (RREH) is designed for semi-paired cross-modal retrieval tasks.
RREH assumes that multi-modal data share a common subspace.
Anchors are sampled from paired data, which improves the efficiency of hash learning.
arXiv Detail & Related papers (2024-05-28T03:12:54Z)
- Learning to Hash Naturally Sorts [84.90210592082829]
We introduce Naturally-Sorted Hashing (NSH) to train a deep hashing model with sorted results end-to-end.
NSH sorts the Hamming distances of samples' hash codes and accordingly gathers their latent representations for self-supervised training.
We describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning.
arXiv Detail & Related papers (2022-01-31T16:19:02Z)
- Representation Learning for Efficient and Effective Similarity Search and Recommendation [6.280255585012339]
This thesis makes contributions to representation learning that improve the effectiveness of hash codes through more expressive representations and a more effective similarity measure.
The contributions are empirically validated on several tasks related to similarity search and recommendation.
arXiv Detail & Related papers (2021-09-04T08:19:01Z)
- Unsupervised Multi-Index Semantic Hashing [23.169142004594434]
We propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing (a minimal lookup sketch follows this list).
We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search.
We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH.
arXiv Detail & Related papers (2021-03-26T13:33:48Z)
- CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and the statistical distribution of similarities to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive hash codes that are both disturbance-invariant and discriminative.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
- Generative Semantic Hashing Enhanced via Boltzmann Machines [61.688380278649056]
Existing generative-hashing methods mostly assume a factorized form for the posterior distribution.
We propose to employ the distribution of a Boltzmann machine as the variational posterior.
We show that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
arXiv Detail & Related papers (2020-06-16T01:23:39Z)
- Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
- Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods perform poorly when retrieval uses extremely short hash codes.
In this study, we propose a novel reinforcing short-length hashing (RSLH) method.
In the proposed RSLH, mutual reconstruction between the hash representation and semantic labels preserves the semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z)
- A Novel Incremental Cross-Modal Hashing Approach [21.99741793652628]
We propose a novel incremental cross-modal hashing algorithm, termed "iCMH".
The proposed approach consists of two sequential stages, namely, learning the hash codes and training the hash functions.
Experiments across a variety of cross-modal datasets and comparisons with state-of-the-art cross-modal algorithms show the usefulness of our approach.
arXiv Detail & Related papers (2020-02-03T12:34:56Z)
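To make the efficiency comparison in the MISH entry concrete, below is a minimal, generic sketch of multi-index hashing candidate lookup (the standard technique that comparison refers to, not the MISH model itself): the hash code is split into m disjoint substrings, each indexed in its own table, so candidates can be fetched by exact substring match and then verified with full Hamming distance instead of scanning every document. Function names and parameters here are illustrative assumptions.
```python
# Generic multi-index hashing lookup sketch; not the MISH implementation.
from collections import defaultdict

def split_code(code: int, n_bits: int, m: int):
    """Split an n_bits-wide integer code into m disjoint substrings."""
    width = n_bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_index(codes, n_bits=64, m=4):
    """One hash table per substring position, mapping substring -> doc ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for doc_id, code in enumerate(codes):
        for i, chunk in enumerate(split_code(code, n_bits, m)):
            tables[i][chunk].append(doc_id)
    return tables

def search(q_code, codes, tables, n_bits=64, m=4, radius=2):
    """Return doc ids within `radius` Hamming distance of q_code (radius < m)."""
    # Candidate generation: by the pigeonhole principle, any code within
    # Hamming distance < m of the query matches it exactly in at least one
    # substring, so it appears in at least one probed bucket.
    candidates = set()
    for i, chunk in enumerate(split_code(q_code, n_bits, m)):
        candidates.update(tables[i].get(chunk, []))
    # Verification: exact Hamming distance only on the candidate set.
    return [d for d in candidates if bin(q_code ^ codes[d]).count("1") <= radius]
```
The exact-match guarantee holds for Hamming radii smaller than the number of substrings; larger radii require probing neighbouring substring values as well.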