Unsupervised Semantic Hashing with Pairwise Reconstruction
- URL: http://arxiv.org/abs/2007.00380v1
- Date: Wed, 1 Jul 2020 10:54:27 GMT
- Title: Unsupervised Semantic Hashing with Pairwise Reconstruction
- Authors: Casper Hansen and Christian Hansen and Jakob Grue Simonsen and Stephen Alstrup and Christina Lioma
- Abstract summary: We present Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model.
We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
- Score: 22.641786533525245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic Hashing is a popular family of methods for efficient similarity
search in large-scale datasets. In Semantic Hashing, documents are encoded as
short binary vectors (i.e., hash codes), such that semantic similarity can be
efficiently computed using the Hamming distance. Recent state-of-the-art
approaches have utilized weak supervision to train better performing hashing
models. Inspired by this, we present Semantic Hashing with Pairwise
Reconstruction (PairRec), which is a discrete variational autoencoder based
hashing model. PairRec first encodes weakly supervised training pairs (a query
document and a semantically similar document) into two hash codes, and then
learns to reconstruct the same query document from both of these hash codes
(i.e., pairwise reconstruction). This pairwise reconstruction enables our model
to encode local neighbourhood structures within the hash code directly through
the decoder. We experimentally compare PairRec to traditional and
state-of-the-art approaches, and obtain significant performance improvements in
the task of document similarity search.
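For intuition, here is a minimal PyTorch-style sketch of the pairwise-reconstruction idea described in the abstract: both documents in a weakly supervised pair are encoded into Bernoulli hash codes, and the query document is reconstructed from each of the two codes. All names (PairEncoder, pairwise_loss), the bag-of-words input, the straight-through estimator, and the loss weighting are illustrative assumptions, not the authors' implementation.
```python
# Illustrative sketch only, assuming bag-of-words document vectors and a
# straight-through Bernoulli encoder; not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairEncoder(nn.Module):
    def __init__(self, vocab_size: int, n_bits: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(vocab_size, 512), nn.ReLU(),
                                 nn.Linear(512, n_bits))
        self.dec = nn.Linear(n_bits, vocab_size)  # decodes a hash code back to word logits

    def encode(self, bow):
        probs = torch.sigmoid(self.enc(bow))        # per-bit Bernoulli probabilities
        hard = torch.bernoulli(probs)               # sampled binary hash code
        # straight-through estimator: forward uses the sampled bits,
        # gradients flow through the probabilities
        return hard + probs - probs.detach(), probs

    def reconstruct_logits(self, code):
        return self.dec(code)                       # logits over the vocabulary

def pairwise_loss(model, query_bow, similar_bow):
    """Reconstruct the *query* document from both hash codes in the pair."""
    z_q, p_q = model.encode(query_bow)
    z_s, p_s = model.encode(similar_bow)
    target = (query_bow > 0).float()                # binary word-presence target
    recon = (F.binary_cross_entropy_with_logits(model.reconstruct_logits(z_q), target)
             + F.binary_cross_entropy_with_logits(model.reconstruct_logits(z_s), target))
    # KL of each Bernoulli posterior against a Bernoulli(0.5) prior
    kl = sum((p * torch.log(p / 0.5 + 1e-8)
              + (1 - p) * torch.log((1 - p) / 0.5 + 1e-8)).sum(dim=-1).mean()
             for p in (p_q, p_s))
    return recon + 0.1 * kl                         # 0.1 is an illustrative KL weight
```
At retrieval time the codes would be binarized (e.g., by thresholding the bit probabilities at 0.5) and documents ranked by Hamming distance to the query's code, as described in the abstract.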
Related papers
- ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery [128.30514851911218]
ConceptHash is a novel method that achieves sub-code level interpretability.
In ConceptHash, each sub-code corresponds to a human-understandable concept, such as an object part.
We incorporate language guidance to ensure that the learned hash codes are distinguishable within fine-grained object classes.
arXiv Detail & Related papers (2024-06-12T17:49:26Z)
- RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval [32.06421737874828]
Reconstruction Relations Embedded Hashing (RREH) is designed for semi-paired cross-modal retrieval tasks.
RREH assumes that multi-modal data share a common subspace.
Anchors are sampled from paired data, which improves the efficiency of hash learning.
arXiv Detail & Related papers (2024-05-28T03:12:54Z)
- Learning to Hash Naturally Sorts [84.90210592082829]
We introduce Naturally-Sorted Hashing (NSH) to train a deep hashing model with sorted results end-to-end.
NSH sorts the Hamming distances of samples' hash codes and accordingly gathers their latent representations for self-supervised training.
We describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning.
arXiv Detail & Related papers (2022-01-31T16:19:02Z)
- Representation Learning for Efficient and Effective Similarity Search and Recommendation [6.280255585012339]
This thesis makes contributions to representation learning that improve the effectiveness of hash codes through more expressive representations and a more effective similarity measure.
The contributions are empirically validated on several tasks related to similarity search and recommendation.
arXiv Detail & Related papers (2021-09-04T08:19:01Z)
- Unsupervised Multi-Index Semantic Hashing [23.169142004594434]
We propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing (a minimal lookup sketch follows this list).
We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search.
We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH.
arXiv Detail & Related papers (2021-03-26T13:33:48Z)
- CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and the statistical distribution of similarities to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive hash codes that are both disturbance-invariant and discriminative.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
- Generative Semantic Hashing Enhanced via Boltzmann Machines [61.688380278649056]
Existing generative-hashing methods mostly assume a factorized form for the posterior distribution.
We propose to employ the distribution of a Boltzmann machine as the variational posterior.
We show that by effectively modeling correlations among different bits within a hash code, our model can achieve significant performance gains.
arXiv Detail & Related papers (2020-06-16T01:23:39Z)
- Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
- Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods perform poorly when retrieval uses extremely short hash codes.
In this study, we propose a novel reinforcing short-length hashing (RSLH) method.
In the proposed RSLH, mutual reconstruction between the hash representation and semantic labels preserves the semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z)
- A Novel Incremental Cross-Modal Hashing Approach [21.99741793652628]
We propose a novel incremental cross-modal hashing algorithm, termed "iCMH".
The proposed approach consists of two sequential stages, namely, learning the hash codes and training the hash functions.
Experiments across a variety of cross-modal datasets and comparisons with state-of-the-art cross-modal algorithms show the usefulness of our approach.
arXiv Detail & Related papers (2020-02-03T12:34:56Z)
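To make the efficiency comparison in the MISH entry concrete, below is a minimal, generic sketch of multi-index hashing candidate lookup (the standard technique that comparison refers to, not the MISH model itself): the hash code is split into m disjoint substrings, each indexed in its own table, so candidates can be fetched by exact substring match and then verified with full Hamming distance instead of scanning every document. Function names and parameters here are illustrative assumptions.
```python
# Generic multi-index hashing lookup sketch; not the MISH implementation.
from collections import defaultdict

def split_code(code: int, n_bits: int, m: int):
    """Split an n_bits-wide integer code into m disjoint substrings."""
    width = n_bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_index(codes, n_bits=64, m=4):
    """One hash table per substring position, mapping substring -> doc ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for doc_id, code in enumerate(codes):
        for i, chunk in enumerate(split_code(code, n_bits, m)):
            tables[i][chunk].append(doc_id)
    return tables

def search(q_code, codes, tables, n_bits=64, m=4, radius=2):
    """Return doc ids within `radius` Hamming distance of q_code (radius < m)."""
    # Candidate generation: by the pigeonhole principle, any code within
    # Hamming distance < m of the query matches it exactly in at least one
    # substring, so it appears in at least one probed bucket.
    candidates = set()
    for i, chunk in enumerate(split_code(q_code, n_bits, m)):
        candidates.update(tables[i].get(chunk, []))
    # Verification: exact Hamming distance only on the candidate set.
    return [d for d in candidates if bin(q_code ^ codes[d]).count("1") <= radius]
```
The exact-match guarantee holds for Hamming radii smaller than the number of substrings; larger radii require probing neighbouring substring values as well.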