Representation Learning for Efficient and Effective Similarity Search and Recommendation
- URL: http://arxiv.org/abs/2109.01815v1
- Date: Sat, 4 Sep 2021 08:19:01 GMT
- Title: Representation Learning for Efficient and Effective Similarity Search and Recommendation
- Authors: Casper Hansen
- Abstract summary: This thesis makes contributions to representation learning that improve effectiveness of hash codes through more expressive representations and a more effective similarity measure.
The contributions are empirically validated on several tasks related to similarity search and recommendation.
- Score: 6.280255585012339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How data is represented and operationalized is critical for building
computational solutions that are both effective and efficient. A common
approach is to represent data objects as binary vectors, denoted "hash codes",
which require little storage and enable efficient similarity search
through direct indexing into a hash table or through similarity computations in
an appropriate space. Due to the limited expressibility of hash codes, compared
to real-valued representations, a core open challenge is how to generate hash
codes that well capture semantic content or latent properties using a small
number of bits, while ensuring that the hash codes are distributed in a way
that does not reduce their search efficiency. State-of-the-art methods use
representation learning for generating such hash codes, focusing on neural
autoencoder architectures where semantics are encoded into the hash codes by
learning to reconstruct the original inputs from the hash codes. This thesis
addresses the above challenge and makes a number of contributions to
representation learning that (i) improve effectiveness of hash codes through
more expressive representations and a more effective similarity measure than
the current state of the art, namely the Hamming distance, and (ii) improve
efficiency of hash codes by learning representations that are especially suited
to the choice of search method. The contributions are empirically validated on
several tasks related to similarity search and recommendation.
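To make the efficiency argument concrete, here is a minimal sketch (not from the thesis) of Hamming-distance similarity search over packed binary hash codes, where a single XOR plus a popcount compares two 64-bit codes:

```python
import numpy as np

rng = np.random.default_rng(0)

# A database of one hundred thousand 64-bit hash codes, one uint64 each.
database = rng.integers(0, 2**63, size=100_000, dtype=np.uint64)
query = rng.integers(0, 2**63, dtype=np.uint64)

def hamming_distances(query, codes):
    """Number of bits that differ between the query and every database code."""
    diff = np.bitwise_xor(codes, query)              # one XOR per code
    bits = np.unpackbits(diff.view(np.uint8))        # 64 bits per code
    return bits.reshape(len(codes), 64).sum(axis=1)  # popcount

# Ten nearest neighbours of the query under the Hamming distance.
top10 = np.argsort(hamming_distances(query, database))[:10]
print(top10)
```

The packed representation is what makes the comparison cheap: the whole scan is a vectorized XOR and bit count, with no floating-point arithmetic.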
Related papers
- Unified Functional Hashing in Automatic Machine Learning [58.77232199682271]
We show that large efficiency gains can be obtained by employing a fast unified functional hash.
Our hash is "functional" in that it identifies equivalent candidates even if they were represented or coded differently.
We show dramatic improvements on multiple AutoML domains, including neural architecture search and algorithm discovery.
arXiv Detail & Related papers (2023-02-10T18:50:37Z)
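The summary leaves the mechanism implicit; one common reading of a "functional" hash, sketched below under that assumption (the probe inputs and rounding are illustrative, not the paper's), is to hash a candidate by its outputs on fixed probe inputs rather than by its source code:

```python
import hashlib

# Fixed probe inputs: equivalent candidates produce equal outputs here,
# however differently their code is written.
PROBE_INPUTS = [0.0, 1.0, -1.0, 0.5, 3.14159, 42.0]

def functional_hash(candidate):
    """Hash a candidate function by its behaviour, not its representation."""
    outputs = tuple(round(candidate(x), 9) for x in PROBE_INPUTS)
    return hashlib.sha256(repr(outputs).encode()).hexdigest()[:16]

# Differently-coded but equivalent candidates collide, as intended:
assert functional_hash(lambda x: x * 2) == functional_hash(lambda x: x + x)
```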
- A Lower Bound of Hash Codes' Performance [122.88252443695492]
In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance.
We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it, which results in a low-bias optimization.
By testing on a series of hash models, we obtain performance improvements across all of them, with up to a 26.5% increase in mean Average Precision and up to a 20.5% increase in accuracy.
arXiv Detail & Related papers (2022-10-12T03:30:56Z)
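As a hedged illustration of the two quantities the bound is stated in terms of, the sketch below measures intra-class compactness and inter-class distinctiveness as mean Hamming distances over labeled hash codes (the paper's posterior estimation and surrogate model are considerably more involved):

```python
import numpy as np

def class_statistics(codes, labels):
    """Mean intra-class and inter-class Hamming distances for 0/1 codes
    of shape (n, bits) with integer labels of shape (n,)."""
    n = len(codes)
    d = (codes[:, None, :] != codes[None, :, :]).sum(-1)  # pairwise Hamming
    same = labels[:, None] == labels[None, :]
    off = ~np.eye(n, dtype=bool)                          # ignore self-pairs
    intra = d[same & off].mean()   # small => compact classes
    inter = d[~same].mean()        # large => distinct classes
    return intra, inter

codes = np.array([[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0]])
labels = np.array([0, 0, 1])
print(class_statistics(codes, labels))  # (1.0, 3.5)
```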
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Unsupervised Multi-Index Semantic Hashing [23.169142004594434]
We propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing.
We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search.
We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, the baselines are still upwards of 33% slower than MISH.
arXiv Detail & Related papers (2021-03-26T13:33:48Z)
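Multi-index hashing, the search scheme MISH is optimized for, is a known technique. Below is a minimal sketch of the lookup idea, assuming tuple-of-bits codes and eliding the final verification pass (none of this is MISH's learned machinery): split each code into m disjoint substrings and index each in its own exact-match table, so by the pigeonhole principle any code within Hamming distance r < m of the query matches it exactly on at least one substring.

```python
from collections import defaultdict

def split(code, m):
    """Split a bit tuple into m disjoint, equally sized substrings."""
    k = len(code) // m
    return [code[i * k:(i + 1) * k] for i in range(m)]

def build_tables(codes, m):
    """One exact-match hash table per substring position."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for i, chunk in enumerate(split(code, m)):
            tables[i][chunk].append(idx)
    return tables

def candidates(query, tables, m):
    """Union of exact substring matches; candidates would then be
    verified with the full Hamming distance."""
    found = set()
    for i, chunk in enumerate(split(query, m)):
        found.update(tables[i].get(chunk, ()))
    return found

codes = [(0, 0, 1, 1), (0, 0, 1, 0), (1, 1, 0, 0)]
tables = build_tables(codes, m=2)
print(candidates((0, 0, 0, 0), tables, m=2))  # {0, 1, 2}: each matches on a substring
```

The m cheap exact lookups replace a linear scan over the whole database, which is where the speed difference the summary reports comes from.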
- Self-supervised asymmetric deep hashing with margin-scalable constraint for image retrieval [3.611160663701664]
We propose SADH, a novel self-supervised asymmetric deep hashing method with a margin-scalable constraint for image retrieval.
SADH implements a self-supervised network that preserves the semantic information of the given dataset in a semantic feature map and a semantic code map.
For the feature learning part, a new margin-scalable constraint is employed both for highly accurate construction of pairwise correlations in the Hamming space and for a more discriminative hash code representation.
arXiv Detail & Related papers (2020-12-07T16:09:37Z)
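The margin constraint is only named above; as a hedged sketch of the generic idea (not SADH's exact formulation, and with a fixed rather than scalable margin), a pairwise margin loss on relaxed codes pulls similar pairs within a margin and pushes dissimilar pairs beyond it:

```python
import numpy as np

def pairwise_margin_loss(h1, h2, similar, margin=1.0):
    """Hinge-style pairwise loss on relaxed (real-valued) codes.
    For exact ±1 codes, sum((h1 - h2)**2) / 4 equals the Hamming
    distance; a margin-scalable variant would set `margin` per pair."""
    d = np.sum((h1 - h2) ** 2) / 4.0
    return max(0.0, d - margin) if similar else max(0.0, margin - d)

h_a = np.array([1.0, -1.0, 1.0, 1.0])
h_b = np.array([1.0, 1.0, -1.0, 1.0])
print(pairwise_margin_loss(h_a, h_b, similar=True))   # 1.0: 2 bits apart, 1 beyond margin
print(pairwise_margin_loss(h_a, h_b, similar=False))  # 0.0: already beyond margin
```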
- CIMON: Towards High-quality Hash Codes [63.37321228830102]
We propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON).
First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes.
arXiv Detail & Related papers (2020-10-15T14:47:14Z)
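As a hedged sketch of the consistency idea only (the augmentation and the loss form are illustrative assumptions, not CIMON's actual components), two augmented views of the same image should map to agreeing codes:

```python
import numpy as np

def consistency_loss(h_view1, h_view2):
    """Penalize disagreement between relaxed codes of two augmented
    views of one image, encouraging disturb-invariant hash codes."""
    return float(np.mean((np.tanh(h_view1) - np.tanh(h_view2)) ** 2))

rng = np.random.default_rng(0)
logits = rng.normal(size=16)
noise = 0.1 * rng.normal(size=16)                # stands in for data augmentation
print(consistency_loss(logits, logits + noise))  # small: the two views agree
```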
- Unsupervised Semantic Hashing with Pairwise Reconstruction [22.641786533525245]
We present Pairwise Reconstruction (PairRec), a discrete variational autoencoder based hashing model.
We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
arXiv Detail & Related papers (2020-07-01T10:54:27Z)
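A hedged sketch of the pairwise-reconstruction objective (the encoder and decoder are toy stand-ins, and PairRec's variational treatment is elided): the codes of two semantically similar documents are each asked to reconstruct the same document, pushing neighbours toward shared codes:

```python
import numpy as np

def pairwise_reconstruction_loss(x1, x2, encode, decode):
    """Both a document's own code and its neighbour's code must
    reconstruct the document, so similar documents get similar codes."""
    z1, z2 = encode(x1), encode(x2)          # binary hash codes
    own = np.sum((decode(z1) - x1) ** 2)     # standard reconstruction
    paired = np.sum((decode(z2) - x1) ** 2)  # neighbour's code, same target
    return own + paired

# Toy stand-ins: sign binarization as the encoder, a linear decoder.
W = np.random.default_rng(0).normal(size=(8, 8))
encode = lambda x: np.sign(W @ x)
decode = lambda z: W.T @ z / 8.0
x1, x2 = np.ones(8), np.ones(8) + 0.1
print(pairwise_reconstruction_loss(x1, x2, encode, decode))
```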
- Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods perform poorly when retrieving with extremely short hash codes.
In this study, we propose a novel reinforcing short-length hashing (RSLH) method.
In RSLH, mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z)
- A Survey on Deep Hashing Methods [52.326472103233854]
Nearest neighbor search aims to find the database samples with the smallest distances to a given query.
With the development of deep learning, deep hashing methods show more advantages than traditional methods.
Deep supervised hashing is categorized into pairwise methods, ranking-based methods, pointwise methods, and quantization-based methods.
Deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods and prediction-free self-supervised learning-based methods.
arXiv Detail & Related papers (2020-03-04T08:25:15Z)
- Image Hashing by Minimizing Discrete Component-wise Wasserstein Distance [12.968141477410597]
Adversarial autoencoders have been shown to implicitly learn a robust, locality-preserving hash function that generates balanced and high-quality hash codes.
However, existing adversarial hashing methods are too inefficient for large-scale image retrieval applications.
We propose a new adversarial-autoencoder hashing approach that has a much lower sample requirement and computational cost.
arXiv Detail & Related papers (2020-02-29T00:22:53Z)
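A hedged sketch of the "component-wise" part, assuming, as the title suggests, that each code component's empirical distribution is matched to a balanced binary target (scipy's 1-D Wasserstein distance stands in for the paper's discrete formulation):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def componentwise_w1(codes, n_target=10_000, seed=0):
    """Sum over components of the 1-D Wasserstein distance between the
    component's empirical distribution and a balanced Bernoulli(0.5)
    target; small values mean every bit is roughly half on (balanced)."""
    rng = np.random.default_rng(seed)
    target = rng.integers(0, 2, size=n_target).astype(float)
    return sum(wasserstein_distance(codes[:, j], target)
               for j in range(codes.shape[1]))

balanced = np.random.default_rng(1).integers(0, 2, size=(1000, 16)).astype(float)
collapsed = np.ones((1000, 16))          # degenerate codes: every bit always on
print(componentwise_w1(balanced))        # near 0
print(componentwise_w1(collapsed))       # near 8 = 16 components * 0.5 each
```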