Fast Locality Sensitive Hashing with Theoretical Guarantee
- URL: http://arxiv.org/abs/2309.15479v1
- Date: Wed, 27 Sep 2023 08:21:38 GMT
- Title: Fast Locality Sensitive Hashing with Theoretical Guarantee
- Authors: Zongyuan Tan, Hongya Wang, Bo Xu, Minjie Luo and Ming Du
- Abstract summary: Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks.
In this paper, we design a simple yet efficient LSH scheme, named FastLSH, under l2 norm.
By combining random sampling and random projection, FastLSH reduces the time complexity from O(n) to O(m) (m < n), where n is the data dimensionality and m is the number of sampled dimensions.
Experimental results demonstrate that FastLSH is on par with the state-of-the-art in terms of answer quality, space occupation and query efficiency, while enjoying up to 80x speedup in hash function evaluation.
- Score: 5.635783105833339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Locality-sensitive hashing (LSH) is an effective randomized technique widely
used in many machine learning tasks. The cost of hashing is proportional to
data dimensions, and thus often the performance bottleneck when dimensionality
is high and the number of hash functions involved is large. Surprisingly,
however, little work has been done to improve the efficiency of LSH
computation. In this paper, we design a simple yet efficient LSH scheme, named
FastLSH, under l2 norm. By combining random sampling and random projection,
FastLSH reduces the time complexity from O(n) to O(m) (m<n), where n is the
data dimensionality and m is the number of sampled dimensions. Moreover,
FastLSH has provable LSH property, which distinguishes it from the non-LSH fast
sketches. We conduct comprehensive experiments over a collection of real and
synthetic datasets for the nearest neighbor search task. Experimental results
demonstrate that FastLSH is on par with the state-of-the-arts in terms of
answer quality, space occupation and query efficiency, while enjoying up to 80x
speedup in hash function evaluation. We believe that FastLSH is a promising
alternative to the classic LSH scheme.
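The abstract describes the core idea only at a high level: each hash function reads just m randomly sampled coordinates instead of all n, and then applies a standard random projection to the sampled sub-vector. Below is a minimal, hedged sketch of that idea in the classic E2LSH form h(v) = floor((a·v + b)/w); the class name, parameters, and the exact way samples and bucket widths are drawn are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class FastLSHSketch:
    """Illustrative sketch only: hashes a vector using m randomly sampled
    coordinates plus a random projection, in the spirit of FastLSH. The exact
    sampling scheme and parameter choices of the paper may differ."""

    def __init__(self, n_dims, m_sampled, n_hashes, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        # For each of the k hash functions, pick m of the n coordinates ...
        self.idx = np.stack([rng.choice(n_dims, size=m_sampled, replace=False)
                             for _ in range(n_hashes)])             # shape (k, m)
        # ... and draw an m-dimensional Gaussian projection vector, following
        # the classic E2LSH form h(v) = floor((a . v + b) / w).
        self.a = rng.standard_normal((n_hashes, m_sampled))          # shape (k, m)
        self.b = rng.uniform(0.0, w, size=n_hashes)                  # shape (k,)
        self.w = w

    def hash(self, v):
        # Cost is O(k * m) rather than O(k * n): only the sampled coordinates
        # of v are ever read.
        sampled = v[self.idx]                                        # shape (k, m)
        dots = np.einsum('km,km->k', self.a, sampled)
        return np.floor((dots + self.b) / self.w).astype(int)

# Usage: 64 hash values for a 960-dimensional vector, sampling 30 dimensions each.
lsh = FastLSHSketch(n_dims=960, m_sampled=30, n_hashes=64)
codes = lsh.hash(np.random.rand(960))
```

With m much smaller than n, the per-function evaluation cost drops from O(n) to O(m), which is where the reported up-to-80x speedup in hash function evaluation would come from.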
Related papers
- DeepLSH: Deep Locality-Sensitive Hash Learning for Fast and Efficient
Near-Duplicate Crash Report Detection [0.29998889086656577]
With real-time streaming bug collection, systems are needed to quickly answer the question: what are the most similar bugs to a new one? That is, they must efficiently find near-duplicates.
LSH has not been considered in the crash bucketing literature.
We introduce DeepLSH, a Siamese architecture with an original loss function that perfectly approximates the locality-sensitivity property.
arXiv Detail & Related papers (2023-10-10T15:26:27Z) - HUSP-SP: Faster Utility Mining on Sequence Data [48.0426095077918]
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity.
We design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP).
Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.
arXiv Detail & Related papers (2022-12-29T10:56:17Z) - Experimental Analysis of Machine Learning Techniques for Finding Search
Radius in Locality Sensitive Hashing [0.9137554315375919]
Locality Sensitive Hashing (LSH) is one of the most popular approximate nearest neighbor search techniques for high-dimensional spaces.
An improved LSH-based index structure, called radius-optimized Locality Sensitive Hashing (roLSH), has been proposed that utilizes machine learning to find the search radius efficiently.
arXiv Detail & Related papers (2022-11-16T18:19:10Z) - A Lower Bound of Hash Codes' Performance [122.88252443695492]
In this paper, we prove that inter-class distinctiveness and intra-class compactness among hash codes determine the lower bound of hash codes' performance.
We then propose a surrogate model to fully exploit the above objective by estimating the posterior of hash codes and controlling it, which results in a low-bias optimization.
By testing on a series of hash models, we obtain performance improvements for all of them, with up to a 26.5% increase in mean Average Precision and up to a 20.5% increase in accuracy.
arXiv Detail & Related papers (2022-10-12T03:30:56Z) - NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library which optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z) - Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm achieves 1.8x and 2.1x better recall on the worst-performing queries of the MNIST and ImageNet datasets, respectively.
arXiv Detail & Related papers (2021-08-11T20:21:30Z) - Sublinear Least-Squares Value Iteration via Locality Sensitive Hashing [49.73889315176884]
We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions.
We build the connections between the theory of approximate maximum inner product search and the regret analysis of reinforcement learning.
arXiv Detail & Related papers (2021-05-18T05:23:53Z) - Unsupervised Multi-Index Semantic Hashing [23.169142004594434]
We propose an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing.
We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search.
We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH.
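For context, the multi-index hashing lookup that MISH is optimized for can be summarized in a few lines: a binary code is split into disjoint substrings, each substring gets its own hash table, and near neighbors in Hamming space collide with the query in at least one table. The sketch below is a generic, hedged illustration of that scheme (not the MISH model or its code); all names and parameters are assumptions.

```python
from collections import defaultdict
import numpy as np

class MultiIndexHash:
    """Illustrative sketch of multi-index hashing: a b-bit code is split into
    s disjoint substrings, each indexed in its own table, and query candidates
    are all codes that match the query exactly on at least one substring."""

    def __init__(self, code_bits=64, n_substrings=4):
        assert code_bits % n_substrings == 0
        self.s = n_substrings
        self.chunk = code_bits // n_substrings
        self.tables = [defaultdict(list) for _ in range(n_substrings)]

    def _substrings(self, bits):
        # Split a 0/1 vector of length code_bits into s hashable tuples.
        return [tuple(bits[i * self.chunk:(i + 1) * self.chunk])
                for i in range(self.s)]

    def insert(self, doc_id, bits):
        for table, sub in zip(self.tables, self._substrings(bits)):
            table[sub].append(doc_id)

    def candidates(self, bits):
        # Pigeonhole: any code within Hamming distance < s of the query agrees
        # with it exactly on at least one substring, so it lands in this set.
        found = set()
        for table, sub in zip(self.tables, self._substrings(bits)):
            found.update(table.get(sub, []))
        return found

# Usage: index five random 64-bit codes and probe with the first one.
index = MultiIndexHash()
codes = np.random.randint(0, 2, size=(5, 64))
for i, c in enumerate(codes):
    index.insert(i, c)
print(index.candidates(codes[0]))
```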
arXiv Detail & Related papers (2021-03-26T13:33:48Z) - Reinforcing Short-Length Hashing [61.75883795807109]
Existing methods have poor performance in retrieval using an extremely short-length hash code.
In this study, we propose a novel reinforcing short-length hashing (RSLH) method.
In this proposed RSLH, mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information.
Experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
arXiv Detail & Related papers (2020-04-24T02:23:52Z) - Image Hashing by Minimizing Discrete Component-wise Wasserstein Distance [12.968141477410597]
Adversarial autoencoders are shown to be able to implicitly learn a robust, locality-preserving hash function that generates balanced and high-quality hash codes.
However, existing adversarial hashing methods are too inefficient to be employed for large-scale image retrieval applications.
We propose a new adversarial-autoencoder hashing approach that has a much lower sample requirement and computational cost.
arXiv Detail & Related papers (2020-02-29T00:22:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.