SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings
- URL: http://arxiv.org/abs/2112.05872v1
- Date: Sat, 11 Dec 2021 00:10:05 GMT
- Title: SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings
- Authors: Yuzhe Lu, Xinran Liu, Andrea Soltoggio, Soheil Kolouri
- Abstract summary: This paper focuses on non-parametric and data-independent learning from set-structured data using approximate nearest neighbor (ANN) solutions.
We propose Sliced-Wasserstein set embedding as a computationally efficient "set-2-vector" mechanism that enables downstream ANN.
We demonstrate the effectiveness of our algorithm, denoted as Set-LOcality Sensitive Hashing (SLOSH), on various set retrieval datasets.
- Score: 18.916058638077274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from set-structured data is an essential problem with many
applications in machine learning and computer vision. This paper focuses on
non-parametric and data-independent learning from set-structured data using
approximate nearest neighbor (ANN) solutions, particularly locality-sensitive
hashing. We consider the problem of set retrieval from an input set query. Such
a retrieval problem requires: 1) an efficient mechanism to calculate the
distances/dissimilarities between sets, and 2) an appropriate data structure
for fast nearest neighbor search. To that end, we propose Sliced-Wasserstein
set embedding as a computationally efficient "set-2-vector" mechanism that
enables downstream ANN, with theoretical guarantees. The set elements are
treated as samples from an unknown underlying distribution, and the
Sliced-Wasserstein distance is used to compare sets. We demonstrate the
effectiveness of our algorithm, denoted as Set-LOcality Sensitive Hashing
(SLOSH), on various set retrieval datasets and compare our proposed embedding
with standard set embedding approaches, including Generalized Mean (GeM)
embedding/pooling, Featurewise Sort Pooling (FSPool), and Covariance Pooling,
and show consistent improvements in retrieval results. The code for replicating
our results is available here:
https://github.com/mint-vu/SLOSH.
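To make the "set-2-vector" idea concrete, the following is a minimal Python sketch, not the authors' implementation (see the linked repository for that): each set is projected onto a few random directions (slices), the sorted projections sample the per-slice empirical quantile functions, and concatenating the resampled quantiles yields a fixed-length vector whose Euclidean distances approximate Sliced-Wasserstein distances between sets; a standard random-hyperplane LSH family is then applied to those vectors. The function names, direction/quantile counts, normalization, and hash family below are all illustrative assumptions.

```python
import numpy as np

def sw_set_embedding(X, thetas, m):
    """Embed a set of d-dimensional elements into a fixed-length vector.

    For each random direction (slice), the set is projected to 1-D and
    sorted; the sorted values sample the empirical quantile function,
    which is resampled at m points so sets of different cardinality map
    to vectors of equal length. Euclidean distances between such vectors
    approximate the Sliced-Wasserstein distance between the sets'
    empirical distributions.
    """
    slices = []
    for theta in thetas:                           # one slice per direction
        proj_sorted = np.sort(X @ theta)           # sorted 1-D projections
        qs = np.interp(np.linspace(0.0, 1.0, m),
                       np.linspace(0.0, 1.0, len(proj_sorted)),
                       proj_sorted)                # m evenly spaced quantiles
        slices.append(qs)
    # 1/sqrt(L*m) scaling (an assumption) keeps vector norms comparable
    return np.concatenate(slices) / np.sqrt(len(thetas) * m)

def lsh_code(v, hyperplanes):
    """Binary code from random hyperplanes -- a standard LSH family for
    Euclidean/cosine similarity, used here for illustration only."""
    return tuple((hyperplanes @ v > 0).astype(int))

# Toy usage: two similar sets (of possibly different sizes) should land in
# nearby, often identical, hash buckets.
rng = np.random.default_rng(0)
d, L, m, n_bits = 16, 32, 20, 12                   # illustrative sizes
thetas = rng.standard_normal((L, d))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)   # unit directions
hyperplanes = rng.standard_normal((n_bits, L * m))

set_a = rng.standard_normal((30, d))               # a set with 30 elements
set_b = set_a[:25] + 0.01 * rng.standard_normal((25, d))  # a perturbed subset
code_a = lsh_code(sw_set_embedding(set_a, thetas, m), hyperplanes)
code_b = lsh_code(sw_set_embedding(set_b, thetas, m), hyperplanes)
print(sum(a != b for a, b in zip(code_a, code_b)), "differing bits")
```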
Related papers
- DREW : Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking [58.37644304554906]
We propose Data Retrieval with Error-corrected codes and Watermarking (DREW)
DREW randomly clusters the reference dataset and injects unique error-controlled watermark keys into each cluster.
After locating the relevant cluster, embedding vector similarity retrieval is performed within the cluster to find the most accurate matches.
arXiv Detail & Related papers (2024-06-05T01:19:44Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Hashing Learning with Hyper-Class Representation [8.206031417113987]
Existing unsupervised hash learning is a kind of attribute-centered calculation.
It may not accurately preserve the similarity between data points.
In this paper, a hash algorithm is proposed with a hyper-class representation.
arXiv Detail & Related papers (2022-06-06T03:35:45Z)
- Learning to Hash Naturally Sorts [84.90210592082829]
We introduce Naturally-Sorted Hashing (NSH) to train a deep hashing model with sorted results end-to-end.
NSH sorts the Hamming distances of samples' hash codes and accordingly gathers their latent representations for self-supervised training.
We describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning.
arXiv Detail & Related papers (2022-01-31T16:19:02Z)
- Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm achieves 1.8x and 2.1x better recall on the worst-performing queries for the MNIST and ImageNet datasets, respectively.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
- Partial Wasserstein Covering [10.52782170493037]
We consider a general task called partial Wasserstein covering with the goal of emulating a large dataset.
We model this problem as a discrete optimization problem with partial Wasserstein divergence as an objective function.
We show that we can efficiently make two datasets similar in terms of partial Wasserstein divergence, including driving scene datasets.
arXiv Detail & Related papers (2021-06-02T01:48:41Z)
- Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
- LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew [58.21885402826496]
All-pairs set similarity is a widely used data mining task, even for large and high-dimensional datasets.
We present a new distributed algorithm, LSF-Join, for approximate all-pairs set similarity.
We show that LSF-Join efficiently finds most close pairs, even for small similarity thresholds and for skewed input sets.
arXiv Detail & Related papers (2020-03-06T00:06:20Z)
- AMP Chain Graphs: Minimal Separators and Structure Learning Algorithms [2.3333090554192615]
We address the problem of finding a minimal separator in an Andersson-Madigan-Perlman chain graph (AMP CG).
We show that the PC-like algorithm (Pena, 2012) is order-dependent, in the sense that the output can depend on the order in which the variables are given.
We propose several modifications of the PC-like algorithm that remove part or all of this order-dependence.
arXiv Detail & Related papers (2020-02-24T18:14:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.