SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings
- URL: http://arxiv.org/abs/2112.05872v1
- Date: Sat, 11 Dec 2021 00:10:05 GMT
- Title: SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings
- Authors: Yuzhe Lu, Xinran Liu, Andrea Soltoggio, Soheil Kolouri
- Abstract summary: This paper focuses on non-parametric and data-independent learning from set-structured data using approximate nearest neighbor (ANN) solutions.
We propose Sliced-Wasserstein set embedding as a computationally efficient "set-2-vector" mechanism that enables downstream ANN.
We demonstrate the effectiveness of our algorithm, denoted as Set-LOcality Sensitive Hashing (SLOSH), on various set retrieval datasets.
- Score: 18.916058638077274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from set-structured data is an essential problem with many
applications in machine learning and computer vision. This paper focuses on
non-parametric and data-independent learning from set-structured data using
approximate nearest neighbor (ANN) solutions, particularly locality-sensitive
hashing. We consider the problem of set retrieval from an input set query. Such
a retrieval problem requires: 1) an efficient mechanism to calculate the
distances/dissimilarities between sets, and 2) an appropriate data structure
for fast nearest neighbor search. To that end, we propose Sliced-Wasserstein
set embedding as a computationally efficient "set-2-vector" mechanism that
enables downstream ANN, with theoretical guarantees. The set elements are
treated as samples from an unknown underlying distribution, and the
Sliced-Wasserstein distance is used to compare sets. We demonstrate the
effectiveness of our algorithm, denoted as Set-LOcality Sensitive Hashing
(SLOSH), on various set retrieval datasets and compare our proposed embedding
with standard set embedding approaches, including Generalized Mean (GeM)
embedding/pooling, Featurewise Sort Pooling (FSPool), and Covariance Pooling,
and show consistent improvements in retrieval results. The code for replicating
our results is available here:
https://github.com/mint-vu/SLOSH.
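To make the "set-2-vector" idea concrete, the following is a minimal Python sketch, not the authors' implementation (see the linked repository for that): each set is projected onto a few random directions (slices), the sorted projections sample the per-slice empirical quantile functions, and concatenating the resampled quantiles yields a fixed-length vector whose Euclidean distances approximate Sliced-Wasserstein distances between sets; a standard random-hyperplane LSH family is then applied to those vectors. The function names, direction/quantile counts, normalization, and hash family below are all illustrative assumptions.

```python
import numpy as np

def sw_set_embedding(X, thetas, m):
    """Embed a set of d-dimensional elements into a fixed-length vector.

    For each random direction (slice), the set is projected to 1-D and
    sorted; the sorted values sample the empirical quantile function,
    which is resampled at m points so sets of different cardinality map
    to vectors of equal length. Euclidean distances between such vectors
    approximate the Sliced-Wasserstein distance between the sets'
    empirical distributions.
    """
    slices = []
    for theta in thetas:                           # one slice per direction
        proj_sorted = np.sort(X @ theta)           # sorted 1-D projections
        qs = np.interp(np.linspace(0.0, 1.0, m),
                       np.linspace(0.0, 1.0, len(proj_sorted)),
                       proj_sorted)                # m evenly spaced quantiles
        slices.append(qs)
    # 1/sqrt(L*m) scaling (an assumption) keeps vector norms comparable
    return np.concatenate(slices) / np.sqrt(len(thetas) * m)

def lsh_code(v, hyperplanes):
    """Binary code from random hyperplanes -- a standard LSH family for
    Euclidean/cosine similarity, used here for illustration only."""
    return tuple((hyperplanes @ v > 0).astype(int))

# Toy usage: two similar sets (of possibly different sizes) should land in
# nearby, often identical, hash buckets.
rng = np.random.default_rng(0)
d, L, m, n_bits = 16, 32, 20, 12                   # illustrative sizes
thetas = rng.standard_normal((L, d))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)   # unit directions
hyperplanes = rng.standard_normal((n_bits, L * m))

set_a = rng.standard_normal((30, d))               # a set with 30 elements
set_b = set_a[:25] + 0.01 * rng.standard_normal((25, d))  # a perturbed subset
code_a = lsh_code(sw_set_embedding(set_a, thetas, m), hyperplanes)
code_b = lsh_code(sw_set_embedding(set_b, thetas, m), hyperplanes)
print(sum(a != b for a, b in zip(code_a, code_b)), "differing bits")
```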
Related papers
- DREW : Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking [58.37644304554906]
We propose Data Retrieval with Error-corrected codes and Watermarking (DREW)
DREW randomly clusters the reference dataset and injects unique error-controlled watermark keys into each cluster.
After locating the relevant cluster, embedding vector similarity retrieval is performed within the cluster to find the most accurate matches.
arXiv Detail & Related papers (2024-06-05T01:19:44Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Hashing Learning with Hyper-Class Representation [8.206031417113987]
Existing unsupervised hash learning is a kind of attribute-centered calculation.
It may not accurately preserve the similarity between data points.
In this paper, a hash algorithm is proposed with a hyper-class representation.
arXiv Detail & Related papers (2022-06-06T03:35:45Z)
- Learning to Hash Naturally Sorts [84.90210592082829]
We introduce Naturally-Sorted Hashing (NSH) to train a deep hashing model with sorted results end-to-end.
NSH sorts the Hamming distances of samples' hash codes and accordingly gathers their latent representations for self-supervised training.
We describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning.
arXiv Detail & Related papers (2022-01-31T16:19:02Z)
- Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm achieves 1.8x and 2.1x better recall on the worst-performing queries for the MNIST and ImageNet datasets, respectively.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
- Partial Wasserstein Covering [10.52782170493037]
We consider a general task called partial Wasserstein covering with the goal of emulating a large dataset.
We model this problem as a discrete optimization problem with partial Wasserstein divergence as an objective function.
We show that we can efficiently make two datasets similar in terms of partial Wasserstein divergence, including driving scene datasets.
arXiv Detail & Related papers (2021-06-02T01:48:41Z)
- Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to the state of the art.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
- LSF-Join: Locality Sensitive Filtering for Distributed All-Pairs Set Similarity Under Skew [58.21885402826496]
All-pairs set similarity is a widely used data mining task, even for large and high-dimensional datasets.
We present a new distributed algorithm, LSF-Join, for approximate all-pairs set similarity.
We show that LSF-Join efficiently finds most close pairs, even for small similarity thresholds and for skewed input sets.
arXiv Detail & Related papers (2020-03-06T00:06:20Z)
- AMP Chain Graphs: Minimal Separators and Structure Learning Algorithms [2.3333090554192615]
We address the problem of finding a minimal separator in an Andersson-Madigan-Perlman chain graph (AMP CG).
We show that the PC-like algorithm (Pena, 2012) is order-dependent, in the sense that the output can depend on the order in which the variables are given.
We propose several modifications of the PC-like algorithm that remove part or all of this order-dependence.
arXiv Detail & Related papers (2020-02-24T18:14:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.