Unsupervised Space Partitioning for Nearest Neighbor Search
- URL: http://arxiv.org/abs/2206.08091v1
- Date: Thu, 16 Jun 2022 11:17:03 GMT
- Title: Unsupervised Space Partitioning for Nearest Neighbor Search
- Authors: Abrar Fahim, Mohammed Eunus Ali, Muhammad Aamir Cheema
- Abstract summary: We propose an end-to-end learning framework that couples the partitioning and learning-to-search steps using a custom loss function.
A key advantage of our proposed solution is that it does not require any expensive pre-processing of the dataset.
We show that our method beats the state-of-the-art space partitioning method and the ubiquitous K-means clustering method.
- Score: 6.516813715425121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Approximate Nearest Neighbor Search (ANNS) in high-dimensional spaces is
crucial for many real-life applications (e.g., e-commerce, web, multimedia)
that deal with an abundance of data. In this paper, we propose an
end-to-end learning framework that couples the partitioning (one key step of
ANNS) and learning-to-search steps using a custom loss function. A key
advantage of our proposed solution is that it does not require any expensive
pre-processing of the dataset, which is one of the key limitations of the
state-of-the-art approach. We achieve this advantage by formulating a
multi-objective custom loss function that does not need ground truth labels to
quantify the quality of a given partition of the data space, making it entirely
unsupervised. We also propose an ensembling technique by adding varying input
weights to the loss function to train an ensemble of models to enhance the
search quality. On several standard benchmarks for ANNS, we show that our
method beats the state-of-the-art space partitioning method and the ubiquitous
K-means clustering method while using fewer parameters and shorter offline
training times. More broadly, our unsupervised partitioning approach is a
promising alternative to widely used clustering methods such as K-means
clustering and DBSCAN.
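The abstract describes the custom loss only at a high level. As a rough illustration (not the paper's actual formulation), here is a minimal PyTorch sketch of an unsupervised, multi-objective partitioning loss in this spirit: a small network assigns points to partitions, a neighbor-consistency term rewards routing a point and its within-batch nearest neighbor to the same partition, and a balance term keeps partition sizes even. The names PartitionNet and partition_loss, the exact terms, and the weight alpha are all assumptions.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class PartitionNet(nn.Module):
    """Hypothetical MLP mapping a point to soft partition probabilities."""
    def __init__(self, dim, n_partitions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_partitions),
        )

    def forward(self, x):
        return F.softmax(self.net(x), dim=-1)  # (batch, K)

def partition_loss(probs, x, alpha=1.0):
    """Unsupervised multi-objective loss (sketch, not the paper's formula).

    Neighbor term: a point and its within-batch nearest neighbor should get
    similar soft assignments, so true neighbors land in the probed partition.
    Balance term: the average assignment should stay close to uniform, so no
    partition degenerates into a near-linear scan. Neither term needs labels.
    """
    # Index of each point's within-batch nearest neighbor (self excluded).
    d = torch.cdist(x, x)
    d.fill_diagonal_(float("inf"))
    nn_idx = d.argmin(dim=1)

    # Cross-entropy between each point's assignment and its neighbor's.
    neighbor_term = -(probs[nn_idx] * torch.log(probs + 1e-9)).sum(dim=1).mean()

    # KL divergence of the mean assignment from uniform:
    # sum_i p_i * log(p_i / (1/K)) = sum_i p_i * log p_i + log K.
    mean_probs = probs.mean(dim=0)
    balance_term = (mean_probs * torch.log(mean_probs + 1e-9)).sum() \
        + math.log(probs.size(1))

    return neighbor_term + alpha * balance_term

# One training step on a random batch.
x = torch.randn(256, 64)                       # 256 points in 64-d space
model = PartitionNet(dim=64, n_partitions=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = partition_loss(model(x), x)
loss.backward()
opt.step()
```

Without the balance term, the neighbor term alone is trivially minimized by routing every point to a single partition; the two objectives pull against each other, which is what makes a weighted combination (and, below, an ensemble over different weightings) natural.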
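The ensembling technique ("adding varying input weights to the loss function") is likewise only sketched in the abstract. One hedged reading, reusing PartitionNet and partition_loss from the sketch above: train several members, each computing the loss under a different random per-dimension reweighting of the input, then probe the union of the partitions the members route a query to. The reweighting scheme, member count, and train_member helper are assumptions.

```python
import torch

def train_member(x, dim=64, n_partitions=16, steps=100, seed=0):
    """Train one ensemble member under a random input weighting (sketch)."""
    torch.manual_seed(seed)
    w = 0.5 + torch.rand(dim)   # assumed per-dimension loss-input weights
    model = PartitionNet(dim, n_partitions)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        # Reweighting the points fed to the loss changes which pairs count
        # as nearest neighbors, so each member learns a different partition.
        loss = partition_loss(model(x), x * w)
        loss.backward()
        opt.step()
    return model

x = torch.randn(1024, 64)
ensemble = [train_member(x, seed=s) for s in range(4)]

# Query time: search the union of each member's top partition.
q = torch.randn(1, 64)
candidates = {int(m(q).argmax(dim=1)) for m in ensemble}
```

Members disagree mostly on boundary points, so their union recovers neighbors that any single partitioning would miss, at the cost of probing more candidates.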
Related papers
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
The second strategy leverages a novel re-ranking technique, which has a lower upper-bound time complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Hard Regularization to Prevent Deep Online Clustering Collapse without Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all points are placed in a single cluster.
We propose a method that does not require data augmentation and, unlike existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Spacing Loss for Discovering Novel Categories [72.52222295216062]
Novel Class Discovery (NCD) is a learning paradigm, where a machine learning model is tasked to semantically group instances from unlabeled data.
We first characterize existing NCD approaches into single-stage and two-stage methods based on whether they require access to labeled and unlabeled data together.
We devise a simple yet powerful loss function that enforces separability in the latent space using cues from multi-dimensional scaling.
arXiv Detail & Related papers (2022-04-22T09:37:11Z)
- Generalized One-Class Learning Using Pairs of Complementary Classifiers [41.64645294104883]
One-class learning is the classic problem of fitting a model to the data for which annotations are available only for a single class.
In this paper, we explore novel objectives for one-class learning, which we collectively refer to as Generalized One-class Discriminative Subspaces (GODS).
arXiv Detail & Related papers (2021-06-24T18:52:05Z)
- Transductive Few-Shot Learning: Clustering is All You Need? [31.21306826132773]
We investigate a general formulation for transductive few-shot learning, which integrates prototype-based objectives.
We find that our method yields competitive performance, in terms of accuracy and optimization, while scaling up to large problems.
Surprisingly, we find that our general model already achieves performance competitive with state-of-the-art learning methods.
arXiv Detail & Related papers (2021-06-16T16:14:01Z)
- Deep Distribution-preserving Incomplete Clustering with Optimal Transport [43.0056459311929]
We propose a novel deep incomplete clustering method, named Deep Distribution-preserving Incomplete Clustering with Optimal Transport (DDIC-OT).
The proposed network achieves superior and stable clustering performance over existing state-of-the-art incomplete clustering methods across different missing ratios.
arXiv Detail & Related papers (2021-03-21T15:43:17Z)
- ThetA -- fast and robust clustering via a distance parameter [3.0020405188885815]
Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades.
We propose a new set of distance-threshold methods called Theta-based Algorithms (ThetA).
arXiv Detail & Related papers (2021-02-13T23:16:33Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this is the first subspace clustering method to efficiently cluster millions of data points.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.