Stochastic Cluster Embedding
- URL: http://arxiv.org/abs/2108.08003v1
- Date: Wed, 18 Aug 2021 07:07:28 GMT
- Title: Stochastic Cluster Embedding
- Authors: Zhirong Yang, Yuwei Chen, Denis Sedov, Samuel Kaski, and Jukka
Corander
- Abstract summary: Neighbor Embedding (NE) aims to preserve pairwise similarities between data items.
Even NE methods such as Stochastic Neighbor Embedding (SNE) may leave large-scale patterns such as clusters hidden.
We propose a new cluster visualization method based on Neighbor Embedding.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neighbor Embedding (NE) that aims to preserve pairwise similarities between
data items has been shown to yield an effective principle for data
visualization. However, even the currently best NE methods such as Stochastic
Neighbor Embedding (SNE) may leave large-scale patterns such as clusters hidden
despite strong signals being present in the data. To address this, we
propose a new cluster visualization method based on Neighbor Embedding. We
first present a family of Neighbor Embedding methods which generalizes SNE by
using non-normalized Kullback-Leibler divergence with a scale parameter. In
this family, much better cluster visualizations often appear with a parameter
value different from the one corresponding to SNE. We also develop efficient
software that employs asynchronous stochastic block coordinate descent to
optimize the new family of objective functions. The experimental results
demonstrate that our method consistently and substantially improves
visualization of data clusters compared with the state-of-the-art NE
approaches.
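The abstract describes generalizing SNE by replacing the normalized Kullback-Leibler divergence with a non-normalized one controlled by a scale parameter. The sketch below illustrates what such an objective can look like; it is a minimal illustration under assumptions, not the authors' implementation: the Cauchy output kernel, the function name `sce_objective`, and the parameter `s` are hypothetical choices made for clarity.

```python
import numpy as np

def sce_objective(P, Y, s=1.0, eps=1e-12):
    """Non-normalized (generalized) KL divergence between input
    similarities P and a scaled output kernel s*Q.

    P : (n, n) symmetric similarity matrix, zero diagonal, sums to 1.
    Y : (n, d) low-dimensional embedding.
    s : scale parameter of the divergence family (assumed name).
    """
    # Cauchy (t-SNE-style) output kernel: q_ij = 1 / (1 + ||y_i - y_j||^2)
    sq = np.sum(Y ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T
    Q = 1.0 / (1.0 + np.maximum(d2, 0.0))
    np.fill_diagonal(Q, 0.0)
    # Generalized I-divergence: sum_ij [ p log(p / (s q)) - p + s q ]
    mask = P > 0
    div = np.sum(P[mask] * np.log(P[mask] / (s * Q[mask] + eps)))
    div += -P.sum() + s * Q.sum()
    return div
```

Unlike the normalized SNE objective, this divergence does not require Q to sum to one, so varying `s` trades off attraction and repulsion between embedded points, which is the mechanism the paper exploits for sharper cluster separation.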
Related papers
- Revisiting Nonlocal Self-Similarity from Continuous Representation [62.06288797179193]
Nonlocal self-similarity (NSS) is an important prior that has been successfully applied in multi-dimensional data processing tasks.
We propose a novel Continuous Representation-based NonLocal method (termed CRNL) for both on-meshgrid and off-meshgrid data.
arXiv Detail & Related papers (2024-01-01T09:25:03Z)
- Supervised Stochastic Neighbor Embedding Using Contrastive Learning [4.560284382063488]
Clusters of samples belonging to the same class are pulled together in low-dimensional embedding space.
We extend the self-supervised contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information.
arXiv Detail & Related papers (2023-09-15T00:26:21Z)
- Learn to Cluster Faces with Better Subgraphs [13.511058277653122]
Face clustering can provide pseudo-labels to the massive unlabeled face data.
Existing clustering methods aggregate features within subgraphs based on a uniform threshold or a learned cutoff position.
This work proposes an efficient neighborhood-aware subgraph adjustment method that can significantly reduce the noise.
arXiv Detail & Related papers (2023-04-21T09:18:55Z) - Multi-View Clustering via Semi-non-negative Tensor Factorization [120.87318230985653]
We develop a novel multi-view clustering method based on semi-non-negative tensor factorization (Semi-NTF).
Our model directly considers the between-view relationship and exploits the between-view complementary information.
In addition, we provide an optimization algorithm for the proposed method and prove mathematically that the algorithm always converges to the stationary KKT point.
arXiv Detail & Related papers (2023-03-29T14:54:19Z) - Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors [6.918364447822299]
Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding.
We show that ct-SNE fails in many realistic settings.
We introduce a revised method by conditioning the high-dimensional similarities instead of the low-dimensional similarities.
arXiv Detail & Related papers (2023-02-07T14:37:44Z) - Rethinking Clustering-Based Pseudo-Labeling for Unsupervised
Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this shortcoming is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z) - Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with
Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z) - Variational Auto Encoder Gradient Clustering [0.0]
Clustering using deep neural network models has been extensively studied in recent years.
This article investigates how probability function gradient ascent can be used to process data in order to achieve better clustering.
We propose a simple yet effective method for determining a suitable number of clusters for data, based on the DBSCAN clustering algorithm.
arXiv Detail & Related papers (2021-05-11T08:00:36Z) - Contrastive Clustering [57.71729650297379]
We propose Contrastive Clustering (CC) which explicitly performs the instance- and cluster-level contrastive learning.
In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, which is an up to 19% (39%) performance improvement compared with the best baseline.
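The NMI figures quoted above are normalized mutual information between predicted and ground-truth cluster assignments. A self-contained sketch of the metric follows; it uses geometric-mean normalization, which is one common variant (the cited paper may use a different normalization), and the function name `nmi` is chosen here for illustration.

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two clusterings
    (geometric-mean normalization)."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = labels_true.size
    # Build the contingency table of joint cluster assignments.
    _, u_idx = np.unique(labels_true, return_inverse=True)
    _, v_idx = np.unique(labels_pred, return_inverse=True)
    C = np.zeros((u_idx.max() + 1, v_idx.max() + 1))
    np.add.at(C, (u_idx, v_idx), 1)
    Pij = C / n
    Pi = Pij.sum(axis=1, keepdims=True)   # marginal of true labels
    Pj = Pij.sum(axis=0, keepdims=True)   # marginal of predicted labels
    nz = Pij > 0
    mi = np.sum(Pij[nz] * np.log(Pij[nz] / (Pi @ Pj)[nz]))
    hu = -np.sum(Pi[Pi > 0] * np.log(Pi[Pi > 0]))
    hv = -np.sum(Pj[Pj > 0] * np.log(Pj[Pj > 0]))
    return mi / np.sqrt(hu * hv) if hu > 0 and hv > 0 else 0.0
```

Because NMI compares assignments through a contingency table, it is invariant to label permutations, which is why it is a standard score for unsupervised clustering.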
arXiv Detail & Related papers (2020-09-21T08:54:40Z) - LSD-C: Linearly Separable Deep Clusters [145.89790963544314]
We present LSD-C, a novel method to identify clusters in an unlabeled dataset.
Our method draws inspiration from recent semi-supervised learning practice and proposes to combine our clustering algorithm with self-supervised pretraining and strong data augmentation.
We show that our approach significantly outperforms competitors on popular public image benchmarks including CIFAR 10/100, STL 10 and MNIST, as well as the document classification dataset Reuters 10K.
arXiv Detail & Related papers (2020-06-17T17:58:10Z) - Clustering Binary Data by Application of Combinatorial Optimization
Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
Using a set of 16 data tables generated by a quasi-Monte Carlo experiment, the methods are compared on one of the aggregation criteria with L1 dissimilarity against hierarchical clustering and a k-means variant, partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.