Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors
- URL: http://arxiv.org/abs/2302.03493v2
- Date: Tue, 11 Apr 2023 08:36:09 GMT
- Title: Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors
- Authors: Edith Heiter, Bo Kang, Ruth Seurinck, Jefrey Lijffijt
- Abstract summary: Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding.
We show that ct-SNE fails in many realistic settings.
We introduce a revised method by conditioning the high-dimensional similarities instead of the low-dimensional similarities.
- Score: 6.918364447822299
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal
of known cluster information from the embedding, to obtain a visualization
revealing structure beyond label information. This is useful, for example, when
one wants to factor out unwanted differences between a set of classes. We show
that ct-SNE fails in many realistic settings, namely if the data is well
clustered over the labels in the original high-dimensional space. We introduce
a revised method by conditioning the high-dimensional similarities instead of
the low-dimensional similarities and storing within- and across-label nearest
neighbors separately. This also enables the use of recently proposed speedups
for t-SNE, improving the scalability. From experiments on synthetic data, we
find that our proposed method resolves the considered problems and improves the
embedding quality. On real data containing batch effects, the expected
improvement is not always there. We argue revised ct-SNE is preferable overall,
given its improved scalability. The results also highlight new open questions,
such as how to handle distance variations between clusters.
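A minimal sketch may help make the revision concrete. The code below is not the authors' implementation: the distance-inflation factor `beta`, the helper names, and the use of scikit-learn's precomputed-metric t-SNE are all illustrative assumptions. It shows the two ingredients the abstract names: conditioning the high-dimensional similarities on the labels (here, by inflating within-label distances before embedding) and storing within- and across-label nearest neighbors separately. Because only the input similarities change, off-the-shelf t-SNE speedups such as Barnes-Hut remain applicable.
```python
# Illustrative sketch only (not the paper's implementation): condition the
# HIGH-dimensional similarities on the labels, then run plain t-SNE.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

def revised_ctsne_sketch(X, labels, beta=4.0, random_state=0):
    """Embed X while down-weighting similarity between same-label points.

    beta > 1 inflates within-label distances, so each label's own cluster
    structure is (approximately) factored out of the embedding. `beta` is a
    stand-in for the paper's conditional affinity rescaling.
    """
    labels = np.asarray(labels)
    D = pairwise_distances(X)                      # high-dimensional distances
    same = labels[:, None] == labels[None, :]
    D_cond = np.where(same, beta * D, D)           # condition on the labels
    np.fill_diagonal(D_cond, 0.0)
    # Plain t-SNE on the conditioned input; standard speedups still apply.
    return TSNE(metric="precomputed", init="random",
                random_state=random_state).fit_transform(D_cond)

def split_knn(D, labels, k=10):
    """Store within- and across-label nearest neighbors separately."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    d_within = np.where(same, D, np.inf)
    d_across = np.where(~same, D, np.inf)
    np.fill_diagonal(d_across, np.inf)             # exclude self-neighbors
    return (np.argsort(d_within, axis=1)[:, :k],   # k same-label neighbors
            np.argsort(d_across, axis=1)[:, :k])   # k other-label neighbors
```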
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art long-tailed semi-supervised learning (LTSSL) approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
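Purely as a hedged illustration of the CCL summary above (the paper's actual objective may differ), a contrastive loss over unlabeled data can weight pairs by the agreement of smoothed pseudo-labels; `tau`, `smoothing`, and the agreement weighting are illustrative assumptions:
```python
import torch
import torch.nn.functional as F

def pseudo_label_contrastive_loss(z, probs, tau=0.1, smoothing=0.1):
    """z: (n, d) embeddings; probs: (n, c) classifier softmax on unlabeled data."""
    z = F.normalize(z, dim=1)
    c = probs.shape[1]
    q = (1.0 - smoothing) * probs + smoothing / c  # smoothed pseudo-labels
    w = q @ q.t()                                  # soft "same-class" weights
    w.fill_diagonal_(0.0)                          # no self-pairs
    logits = z @ z.t() / tau
    logits.fill_diagonal_(-1e9)                    # exclude self-similarity
    log_p = F.log_softmax(logits, dim=1)
    # Weighted InfoNCE: pairs with agreeing pseudo-labels are pulled together.
    return -(w * log_p).sum(1).div(w.sum(1).clamp_min(1e-8)).mean()
```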
- Contrastive Continual Multi-view Clustering with Filtered Structural Fusion [57.193645780552565]
Multi-view clustering thrives in applications where views are collected in advance.
However, it overlooks scenarios where data views are collected sequentially, i.e., real-time data.
Some methods have been proposed to handle this but are trapped in a stability-plasticity dilemma.
We propose Contrastive Continual Multi-view Clustering with Filtered Structural Fusion.
arXiv Detail & Related papers (2023-09-26T14:18:29Z)
- Supervised Stochastic Neighbor Embedding Using Contrastive Learning [4.560284382063488]
Samples belonging to the same class are pulled together in the low-dimensional embedding space.
We extend the self-supervised contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information.
arXiv Detail & Related papers (2023-09-15T00:26:21Z)
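The mechanism summarized above can be sketched as a supervised contrastive objective over same-class pairs; the paper's exact formulation may differ, and `tau` is an illustrative default:
```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.07):
    """Pull same-class samples together: z (n, d) embeddings, labels (n,) ints."""
    z = F.normalize(z, dim=1)
    logits = z @ z.t() / tau
    logits.fill_diagonal_(-1e9)                    # drop self-similarity
    log_p = F.log_softmax(logits, dim=1)
    pos = (labels[:, None] == labels[None, :]).float()
    pos.fill_diagonal_(0.0)                        # positives exclude self
    # Mean log-likelihood of picking a same-class neighbour, per anchor.
    return -(pos * log_p).sum(1).div(pos.sum(1).clamp_min(1.0)).mean()
```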
- CEnt: An Entropy-based Model-agnostic Explainability Framework to Contrast Classifiers' Decisions [2.543865489517869]
We present a novel approach to locally contrast the prediction of any classifier.
Our Contrastive Entropy-based explanation method, CEnt, approximates a model locally by a decision tree to compute entropy information of different feature splits.
CEnt is the first non-gradient-based contrastive method that generates diverse counterfactuals, which need not exist in the training data, while satisfying immutability (e.g., race) and semi-immutability (e.g., age can only increase).
arXiv Detail & Related papers (2023-01-19T08:23:34Z)
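A rough sketch of the recipe above, with two loud caveats: it returns an existing nearby point rather than synthesizing a counterfactual as CEnt does, and `n_local` and the neighborhood construction are assumptions for illustration:
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def local_tree_contrast(model, X_ref, x, n_local=200, immutable=()):
    """Contrast `model`'s prediction at x via a local decision-tree surrogate."""
    d = np.linalg.norm(X_ref - x, axis=1)
    local = X_ref[np.argsort(d)[:n_local]]         # neighborhood around x
    surrogate = DecisionTreeClassifier(criterion="entropy", max_depth=3)
    surrogate.fit(local, model.predict(local))     # entropy-based local splits
    own = surrogate.predict(x.reshape(1, -1))[0]
    mask = surrogate.predict(local) != own         # reaches a contrasting leaf
    for f in immutable:                            # immutability (e.g., race)
        mask &= local[:, f] == x[f]
    cand = local[mask]
    return cand[0] if len(cand) else None          # nearest contrastive point
```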
- Combating Mode Collapse in GANs via Manifold Entropy Estimation [70.06639443446545]
Generative Adversarial Networks (GANs) have shown compelling results in various tasks and applications.
We propose a novel training pipeline to address the mode collapse issue of GANs.
arXiv Detail & Related papers (2022-08-25T12:33:31Z)
- Active Learning by Feature Mixing [52.16150629234465]
We propose a novel method for batch active learning called ALFA-Mix.
We identify unlabelled instances with sufficiently-distinct features by seeking inconsistencies in the model's predictions on interpolations between their representations and those of labelled instances.
We show that these inconsistencies reveal features that the model is unable to recognise in the unlabelled instances.
arXiv Detail & Related papers (2022-03-14T12:20:54Z)
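A hedged sketch of the selection rule described above: mix each unlabelled point's features toward labelled class anchors and flag points whose prediction flips. `alpha`, the anchor construction, and the flip test are illustrative assumptions, not the paper's exact procedure:
```python
import numpy as np

def alfa_mix_candidates(predict, Z_unlab, anchors, alpha=0.2):
    """predict: features -> class ids; anchors: (c, d) labelled class prototypes."""
    base = predict(Z_unlab)                        # current predictions
    inconsistent = np.zeros(len(Z_unlab), dtype=bool)
    for a in anchors:
        mixed = (1 - alpha) * Z_unlab + alpha * a  # interpolate toward anchor
        inconsistent |= predict(mixed) != base     # prediction flipped?
    return np.where(inconsistent)[0]               # indices worth labelling
```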
- T-SNE Is Not Optimized to Reveal Clusters in Data [4.03823460330412]
Cluster visualization is an essential task for nonlinear dimensionality reduction as a data analysis tool.
It is often believed that t-Distributed Stochastic Neighbor Embedding (t-SNE) can show clusters for well-clusterable data.
We show that t-SNE may leave clustering patterns hidden despite strong signals present in the data.
arXiv Detail & Related papers (2021-10-06T08:35:39Z)
- Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z)
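The trick the paper builds on can be sketched as warm-starting t-SNE from an existing embedding so cluster positions carry over; this is not the Index $t$-SNE algorithm itself, and the nearest-neighbour initialisation of new items is an illustrative choice:
```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def reembed_coherently(X_old, Y_old, X_new, random_state=0):
    """Re-run t-SNE on old+new data, warm-started from the old 2-D embedding Y_old."""
    nn = NearestNeighbors(n_neighbors=1).fit(X_old)
    idx = nn.kneighbors(X_new, return_distance=False)[:, 0]
    init = np.vstack([Y_old, Y_old[idx]])          # new items start at their
    X = np.vstack([X_old, X_new])                  # nearest old item's position
    return TSNE(init=init, random_state=random_state).fit_transform(X)
```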
- Stochastic Cluster Embedding [14.485496311015398]
Neighbor Embedding (NE) aims to preserve pairwise similarities between data items.
NE methods such as Stochastic Neighbor Embedding (SNE) may leave large-scale patterns such as clusters hidden.
We propose a new cluster visualization method based on Neighbor Embedding.
arXiv Detail & Related papers (2021-08-18T07:07:28Z)
- Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth-order hard-label attacks on image classification models have shown performance comparable to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- Classify and Generate Reciprocally: Simultaneous Positive-Unlabelled Learning and Conditional Generation with Extra Data [77.31213472792088]
The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems.
We address this problem by leveraging Positive-Unlabeled (PU) classification and conditional generation with extra unlabeled data.
We present a novel training framework to jointly target both PU classification and conditional generation when exposed to extra data.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)