Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective
- URL: http://arxiv.org/abs/2410.16608v1
- Date: Tue, 22 Oct 2024 01:40:43 GMT
- Title: Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective
- Authors: Zhexuan Liu, Rong Ma, Yiqiao Zhong
- Abstract summary: Neighbor embedding methods, such as t-SNE, UMAP, and LargeVis, are a family of popular visualization methods.
Recent studies suggest that these methods often produce visual artifacts, potentially leading to incorrect scientific conclusions.
We introduce a novel conceptual and computational framework, LOO-map, that learns the embedding maps based on a classical statistical idea.
- Score: 2.969441406380581
- Abstract: Visualizing high-dimensional data is an important routine for understanding biomedical data and interpreting deep learning models. Neighbor embedding methods, such as t-SNE, UMAP, and LargeVis, among others, are a family of popular visualization methods that reduce high-dimensional data to two dimensions. However, recent studies suggest that these methods often produce visual artifacts, potentially leading to incorrect scientific conclusions. Recognizing that the current limitation stems from a lack of data-independent notions of embedding maps, we introduce a novel conceptual and computational framework, LOO-map, that learns the embedding maps based on a classical statistical idea known as leave-one-out. LOO-map extends the embedding over a discrete set of input points to the entire input space, enabling a systematic assessment of map continuity, and thus the reliability of the visualizations. We find that for many neighbor embedding methods, the embedding maps can be intrinsically discontinuous. The discontinuity induces two types of observed map distortion: "overconfidence-inducing discontinuity," which exaggerates cluster separation, and "fracture-inducing discontinuity," which creates spurious local structures. Building upon LOO-map, we propose two diagnostic point-wise scores, the perturbation score and the singularity score, to address these limitations. These scores can help identify unreliable embedding points, detect out-of-distribution data, and guide hyperparameter selection. Our approach is flexible and works as a wrapper around many neighbor embedding algorithms. We test our methods across multiple real-world datasets from computer vision and single-cell omics to demonstrate their effectiveness in enhancing the interpretability and accuracy of visualizations.
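The leave-one-out idea in the abstract can be illustrated with a small sketch. This is not the authors' LOO-map implementation. It is a toy, assuming t-SNE-style affinities: a Gaussian kernel with a fixed bandwidth `sigma` in input space (real t-SNE calibrates this per point via perplexity) and a Student-t kernel in the 2-D embedding. The existing embedding `Y` is frozen and only the new point's position is optimized. The names `loo_embed` and `perturbation_score` are hypothetical, and the score here is a simplified stand-in (displacement under one input perturbation), not the paper's definition.

```python
import numpy as np


def loo_embed(x, X, Y, sigma=1.0, steps=500, lr=0.1):
    """Place a new input x into a frozen 2-D embedding Y of the inputs X.

    Minimizes KL(p || q(y)) over y by plain gradient descent, where p are
    Gaussian affinities from x to the rows of X and q are normalized
    Student-t affinities from y to the rows of Y. (Toy assumption: fixed
    bandwidth sigma instead of perplexity calibration.)
    """
    d2 = np.sum((X - x) ** 2, axis=1)
    # Subtract the minimum distance for numerical stability before exp.
    p = np.exp(-(d2 - d2.min()) / (2.0 * sigma**2))
    p /= p.sum()

    # Start at the affinity-weighted mean of the frozen embedding points.
    y = p @ Y
    for _ in range(steps):
        diff = y - Y                                  # shape (n, 2)
        w = 1.0 / (1.0 + np.sum(diff**2, axis=1))     # Student-t kernel
        q = w / w.sum()
        # Gradient of KL(p || q) with respect to y.
        grad = 2.0 * ((p - q) * w) @ diff
        y = y - lr * grad
    return y


def perturbation_score(i, X, Y, direction, length, **kwargs):
    """Toy score: how far point i's leave-one-out embedding moves when its
    input is shifted by `length` along `direction` (others stay frozen)."""
    mask = np.arange(len(X)) != i
    X_rest, Y_rest = X[mask], Y[mask]
    y_base = loo_embed(X[i], X_rest, Y_rest, **kwargs)
    y_pert = loo_embed(X[i] + length * direction, X_rest, Y_rest, **kwargs)
    return float(np.linalg.norm(y_pert - y_base))
```

In this toy setting, a large score for a small `length` would suggest that the map from input space to the embedding changes abruptly near point `i`, which is the kind of discontinuity the abstract describes. In practice `Y` would come from an actual neighbor embedding run on `X`.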
Related papers
- Interpreting Object-level Foundation Models via Visual Precision Search [53.807678972967224]
We propose a Visual Precision Search method that generates accurate attribution maps with fewer regions.
Our method bypasses internal model parameters to overcome attribution issues from multimodal fusion.
Our method can interpret failures in visual grounding and object detection tasks, surpassing existing methods across multiple evaluation metrics.
arXiv Detail & Related papers (2024-11-25T08:54:54Z) - Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z) - Supervised Manifold Learning via Random Forest Geometry-Preserving Proximities [0.0]
We show the weaknesses of class-conditional manifold learning methods quantitatively and visually.
We propose an alternate choice of kernel for supervised dimensionality reduction using a data-geometry-preserving variant of random forest proximities.
arXiv Detail & Related papers (2023-07-03T14:55:11Z) - Vacant Holes for Unsupervised Detection of the Outliers in Compact Latent Representation [0.6091702876917279]
Detection of outliers is pivotal for any machine learning model deployed and operated in the real world.
In this work, we concentrate on a specific type of these models: Variational Autoencoders (VAEs).
arXiv Detail & Related papers (2023-06-16T06:21:48Z) - Focus for Free in Density-Based Counting [56.961229110268036]
We introduce two methods that repurpose the available point annotations to enhance counting performance.
The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both input and density images.
The second method, foreground distillation, generates foreground masks from the point annotations, from which we train an auxiliary network on images with blacked-out backgrounds.
arXiv Detail & Related papers (2023-06-08T11:54:37Z) - Hyperbolic Self-supervised Contrastive Learning Based Network Anomaly Detection [0.0]
Anomaly detection on the attributed network has recently received increasing attention in many research fields.
We propose an efficient anomaly detection framework using hyperbolic self-supervised contrastive learning.
arXiv Detail & Related papers (2022-09-12T07:08:34Z) - Generating detailed saliency maps using model-agnostic methods [0.0]
We focus on a model-agnostic explainability method called RISE and elaborate on the observed shortcomings of its grid-based approach.
The modifications, collectively called VRISE (Voronoi-RISE), are meant to improve the accuracy of maps generated using large occlusions.
We compare the accuracy of saliency maps produced by VRISE and RISE on the validation split of ILSVRC2012, using a saliency-guided content insertion/deletion metric and a localization metric based on bounding boxes.
arXiv Detail & Related papers (2022-09-04T21:34:46Z) - Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings.
Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z) - PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z) - CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z) - Holistic Guidance for Occluded Person Re-Identification [7.662745552551165]
In real-world video surveillance applications, person re-identification (ReID) suffers from the effects of occlusions and detection errors.
We introduce a novel Holistic Guidance (HG) method that relies only on person identity labels.
Our proposed student-teacher framework is trained to address the problem by matching the distributions of between- and within-class distances (DCDs) of occluded samples with those of holistic (non-occluded) samples.
In addition to this, a joint generative-discriminative backbone is trained with a denoising autoencoder, allowing the system to
arXiv Detail & Related papers (2021-04-13T21:50:29Z)