Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective
- URL: http://arxiv.org/abs/2410.16608v1
- Date: Tue, 22 Oct 2024 01:40:43 GMT
- Title: Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective
- Authors: Zhexuan Liu, Rong Ma, Yiqiao Zhong
- Abstract summary: Neighbor embedding methods, such as t-SNE, UMAP, and LargeVis, are a family of popular visualization methods.
Recent studies suggest that these methods often produce visual artifacts, potentially leading to incorrect scientific conclusions.
We introduce a novel conceptual and computational framework, LOO-map, that learns the embedding maps based on a classical statistical idea.
- Score: 2.969441406380581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visualizing high-dimensional data is an important routine for understanding biomedical data and interpreting deep learning models. Neighbor embedding methods, such as t-SNE, UMAP, and LargeVis, among others, are a family of popular visualization methods which reduce high-dimensional data to two dimensions. However, recent studies suggest that these methods often produce visual artifacts, potentially leading to incorrect scientific conclusions. Recognizing that the current limitation stems from a lack of data-independent notions of embedding maps, we introduce a novel conceptual and computational framework, LOO-map, that learns the embedding maps based on a classical statistical idea known as the leave-one-out. LOO-map extends the embedding over a discrete set of input points to the entire input space, enabling a systematic assessment of map continuity, and thus the reliability of the visualizations. We find for many neighbor embedding methods, their embedding maps can be intrinsically discontinuous. The discontinuity induces two types of observed map distortion: "overconfidence-inducing discontinuity," which exaggerates cluster separation, and "fracture-inducing discontinuity," which creates spurious local structures. Building upon LOO-map, we propose two diagnostic point-wise scores -- perturbation score and singularity score -- to address these limitations. These scores can help identify unreliable embedding points, detect out-of-distribution data, and guide hyperparameter selection. Our approach is flexible and works as a wrapper around many neighbor embedding algorithms. We test our methods across multiple real-world datasets from computer vision and single-cell omics to demonstrate their effectiveness in enhancing the interpretability and accuracy of visualizations.
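The leave-one-out idea in the abstract can be illustrated with a small sketch. The abstract does not give the loss, the optimizer, or the definitions of the two scores, so everything below is an assumption made for illustration: a t-SNE-style objective with a fixed Gaussian bandwidth `sigma` (no perplexity calibration), plain gradient descent, and a hypothetical helper named `loo_embed`. The sketch freezes an existing 2-D embedding `Y` of the inputs `X` and optimizes only the position of one new input point, which is one way to extend a discrete embedding to arbitrary inputs.

```python
import numpy as np


def loo_embed(x_new, X, Y, sigma=1.0, lr=0.1, n_steps=300):
    """Hypothetical helper: embed one new input while X and Y stay frozen.

    Assumed objective (not taken from the paper): cross-entropy between
    Gaussian input affinities p and Student-t embedding affinities q.
    """
    # Input-space affinities from the new point to every existing point.
    sq = np.sum((X - x_new) ** 2, axis=1)
    p = np.exp(-(sq - sq.min()) / (2.0 * sigma ** 2))  # shift for stability
    p /= p.sum()

    # Initialize at the affinity-weighted mean of the frozen embedding.
    y = p @ Y
    for _ in range(n_steps):
        diff = y - Y                                  # shape (n, 2)
        w = 1.0 / (1.0 + np.sum(diff ** 2, axis=1))   # Student-t kernel
        q = w / w.sum()
        # Gradient of -sum_i p_i log q_i with respect to y.
        grad = 2.0 * ((p - q) * w) @ diff
        y = y - lr * grad
    return y


# Toy data (assumed): two separated clusters and a hand-made 2-D embedding.
rng = np.random.default_rng(0)
shift = 10.0 * np.eye(5)[0]
X = np.vstack([rng.normal(0.0, 0.3, (20, 5)),
               rng.normal(0.0, 0.3, (20, 5)) + shift])
Y = np.vstack([rng.normal(0.0, 0.5, (20, 2)) + [-5.0, 0.0],
               rng.normal(0.0, 0.5, (20, 2)) + [5.0, 0.0]])

# Sweep an input along the segment between the cluster centers and
# print where the frozen-embedding map places it.
for t in np.linspace(0.0, 1.0, 11):
    print(round(float(t), 1), loo_embed(t * shift, X, Y))
```

Sweeping an input along a path and watching how its embedded position moves is one informal way to probe map continuity; the perturbation and singularity scores proposed in the paper are not reproduced here.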
Related papers
- Interpreting Object-level Foundation Models via Visual Precision Search [53.807678972967224]
We propose a Visual Precision Search method that generates accurate attribution maps with fewer regions.
Our method bypasses internal model parameters to overcome attribution issues from multimodal fusion.
Our method can interpret failures in visual grounding and object detection tasks, surpassing existing methods across multiple evaluation metrics.
arXiv Detail & Related papers (2024-11-25T08:54:54Z) - Dissecting embedding method: learning higher-order structures from data [0.0]
Geometric deep learning methods for data learning often include a set of assumptions on the geometry of the feature space.
These assumptions, together with the data being discrete and finite, can cause some generalisations that are likely to create wrong interpretations of the data and model outputs.
arXiv Detail & Related papers (2024-10-14T08:19:39Z) - Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD).
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z) - Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z) - Neural Semantic Surface Maps [52.61017226479506]
We present an automated technique for computing a map between two genus-zero shapes, which matches semantically corresponding regions to one another.
Our approach can generate semantic surface-to-surface maps, eliminating manual annotations or any 3D training data requirement.
arXiv Detail & Related papers (2023-09-09T16:21:56Z) - Supervised Manifold Learning via Random Forest Geometry-Preserving Proximities [0.0]
We show the weaknesses of class-conditional manifold learning methods quantitatively and visually.
We propose an alternate choice of kernel for supervised dimensionality reduction using a data-geometry-preserving variant of random forest proximities.
arXiv Detail & Related papers (2023-07-03T14:55:11Z) - Vacant Holes for Unsupervised Detection of the Outliers in Compact Latent Representation [0.6091702876917279]
Detection of outliers is pivotal for any machine learning model deployed and operated in the real world.
In this work, we concentrate on a specific type of these models: Variational Autoencoders (VAEs).
arXiv Detail & Related papers (2023-06-16T06:21:48Z) - Focus for Free in Density-Based Counting [56.961229110268036]
We introduce two methods that repurpose the available point annotations to enhance counting performance.
The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both input and density images.
The second method, foreground distillation, generates foreground masks from the point annotations, from which we train an auxiliary network on images with blacked-out backgrounds.
arXiv Detail & Related papers (2023-06-08T11:54:37Z) - Linking data separation, visual separation, and classifier performance using pseudo-labeling by contrastive learning [125.99533416395765]
We argue that the performance of the final classifier depends on the data separation present in the latent space and visual separation present in the projection.
We demonstrate our results by the classification of five real-world challenging image datasets of human intestinal parasites with only 1% supervised samples.
arXiv Detail & Related papers (2023-02-06T10:01:38Z) - Hyperbolic Self-supervised Contrastive Learning Based Network Anomaly Detection [0.0]
Anomaly detection on the attributed network has recently received increasing attention in many research fields.
We propose an efficient anomaly detection framework using hyperbolic self-supervised contrastive learning.
arXiv Detail & Related papers (2022-09-12T07:08:34Z) - Generating detailed saliency maps using model-agnostic methods [0.0]
We focus on a model-agnostic explainability method called RISE and elaborate on observed shortcomings of its grid-based approach.
The modifications, collectively called VRISE (Voronoi-RISE), are meant to improve the accuracy of maps generated using large occlusions.
We compare accuracy of saliency maps produced by VRISE and RISE on the validation split of ILSVRC2012, using a saliency-guided content insertion/deletion metric and a localization metric based on bounding boxes.
arXiv Detail & Related papers (2022-09-04T21:34:46Z) - Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings.
Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z) - PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z) - Residual Moment Loss for Medical Image Segmentation [56.72261489147506]
Location information has been shown to help deep learning models capture the manifold structure of target objects.
Most existing methods encode the location information in an implicit way, for the network to learn.
We propose a novel loss function, namely residual moment (RM) loss, to explicitly embed the location information of segmentation targets.
arXiv Detail & Related papers (2021-06-27T09:31:49Z) - CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z) - Holistic Guidance for Occluded Person Re-Identification [7.662745552551165]
In real-world video surveillance applications, person re-identification (ReID) suffers from the effects of occlusions and detection errors.
We introduce a novel Holistic Guidance (HG) method that relies only on person identity labels.
Our proposed student-teacher framework is trained to address the problem by matching the distributions of between- and within-class distances (DCDs) of occluded samples with those of holistic (non-occluded) samples.
In addition to this, a joint generative-discriminative backbone is trained with a denoising autoencoder, allowing the system to
arXiv Detail & Related papers (2021-04-13T21:50:29Z) - Contrastive analysis for scatter plot-based representations of dimensionality reduction [0.0]
This paper introduces a methodology to explore multidimensional datasets and interpret clusters' formation.
We also introduce a bipartite graph to visually interpret and explore the relationship between the statistical variables used to understand how the attributes influenced cluster formation.
arXiv Detail & Related papers (2021-01-26T01:16:31Z) - Label Decoupling Framework for Salient Object Detection [157.96262922808245]
Recent methods mainly focus on aggregating multi-level features from a fully convolutional network (FCN) and introducing edge information as auxiliary supervision.
We propose a label decoupling framework (LDF), which consists of a label decoupling procedure and a feature interaction network (FIN).
Experiments on six benchmark datasets demonstrate that LDF outperforms state-of-the-art approaches on different evaluation metrics.
arXiv Detail & Related papers (2020-08-25T14:23:38Z) - Dimensionality Reduction via Diffusion Map Improved with Supervised Linear Projection [1.7513645771137178]
In this paper, we assume the data samples lie on a single underlying smooth manifold.
We define intra-class and inter-class similarities using pairwise local kernel distances.
We aim to find a linear projection to maximize the intra-class similarities and minimize the inter-class similarities simultaneously.
arXiv Detail & Related papers (2020-08-08T04:26:07Z)