Understanding High Dimensional Spaces through Visual Means Employing
Multidimensional Projections
- URL: http://arxiv.org/abs/2207.10800v1
- Date: Tue, 12 Jul 2022 20:30:33 GMT
- Title: Understanding High Dimensional Spaces through Visual Means Employing
Multidimensional Projections
- Authors: Haseeb Younis, Paul Trust, Rosane Minghim
- Abstract summary: Two of the relevant algorithms in the data visualisation field are t-distributed stochastic neighbour embedding (t-SNE) and Least-Square Projection (LSP).
These algorithms can be used to understand a range of mathematical functions and their impact on datasets.
We illustrate ways of employing the visual results of multidimensional projection algorithms to understand and fine-tune the parameters of their mathematical framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data visualisation helps in understanding data represented by multiple
variables, also called features, stored in a large matrix where individuals are
stored in rows and variable values in columns. These data structures are
frequently called multidimensional spaces. In this paper, we illustrate ways of
employing the visual results of multidimensional projection algorithms to
understand and fine-tune the parameters of their mathematical framework. Some
of the mathematical tools common to these approaches are Laplacian matrices,
Euclidean distance, cosine distance, and statistical methods such as
Kullback-Leibler divergence, employed to fit probability distributions and
reduce dimensions. Two of the relevant algorithms in the data visualisation
field are t-distributed stochastic neighbour embedding (t-SNE) and
Least-Square Projection (LSP). These algorithms can be used to understand a
range of mathematical functions and their impact on datasets. In
this article, mathematical parameters of underlying techniques such as
Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods
behind LSP are adjusted to reflect the properties afforded by the mathematical
formulation. The results, supported by illustrations of the LSP and t-SNE
processes, are meant to help students understand the mathematics behind such
methods so that they can apply them effectively in data analysis tasks across
multiple applications.
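For readers who want to experiment, the following is a minimal scikit-learn sketch of the parameter exploration described above; the dataset, perplexity values, and PCA initialisation are illustrative assumptions rather than the authors' exact setup.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X = load_digits().data  # 1797 samples x 64 features: a multidimensional space

# t-SNE fits neighbourhood probability distributions in the original and the
# projected space and minimises the Kullback-Leibler divergence between them.
# Perplexity (the effective neighbourhood size) and the PCA initialisation
# are two of the tunable mathematical parameters discussed in the paper.
for perplexity in (5, 30, 50):
    Y = TSNE(n_components=2, perplexity=perplexity,
             init='pca', random_state=0).fit_transform(X)
    # Inspect each 2D layout Y visually to see the parameter's effect.
```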
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
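As a rough illustration of the Gromov-Wasserstein coupling at the heart of this framework, here is a minimal sketch using the POT library; the data and sizes are synthetic, and the paper's distributional reduction is a generalisation rather than this single call.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))  # source samples in R^10
Y = rng.normal(size=(30, 3))   # target samples in R^3

# Gromov-Wasserstein compares spaces through their internal distance
# matrices, so the two datasets may have different dimensionalities.
C1, C2 = ot.dist(X, X), ot.dist(Y, Y)
p, q = np.full(40, 1 / 40), np.full(30, 1 / 30)

T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
# T is a 40x30 coupling: a soft assignment of source points to target
# points, which the distributional-reduction view reads as DR + clustering.
```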
- Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework [0.0]
This paper explores the steps involved in working with multidimensional data sources, the various multiway analysis methods employed, and the benefits of these approaches.
A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python.
Results indicate that multiway analysis is more expressive.
arXiv Detail & Related papers (2023-09-05T17:56:22Z)
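For orientation, a conventional two-dimensional (matrix-based) BSS baseline of the kind the survey compares against can be sketched with scikit-learn's FastICA; the signals are synthetic stand-ins, and the multiway methods surveyed would instead operate on the tensor structure directly.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # two independent sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])            # unknown mixing matrix
X = S @ A.T                                       # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)  # recovered sources (up to order and scale)
```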
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
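A hedged sketch of the linearized optimal transport idea behind LOT Wassmap, using the POT library; the reference measure, sizes, and the final PCA step are illustrative choices, not the authors' algorithm.

```python
import numpy as np
import ot
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Each "data point" is itself a point cloud (an empirical measure in R^2).
clouds = [rng.normal(loc=rng.uniform(-5, 5, 2), size=(50, 2))
          for _ in range(20)]
ref = rng.normal(size=(50, 2))  # shared reference measure
w = np.full(50, 1 / 50)         # uniform weights

embeddings = []
for Xc in clouds:
    T = ot.emd(w, w, ot.dist(ref, Xc))  # optimal transport plan
    monge = (T @ Xc) / w[:, None]       # barycentric projection of the plan
    embeddings.append(monge.ravel())    # linearized (LOT) embedding

# One embedding per cloud replaces O(n^2) pairwise OT distance computations.
Y = PCA(n_components=2).fit_transform(np.array(embeddings))
```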
- Geometry of EM and related iterative algorithms [8.228889210180268]
The Expectation-Maximization (EM) algorithm is a simple meta-algorithm that has been used for many years as a methodology for statistical inference.
In this paper, we introduce the $em$ algorithm, an information geometric formulation of the EM algorithm, and its extensions and applications to various problems.
arXiv Detail & Related papers (2022-09-03T00:23:23Z)
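To make the EM recipe concrete, here is a minimal sketch for a two-component Gaussian mixture with known unit variances; this is the classical EM special case, not the information-geometric $em$ formulation developed in the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.r_[rng.normal(-2, 1, 300), rng.normal(3, 1, 200)]  # 1D mixture data

pi, mu = 0.5, np.array([-1.0, 1.0])  # initial mixing weight and means
for _ in range(100):
    # E-step: posterior responsibility of component 0 for each point.
    p0 = pi * norm.pdf(x, mu[0], 1.0)
    p1 = (1 - pi) * norm.pdf(x, mu[1], 1.0)
    r = p0 / (p0 + p1)
    # M-step: re-estimate the parameters from the responsibilities.
    pi = r.mean()
    mu = np.array([np.average(x, weights=r), np.average(x, weights=1 - r)])
```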
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
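The Laplacian spectrum that LaptSNE leverages can be computed as below; this sketch shows only the eigenvalue computation with scipy and scikit-learn, not how LaptSNE folds it into the t-SNE objective.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (100, 5)), rng.normal(6, 1, (100, 5))]

A = kneighbors_graph(X, n_neighbors=10, mode='connectivity')
A = 0.5 * (A + A.T)            # symmetrise the kNN graph
L = laplacian(A, normed=True)

# Small eigenvalues of the normalised Laplacian signal cluster structure:
# roughly one near-zero eigenvalue per well-separated cluster.
vals = np.linalg.eigvalsh(L.toarray())[:4]
```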
- CCP: Correlated Clustering and Projection for Dimensionality Reduction [5.992724190105578]
Correlated Clustering and Projection offers a novel data domain strategy that does not need to solve any matrix.
CCP partitions high-dimensional features into correlated clusters and then projects correlated features in each cluster into a one-dimensional representation.
Proposed methods are validated with benchmark datasets associated with various machine learning algorithms.
arXiv Detail & Related papers (2022-06-08T23:14:44Z)
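A simplified stand-in for the CCP pipeline: cluster features by correlation, then project each cluster to one dimension. The hierarchical clustering, cluster count, and per-cluster PCA are assumptions for illustration; CCP defines its own partitioning and projection.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))  # 300 samples, 40 features

# Group features whose absolute correlation is high (distance = 1 - |corr|).
corr = np.corrcoef(X, rowvar=False)
D = squareform(1 - np.abs(corr), checks=False)
labels = fcluster(linkage(D, method='average'), t=5, criterion='maxclust')

# Project each correlated feature cluster onto a single dimension.
reduced = np.column_stack([
    PCA(n_components=1).fit_transform(X[:, labels == c]).ravel()
    for c in np.unique(labels)
])  # 300 x 5 reduced representation
```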
- UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data [63.74032987144699]
We present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping.
NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system.
arXiv Detail & Related papers (2021-11-02T17:11:57Z)
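The inverse-projection idea can be imitated with a generic multilayer perceptron from scikit-learn; NNInv itself is a dedicated deep network, and the dataset, architecture, and t-SNE projection here are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor

X = load_digits().data  # 1797 x 64 data matrix
P = TSNE(n_components=2, random_state=0).fit_transform(X)

# Learn the inverse mapping: 2D projection coordinates -> 64D data space.
inv = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500,
                   random_state=0)
inv.fit(P, X)

# Any point picked on the 2D scatterplot can now be mapped back to a
# plausible high-dimensional datum for visual-analytics interaction.
x_hat = inv.predict(P[:1])
```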
- Learning Log-Determinant Divergences for Positive Definite Matrices [47.61701711840848]
In this paper, we propose to learn similarity measures in a data-driven manner.
We capitalize on the alpha-beta log-det divergence, which is a meta-divergence parametrized by the scalars alpha and beta.
Our key idea is to cast these parameters in a continuum and learn them from data.
arXiv Detail & Related papers (2021-04-13T19:09:43Z)
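For a feel of the divergence family involved, here is the symmetric Stein (S) divergence, a classical log-det divergence that the alpha-beta parametrisation generalises; the SPD test matrices are synthetic.

```python
import numpy as np

def stein_divergence(P, Q):
    # Symmetric Stein (S) divergence between SPD matrices:
    # logdet((P + Q) / 2) - 0.5 * (logdet(P) + logdet(Q)); zero iff P == Q.
    _, ld_mid = np.linalg.slogdet((P + Q) / 2)
    _, ld_p = np.linalg.slogdet(P)
    _, ld_q = np.linalg.slogdet(Q)
    return ld_mid - 0.5 * (ld_p + ld_q)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); P = A @ A.T + 5 * np.eye(5)  # SPD matrix
B = rng.normal(size=(5, 5)); Q = B @ B.T + 5 * np.eye(5)  # SPD matrix
print(stein_divergence(P, Q))
```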
- Probabilistic Learning Vector Quantization on Manifold of Symmetric Positive Definite Matrices [3.727361969017079]
We develop a new classification method for manifold-valued data in the framework of probabilistic learning vector quantization.
In this paper, we generalize the probabilistic learning vector quantization algorithm for data points living on the manifold of symmetric positive definite matrices.
Empirical investigations on synthetic data, image data, and motor imagery EEG data demonstrate the superior performance of the proposed method.
arXiv Detail & Related papers (2021-02-01T06:58:39Z)
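A sketch of the manifold geometry at play: the affine-invariant Riemannian distance between SPD matrices, used here for a toy nearest-prototype (LVQ-style) decision. The prototypes and sample are synthetic; the paper's method learns prototypes probabilistically rather than using this hard assignment.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power, logm

def airm_distance(P, Q):
    # Affine-invariant Riemannian distance: ||log(P^-1/2 Q P^-1/2)||_F.
    P_inv_sqrt = fractional_matrix_power(P, -0.5)
    return np.linalg.norm(logm(P_inv_sqrt @ Q @ P_inv_sqrt), 'fro')

rng = np.random.default_rng(0)
def spd(d):
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)  # random SPD matrix

prototypes = [spd(4), spd(4)]       # one prototype per class
sample = spd(4)
label = int(np.argmin([airm_distance(sample, W) for W in prototypes]))
```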
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
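Classical semi-NMF (in the sense of Ding et al.), which TS-NMF extends to 2D data without vectorisation, can be sketched as follows; the data, rank, and iteration count are arbitrary, and this flattened formulation is exactly the preprocessing step TS-NMF is designed to avoid.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))  # mixed-sign data matrix
k = 4

# Semi-NMF: X ~ F @ G.T with G >= 0 and F unconstrained in sign.
G = np.abs(rng.normal(size=(200, k)))
pos = lambda M: (np.abs(M) + M) / 2  # elementwise positive part
neg = lambda M: (np.abs(M) - M) / 2  # elementwise negative part
for _ in range(200):
    F = X @ G @ np.linalg.pinv(G.T @ G)  # least-squares update for F
    XtF, FtF = X.T @ F, F.T @ F
    G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                 (neg(XtF) + G @ pos(FtF) + 1e-12))  # multiplicative update
```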
- Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution to the semantic segmentation task and propose an improved Laplacian.
The graph reasoning is directly performed in the original feature space organized as a spatial pyramid.
We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)
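As background for the graph reasoning step, the standard symmetrically normalised graph convolution can be written in a few lines of numpy; the paper proposes an improved Laplacian and applies the operation inside a spatial pyramid, which this toy layer does not capture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f_in, f_out = 6, 8, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)              # undirected adjacency matrix
X = rng.normal(size=(n, f_in))      # node features
W = rng.normal(size=(f_in, f_out))  # learnable weights

# One graph-convolution layer: X' = relu(D^-1/2 (A + I) D^-1/2 X W),
# the Laplacian-based smoothing that graph reasoning builds on.
A_hat = A + np.eye(n)
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))
X_out = np.maximum(A_norm @ X @ W, 0.0)
```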
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.