In search of the most efficient and memory-saving visualization of high
dimensional data
- URL: http://arxiv.org/abs/2303.05455v1
- Date: Mon, 27 Feb 2023 20:56:13 GMT
- Title: In search of the most efficient and memory-saving visualization of high
dimensional data
- Authors: Bartosz Minch
- Abstract summary: We argue that the visualization of multidimensional data is well approximated by the two-dimensional embedding of undirected nearest-neighbor graphs.
Existing reduction methods are too slow and do not allow interactive manipulation.
We show that high-quality embeddings are produced with minimal time and memory complexity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive exploration of large, multidimensional datasets plays a very
important role in various scientific fields. It makes it possible not only to
identify important structural features and forms, such as clusters of vertices
and their connection patterns, but also to evaluate their interrelationships in
terms of position, distance, shape and connection density. We argue that the
visualization of multidimensional data is well approximated by the problem of
two-dimensional embedding of undirected nearest-neighbor graphs. The size of
complex networks is a major challenge for today's computer systems and still
requires more efficient data embedding algorithms. Existing reduction methods
are too slow and do not allow interactive manipulation. We show that
high-quality embeddings are produced with minimal time and memory complexity.
We present very efficient IVHD algorithms (CPU and GPU) and compare them with
the latest and most popular dimensionality reduction methods. We show that the
memory and time requirements are dramatically lower than for base codes. At the
cost of a slight degradation in embedding quality, IVHD preserves the main
structural properties of the data well with a much lower time budget. We also
present a meta-algorithm that allows the use of any unsupervised data embedding
method in a supervised manner.
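The abstract's starting point, an undirected nearest-neighbor graph built from high-dimensional points, can be illustrated with a minimal sketch. This is not the IVHD algorithm and not the author's code: it is a brute-force construction in pure Python, and the function name `knn_undirected_edges`, the toy data, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math


def knn_undirected_edges(points, k):
    """Return undirected edges (i, j) with i < j that link each point
    to its k nearest neighbors (hypothetical helper, brute force)."""
    n = len(points)
    edges = set()
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            # Store each edge once, whichever endpoint discovered it,
            # so the resulting graph is undirected.
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy 4-dimensional data: two well-separated groups (illustrative only).
points = [
    (0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.0),
    (0.0, 0.1, 0.0, 0.0),
    (5.0, 5.0, 5.0, 5.0),
    (5.1, 5.0, 5.0, 5.0),
    (5.0, 5.1, 5.0, 5.0),
]
edges = knn_undirected_edges(points, k=2)
print(sorted(edges))
```

With k=2 every edge in this toy example stays inside one of the two groups, which is the kind of cluster structure a two-dimensional layout of the graph would then be expected to preserve. The brute-force neighbor search is quadratic in the number of points, so it only suits small examples.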
Related papers
- Deep Manifold Graph Auto-Encoder for Attributed Graph Embedding [51.75091298017941]
This paper proposes a novel Deep Manifold (Variational) Graph Auto-Encoder (DMVGAE/DMGAE) for attributed graph data.
The proposed method surpasses state-of-the-art baseline algorithms by a significant margin on different downstream tasks across popular datasets.
arXiv Detail & Related papers (2024-01-12T17:57:07Z)
- PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Streaming Encoding Algorithms for Scalable Hyperdimensional Computing [12.829102171258882]
Hyperdimensional computing (HDC) is a paradigm for data representation and learning originating in computational neuroscience.
In this work, we explore a family of streaming encoding techniques based on hashing.
We show formally that these methods enjoy comparable guarantees on performance for learning applications while being substantially more efficient than existing alternatives.
arXiv Detail & Related papers (2022-09-20T17:25:14Z)
- Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction [25.67957712837716]
We introduce a novel method based on a hierarchy built on 1-nearest neighbor graphs in the original space.
The proposal is an optimization-free projection that is competitive with the latest versions of t-SNE and UMAP.
In the paper, we argue about the soundness of the proposed method and evaluate it on a diverse collection of datasets with sizes varying from 1K to 11M samples and dimensions from 28 to 16K.
arXiv Detail & Related papers (2022-03-24T11:41:16Z)
- Scalable semi-supervised dimensionality reduction with GPU-accelerated EmbedSOM [0.0]
BlosSOM is a high-performance semi-supervised dimensionality reduction software for interactive user-steerable visualization of high-dimensional datasets.
We show the application of BlosSOM on realistic datasets, where it helps to produce high-quality visualizations that incorporate user-specified layout and focus on certain features.
arXiv Detail & Related papers (2022-01-03T15:06:22Z)
- Revisiting Point Cloud Simplification: A Learnable Feature Preserving Approach [57.67932970472768]
Mesh and Point Cloud simplification methods aim to reduce the complexity of 3D models while retaining visual quality and relevant salient features.
We propose a fast point cloud simplification method by learning to sample salient points.
The proposed method relies on a graph neural network architecture trained to select an arbitrary, user-defined, number of points from the input space and to re-arrange their positions so as to minimize the visual perception error.
arXiv Detail & Related papers (2021-09-30T10:23:55Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering with eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis tests.
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Supervised Visualization for Data Exploration [9.742277703732187]
We describe a novel supervised visualization technique based on random forest proximities and diffusion-based dimensionality reduction.
Our approach is robust to noise and parameter tuning, thus making it simple to use while producing reliable visualizations for data exploration.
arXiv Detail & Related papers (2020-06-15T19:10:17Z)
- NCVis: Noise Contrastive Approach for Scalable Visualization [79.44177623781043]
NCVis is a high-performance dimensionality reduction method built on a sound statistical basis of noise contrastive estimation.
We show that NCVis outperforms state-of-the-art techniques in terms of speed while preserving the representation quality of other methods.
arXiv Detail & Related papers (2020-01-30T15:43:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.