A Local Similarity-Preserving Framework for Nonlinear Dimensionality
Reduction with Neural Networks
- URL: http://arxiv.org/abs/2103.06383v1
- Date: Wed, 10 Mar 2021 23:10:47 GMT
- Title: A Local Similarity-Preserving Framework for Nonlinear Dimensionality
Reduction with Neural Networks
- Authors: Xiang Wang, Xiaoyong Li, Junxing Zhu, Zichen Xu, Kaijun Ren, Weiming
Zhang, Xinwang Liu, Kui Yu
- Abstract summary: We propose a novel local nonlinear approach named Vec2vec for general-purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of a matrix and define the context of data points.
Experiments on data classification and clustering with eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
- Score: 56.068488417457935
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Real-world data usually have high dimensionality, and it is important to
mitigate the curse of dimensionality. High-dimensional data typically lie in a
coherent structure, so their true degrees of freedom are relatively small.
Both global and local dimensionality reduction methods exist to alleviate the
problem. Most existing methods for local dimensionality reduction obtain an
embedding through eigenvalue or singular value decomposition, whose
computational complexity is very high for large amounts of data. Here we
propose a novel local nonlinear approach named Vec2vec for general-purpose
dimensionality reduction, which generalizes recent advances in word embedding
representation learning to the dimensionality reduction of matrices. It
obtains the nonlinear embedding using a neural network with only one hidden
layer to reduce the computational complexity. To train the neural network, we
build the neighborhood similarity graph of a matrix and define the context of
data points by exploiting random walk properties. Experiments demonstrate that
Vec2vec is more efficient than several state-of-the-art local dimensionality
reduction methods on large amounts of high-dimensional data. Extensive
experiments on data classification and clustering with eight real datasets
show that Vec2vec outperforms several classical dimensionality reduction
methods under statistical hypothesis testing, and it is competitive with the
recently developed state-of-the-art method UMAP.
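To make the pipeline concrete, here is a minimal Python sketch of the approach as described in the abstract: build a k-nearest-neighbor similarity graph, run random walks over it to define each point's context, and train a single-hidden-layer skip-gram model (here via gensim). The function name, parameter values, and the uniform choice of the next walk step are illustrative assumptions, not the authors' code; the paper's walks may, for instance, be weighted by neighbor similarity.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from gensim.models import Word2Vec

def vec2vec_embedding(X, dim=32, k=10, walks_per_node=10, walk_len=40, seed=0):
    """Hypothetical sketch of a Vec2vec-style embedding of the rows of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Neighborhood similarity graph: link each point to its k nearest neighbors.
    idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(
        X, return_distance=False)
    neighbors = [row[1:] for row in idx]  # column 0 is the point itself
    # Random walks over the graph define the context of each data point.
    walks = []
    for _ in range(walks_per_node):
        for start in range(n):
            walk, node = [str(start)], start
            for _ in range(walk_len - 1):
                node = int(rng.choice(neighbors[node]))  # assumption: uniform step
                walk.append(str(node))
            walks.append(walk)
    # Skip-gram (sg=1) is a neural network with a single hidden layer.
    model = Word2Vec(walks, vector_size=dim, window=5, min_count=0, sg=1, seed=seed)
    return np.vstack([model.wv[str(i)] for i in range(n)])
```

The single-hidden-layer skip-gram objective is what keeps the training cost low relative to the eigendecomposition-based local methods mentioned above.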
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches through the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
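As a rough illustration of that connection (a sketch, not the paper's algorithm), the snippet below uses the POT library to compute a Gromov-Wasserstein coupling between the distance matrix of high-dimensional samples and that of a small set of low-dimensional prototypes; the data and prototype shapes are arbitrary assumptions. Assigning each sample to its best-matched prototype acts like clustering, and optimizing the prototype positions as well would recover a DR-style embedding.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # high-dimensional samples (synthetic stand-in)
Z = rng.normal(size=(10, 2))    # low-dimensional prototypes (assumed fixed here)

C1, C2 = ot.dist(X, X), ot.dist(Z, Z)    # intra-space pairwise distances
p, q = ot.unif(len(X)), ot.unif(len(Z))  # uniform weights on both sides

# The GW coupling T softly assigns samples to prototypes.
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
labels = T.argmax(axis=1)  # hard assignment, read as a clustering
```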
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Relative intrinsic dimensionality is intrinsic to learning [49.5738281105287]
We introduce a new notion of the intrinsic dimension of a data distribution, which precisely captures the separability properties of the data.
For this intrinsic dimension, the rule of thumb above becomes a law: high intrinsic dimension guarantees highly separable data.
We show that this relative intrinsic dimension provides both upper and lower bounds on the probability of successfully learning and generalising in a binary classification problem.
arXiv Detail & Related papers (2023-10-10T10:41:45Z)
- Linearly-scalable learning of smooth low-dimensional patterns with permutation-aided entropic dimension reduction [0.0]
In many data science applications, the objective is to extract appropriately-ordered smooth low-dimensional data patterns from high-dimensional data sets.
We show that when the Euclidean smoothness is selected as a pattern quality criterion, both of these problems can be efficiently solved numerically.
arXiv Detail & Related papers (2023-06-17T08:03:24Z)
- Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks [42.43493635899849]
We establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification.
Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
arXiv Detail & Related papers (2021-09-07T02:58:11Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
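A minimal numpy sketch of how a rank-R weight constraint consumes a 2-D input without vectorization follows (one reading of the CP constraint, not the authors' code; the factor shapes, rank, and tanh nonlinearity are illustrative assumptions):

```python
import numpy as np

def rank_r_unit(X, A, B, bias=0.0):
    """One hidden unit whose weight matrix W = sum_r outer(A[r], B[r]) has rank R.

    X: (I, J) input array; A: (R, I) and B: (R, J) CP factor matrices.
    """
    # <W, X> with W = sum_r A[r] B[r]^T equals sum_r A[r] @ X @ B[r],
    # so the 2-D input is never flattened into a vector.
    z = sum(A[r] @ X @ B[r] for r in range(A.shape[0])) + bias
    return np.tanh(z)  # the nonlinearity is an assumption

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 8))                              # e.g. an image patch
A, B = rng.normal(size=(3, 8)), rng.normal(size=(3, 8))  # rank R = 3
print(rank_r_unit(X, A, B))
```

Constraining W to rank R cuts the parameter count from I*J to R*(I+J) per unit, which is where the structural advantage of the CP form comes from.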
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Modern Dimension Reduction [0.0]
This Element offers readers a suite of modern unsupervised dimension reduction techniques along with hundreds of lines of R code.
I walk readers through the application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding, uniform manifold approximation and projection, self-organizing maps, and deep autoencoders.
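The Element's code is in R; as a rough Python counterpart, two of the listed techniques are available in scikit-learn (parameter values below are illustrative, not the Element's):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding, TSNE

X = np.random.default_rng(0).normal(size=(300, 20))  # synthetic stand-in data

# Locally linear embedding and t-SNE, each reducing to two dimensions.
lle_2d = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
tsne_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)
```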
arXiv Detail & Related papers (2021-03-11T14:54:33Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
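Schematically, such a model combines a reconstruction term with a row-sparsity penalty, roughly $\min_W \|X - XWW^\top\|_F^2 + \lambda \sum_i \|W_{i,:}\|_2^p$, after which features are ranked by the $l_2$ norms of W's rows. The numpy sketch below only evaluates this assumed objective and the ranking from a PCA initialization; it does not reproduce the paper's optimization algorithm, and the exact objective may differ.

```python
import numpy as np

def l2p_objective(X, W, lam=1.0, p=0.5):
    """Reconstruction error plus l_{2,p} row-sparsity penalty (assumed form)."""
    recon = np.linalg.norm(X - X @ W @ W.T, 'fro') ** 2
    return recon + lam * np.sum(np.linalg.norm(W, axis=1) ** p)

X = np.random.default_rng(0).normal(size=(100, 30))
X -= X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:5].T  # (30 features x 5 components), PCA initialization
print(l2p_objective(X, W))
print(np.argsort(-np.linalg.norm(W, axis=1))[:10])  # top-10 selected features
```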
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Kernel Two-Dimensional Ridge Regression for Subspace Clustering [45.651770340521786]
We propose a novel subspace clustering method for 2D data.
It directly uses 2D data as inputs such that the learning of representations benefits from inherent structures and relationships of the data.
arXiv Detail & Related papers (2020-11-03T04:52:46Z)
- Learning a Deep Part-based Representation by Preserving Data Distribution [21.13421736154956]
Unsupervised dimensionality reduction is one of the commonly used techniques for high-dimensional data recognition problems.
In this paper, by preserving the data distribution, a deep part-based representation can be learned, and the novel algorithm is called Distribution Preserving Network Embedding.
Experimental results on real-world data sets show that the proposed algorithm performs well in terms of clustering accuracy and AMI.
arXiv Detail & Related papers (2020-09-17T12:49:36Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
However, many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.