Dimensionality Reduction via Diffusion Map Improved with Supervised Linear Projection
- URL: http://arxiv.org/abs/2008.03440v1
- Date: Sat, 8 Aug 2020 04:26:07 GMT
- Title: Dimensionality Reduction via Diffusion Map Improved with Supervised Linear Projection
- Authors: Bowen Jiang, Maohao Shen
- Abstract summary: In this paper, we assume the data samples lie on a single underlying smooth manifold.
We define intra-class and inter-class similarities using pairwise local kernel distances.
We aim to find a linear projection to maximize the intra-class similarities and minimize the inter-class similarities simultaneously.
- Score: 1.7513645771137178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When performing classification tasks, raw high-dimensional features often contain redundant information and lead to increased computational complexity and overfitting. In this paper, we assume the data samples lie on a single underlying smooth manifold, and define intra-class and inter-class similarities using pairwise local kernel distances. We aim to find a linear projection that maximizes the intra-class similarities and minimizes the inter-class similarities simultaneously, so that the projected low-dimensional data has optimized pairwise distances based on the label information, making it better suited for a Diffusion Map to perform further dimensionality reduction. Numerical experiments on several benchmark datasets show that our proposed approaches extract low-dimensional discriminative features that help achieve higher classification accuracy.
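To make the two-stage pipeline concrete, below is a minimal numpy sketch: a Fisher/LFDA-style supervised projection built from kernel-weighted intra-class and inter-class similarities, followed by a vanilla diffusion map on the projected data. The Gaussian kernel, the generalized-eigenproblem objective, and all function names are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def supervised_projection(X, y, n_components, sigma=1.0):
    """Illustrative sketch: find a projection that keeps same-class kernel
    neighbors close while pushing different classes apart."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2 * sigma ** 2))              # pairwise local kernel weights
    same = (y[:, None] == y[None, :]).astype(float)
    W_intra, W_inter = K * same, K * (1.0 - same)

    def scatter(W):                                  # X^T L X, L = graph Laplacian
        L = np.diag(W.sum(1)) - W
        return X.T @ L @ X

    S_intra, S_inter = scatter(W_intra), scatter(W_inter)
    # Fisher-style generalized eigenproblem: large inter-class scatter
    # relative to intra-class scatter.
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_intra) @ S_inter)
    order = np.argsort(-evals.real)[:n_components]
    return evecs.real[:, order]

def diffusion_map(Z, n_components, sigma=1.0, t=1):
    """Plain diffusion map on the (already projected) data."""
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2 * sigma ** 2))
    P = K / K.sum(1, keepdims=True)                  # row-stochastic Markov matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)[1:n_components + 1]  # skip the trivial eigenpair
    return (evals.real[order] ** t) * evecs.real[:, order]
```

Chaining the two, e.g. `diffusion_map(X @ supervised_projection(X, y, 10), 2)`, produces the kind of low-dimensional discriminative features the abstract describes.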
Related papers
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.
In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.
This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
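Since this entry casts dimensionality reduction and clustering as one Gromov-Wasserstein problem, a small sketch using the POT library (an assumed dependency; the paper's own solver and objective differ) shows the core mechanism: couple a high-dimensional point cloud to a handful of low-dimensional prototypes by comparing their internal geometries.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed installed)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # high-dimensional samples
Z = rng.normal(size=(10, 2))     # low-dimensional prototypes (toy placeholder)

C1, C2 = ot.dist(X, X), ot.dist(Z, Z)       # within-space distance matrices
p, q = ot.unif(len(X)), ot.unif(len(Z))     # uniform masses on each side

# The GW coupling aligns the two geometries; reading off each sample's
# strongest prototype gives a clustering, which is the unification the
# distributional-reduction framework formalizes.
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
labels = T.argmax(axis=1)
```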
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
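This entry is theoretical, but the object it studies is easy to write down: when data lie on an unknown low-dimensional subspace, the score of the Gaussian-smoothed empirical distribution has a closed form, and it is the target a score network is trained to match. The subspace construction and parameters below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.linalg.qr(rng.normal(size=(10, 2)))[0]   # hidden 2-D subspace of R^10
X = rng.normal(size=(500, 2)) @ A.T             # data supported on that subspace

def smoothed_score(x, X, sigma):
    """grad log p_sigma(x) for p_sigma = (1/n) sum_i N(x; x_i, sigma^2 I):
    a softmax-weighted pull toward the data points."""
    d2 = ((X - x) ** 2).sum(axis=1)
    w = np.exp(-(d2 - d2.min()) / (2 * sigma ** 2))   # stabilized softmax weights
    w /= w.sum()
    return (w[:, None] * (X - x)).sum(axis=0) / sigma ** 2

# For a query off the subspace, the score points back toward it, which is
# the geometric structure the paper shows a network can capture.
s = smoothed_score(rng.normal(size=10), X, sigma=0.5)
```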
- Optimal Discriminant Analysis in High-Dimensional Latent Factor Models [1.4213973379473654]
In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space.
We formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure.
We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections.
arXiv Detail & Related papers (2022-10-23T21:45:53Z)
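The two-step procedure this entry justifies is easy to mirror with scikit-learn: project onto leading principal components, then classify with a linear discriminant in the reduced space. The dataset and component count below are arbitrary choices; the paper's classifier selects particular PCs in a theory-driven way not reproduced here.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA projection followed by LDA in the projected space.
clf = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```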
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with a Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
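As a rough illustration of the Laplacian-eigenvalue idea (not LaptSNE itself), the helper below scores an embedding by the sum of the k smallest eigenvalues of its affinity-graph Laplacian; small values indicate roughly k well-separated clusters, and adding such a term to the t-SNE objective is the flavor of optimization the entry above describes.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_cluster_penalty(Y, sigma=1.0, k=10):
    """Sum of the k smallest Laplacian eigenvalues of the embedding's
    Gaussian affinity graph (illustrative stand-in for LaptSNE's term)."""
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    W = np.exp(-D2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                    # unnormalized graph Laplacian
    return eigh(L, eigvals_only=True, subset_by_index=[0, k - 1]).sum()
```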
- Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach [5.975670441166475]
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations.
The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold.
The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction.
arXiv Detail & Related papers (2022-02-28T22:46:34Z)
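A common stand-in for adaptive bandwidth selection is the self-tuning rule of Zelnik-Manor and Perona, where each point's bandwidth is its distance to the k-th nearest neighbor. The sketch below uses that rule inside a plain kernel-spectral embedding; it is an assumption for illustration, not the paper's actual procedure.

```python
import numpy as np

def kernel_spectral_embedding(X, n_components, knn=7):
    """Spectral embedding with per-point self-tuning bandwidths."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sig = np.sqrt(np.sort(D2, axis=1)[:, knn])   # distance to the knn-th neighbor
    K = np.exp(-D2 / (sig[:, None] * sig[None, :] + 1e-12))
    d = K.sum(1)
    M = K / np.sqrt(d[:, None] * d[None, :])     # symmetric normalization
    evals, evecs = np.linalg.eigh(M)             # ascending eigenvalues
    return evecs[:, -(n_components + 1):-1][:, ::-1]   # top ones, minus the trivial
```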
- Topology-Preserving Dimensionality Reduction via Interleaving Optimization [10.097180927318703]
We show how optimization seeking to minimize the interleaving distance can be incorporated into dimensionality reduction algorithms.
We demonstrate the utility of this framework for data visualization.
arXiv Detail & Related papers (2022-01-31T06:11:17Z)
- Featurized Density Ratio Estimation [82.40706152910292]
In our work, we propose to leverage an invertible generative model to map the two distributions into a common feature space prior to estimation.
This featurization brings the densities closer together in latent space, sidestepping pathological scenarios where the learned density ratios in input space can be arbitrarily inaccurate.
At the same time, the invertibility of our feature map guarantees that the ratios computed in feature space are equivalent to those in input space.
arXiv Detail & Related papers (2021-07-05T18:30:26Z)
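The invariance claim in the entry's last sentence is a direct consequence of the change-of-variables formula: the Jacobian factor appears in both pushforward densities and cancels in the ratio. The snippet checks this numerically with an affine invertible map and two Gaussians; the paper uses a learned normalizing flow instead.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

A = np.array([[2.0, 0.3], [0.0, 1.5]])           # invertible feature map f(x) = Ax + b
b = np.array([0.5, -1.0])
p = mvn(mean=[0, 0], cov=np.eye(2))
q = mvn(mean=[1, 1], cov=2 * np.eye(2))

def pushforward_pdf(dist, z):
    """Density of f(X) for X ~ dist, by change of variables."""
    x = np.linalg.solve(A, z - b)
    return dist.pdf(x) / abs(np.linalg.det(A))

x = np.array([0.3, -0.7])
z = A @ x + b
assert np.isclose(p.pdf(x) / q.pdf(x),
                  pushforward_pdf(p, z) / pushforward_pdf(q, z))  # ratios match
```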
- Manifold Partition Discriminant Analysis [42.11470531267327]
We propose a novel algorithm for supervised dimensionality reduction named Manifold Partition Discriminant Analysis (MPDA).
It aims to find a linear embedding space where the within-class similarity is achieved along the direction that is consistent with the local variation of the data manifold.
MPDA explicitly parameterizes the connections of tangent spaces and represents the data manifold in a piecewise manner.
arXiv Detail & Related papers (2020-11-23T16:33:23Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
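For reference, these are the classical semi-NMF multiplicative updates of Ding et al. on vectorized data, i.e., the baseline whose vectorization step TS-NMF is designed to avoid. The 2-D factorization itself is not reproduced here, and the update rules are quoted from the standard semi-NMF literature rather than from this paper.

```python
import numpy as np

def semi_nmf(X, k, iters=200, eps=1e-9):
    """Semi-NMF: X (p x n) ~= F @ G.T with G >= 0 and F unconstrained."""
    rng = np.random.default_rng(0)
    G = np.abs(rng.normal(size=(X.shape[1], k)))
    pos = lambda M: (np.abs(M) + M) / 2            # elementwise positive part
    neg = lambda M: (np.abs(M) - M) / 2            # elementwise negative part
    for _ in range(iters):
        F = X @ G @ np.linalg.pinv(G.T @ G)        # least-squares update for F
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))   # multiplicative, keeps G >= 0
    return F, G
```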
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
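A minimal version of the ensemble being analyzed: fit an LDA on each of several independent Gaussian random projections and take a majority vote. The projection scale and ensemble size are arbitrary here; the paper's contribution, a consistent estimator of this ensemble's misclassification probability, is not implemented.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def rp_lda_ensemble_predict(X_tr, y_tr, X_te, n_models=25, dim=5, seed=0):
    """Majority vote over LDAs fit on random projections.
    Assumes integer class labels 0..K-1."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_models):
        R = rng.normal(size=(X_tr.shape[1], dim)) / np.sqrt(dim)
        votes.append(LinearDiscriminantAnalysis().fit(X_tr @ R, y_tr).predict(X_te @ R))
    votes = np.asarray(votes)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```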