A Novel Approach for Intrinsic Dimension Estimation
- URL: http://arxiv.org/abs/2503.09485v1
- Date: Wed, 12 Mar 2025 15:42:39 GMT
- Title: A Novel Approach for Intrinsic Dimension Estimation
- Authors: Kadir Özçoban, Murat Manguoğlu, Emrullah Fatih Yetkin
- Abstract summary: Real-life data typically have a complex and non-linear structure. Finding a nearly optimal representation of a dataset in a lower-dimensional space offers a practical mechanism for improving the success of machine learning tasks. We propose a highly efficient and robust intrinsic dimension estimation approach.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-life data typically have a complex and non-linear structure. These non-linearities and the large number of features can cause problems such as the empty-space phenomenon and the well-known curse of dimensionality. Finding a nearly optimal representation of the dataset in a lower-dimensional space (i.e., dimensionality reduction) offers a practical mechanism for improving the success of machine learning tasks. However, estimating the data dimension required for a nearly optimal representation (the intrinsic dimension) can be very costly, particularly when dealing with big data. We propose a highly efficient and robust intrinsic dimension estimation approach for dimensionality reduction methods that relies only on matrix-vector products. An experimental study is also conducted to compare the performance of the proposed method with state-of-the-art approaches.
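The abstract states only that the estimator relies on matrix-vector products; the concrete algorithm is not reproduced on this page. As a hedged sketch of the general idea (not the authors' method), one can probe the covariance spectrum matrix-free with SciPy and read off the dimension from an explained-variance threshold; the synthetic data, the number of probed eigenvalues, and the 95% cutoff below are illustrative assumptions:
```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(0)
# Synthetic data with intrinsic dimension ~20 plus a small noise floor.
X = rng.standard_normal((5000, 20)) @ rng.standard_normal((20, 300))
X += 0.05 * rng.standard_normal((5000, 300))
Xc = X - X.mean(axis=0)
n, m = Xc.shape

def cov_matvec(v):
    # Covariance-vector product via two matrix-vector products with X,
    # never forming the m x m covariance matrix explicitly.
    return Xc.T @ (Xc @ v) / (n - 1)

C = LinearOperator((m, m), matvec=cov_matvec, dtype=np.float64)
k = 50  # number of leading eigenvalues to probe (illustrative choice)
vals = np.sort(eigsh(C, k=k, return_eigenvectors=False))[::-1]

total_var = (Xc ** 2).sum() / (n - 1)          # trace of the covariance
explained = np.cumsum(vals) / total_var
d = int(np.searchsorted(explained, 0.95) + 1)  # smallest d explaining 95%
print("estimated intrinsic dimension:", d)
```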
Related papers
- An Incremental Non-Linear Manifold Approximation Method [0.0]
This research develops an incremental non-linear dimension reduction method using the Geometric Multi-Resolution Analysis (GMRA) framework for streaming data.
The proposed method enables real-time data analysis and visualization by incrementally updating the cluster map, basis PCA vectors, and wavelet coefficients.
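GMRA maintains a multiscale cluster tree with local PCA bases and wavelet corrections, which is too involved to reproduce here; the sketch below is a stand-in for the streaming-update flavor only (not the paper's method), using scikit-learn's IncrementalPCA in place of the per-cluster basis updates:
```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=5)      # stand-in for one local PCA basis

for _ in range(20):                        # 20 arriving mini-batches
    batch = rng.standard_normal((100, 30)) # streaming data, 30 features
    ipca.partial_fit(batch)                # update the basis incrementally

# Embed newly arriving points with the current basis.
embedding = ipca.transform(rng.standard_normal((10, 30)))
print(embedding.shape)                     # (10, 5)
```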
arXiv Detail & Related papers (2025-04-12T03:54:05Z)
- A dimensionality reduction technique based on the Gromov-Wasserstein distance [7.8772082926712415]
We propose a new method for dimensionality reduction based on optimal transportation theory and the Gromov-Wasserstein distance. Our method embeds high-dimensional data into a lower-dimensional space, providing a robust and efficient solution for analyzing complex high-dimensional datasets.
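No implementation detail is given in this summary; the hedged sketch below only computes the core object, a Gromov-Wasserstein coupling between the metric structure of the data and that of a candidate low-dimensional embedding, using the POT library. The random data, the 2-D embedding, and the uniform weights are placeholder assumptions, not the paper's algorithm:
```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20))   # high-dimensional points
Y = rng.standard_normal((60, 2))    # candidate 2-D embedding

C1 = ot.dist(X, X)                  # pairwise distances within each space
C2 = ot.dist(Y, Y)
p = ot.unif(60)                     # uniform weights on both point clouds
q = ot.unif(60)

# Coupling that best aligns the two metric structures.
T, log = ot.gromov.gromov_wasserstein(C1, C2, p, q, 'square_loss', log=True)
print("GW distortion between data and embedding:", log['gw_dist'])
```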
arXiv Detail & Related papers (2025-01-23T15:05:51Z)
- Interpretable Linear Dimensionality Reduction based on Bias-Variance Analysis [45.3190496371625]
We propose a principled dimensionality reduction approach that maintains the interpretability of the resulting features.
In this way, all features are considered, the dimensionality is reduced, and interpretability is preserved.
arXiv Detail & Related papers (2023-03-26T14:30:38Z)
- An evaluation framework for dimensionality reduction through sectional curvature [59.40521061783166]
In this work, we aim to introduce the first fully unsupervised performance metric for dimensionality reduction.
To test its feasibility, this metric has been used to evaluate the performance of the most commonly used dimension reduction algorithms.
A new parameterized problem instance generator has been constructed in the form of a function generator.
arXiv Detail & Related papers (2023-03-17T11:59:33Z)
- Linear Embedding-based High-dimensional Batch Bayesian Optimization without Reconstruction Mappings [21.391136086094225]
We show that our method is applicable to batch optimization problems with thousands of dimensions without any computational difficulty.
We demonstrate the effectiveness of our method on high-dimensional benchmarks and a real-world function.
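As a rough, hedged illustration of the linear-embedding idea (in the spirit of REMBO-style methods, not this paper's algorithm), one can search over a low-dimensional variable z and evaluate the high-dimensional objective at x = Az, so no reconstruction mapping is ever learned; plain random search stands in for the Bayesian optimizer here:
```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 1000, 5                                  # ambient and embedding dims
A = rng.standard_normal((D, d)) / np.sqrt(d)    # random embedding matrix

def objective(x):
    # Toy high-dimensional function; only the first 10 coordinates matter.
    return float(np.sum((x[:10] - 1.0) ** 2))

best_f = np.inf
for _ in range(500):                            # random search as a BO stand-in
    z = rng.uniform(-2.0, 2.0, size=d)          # search only the d-dim space
    f = objective(np.clip(A @ z, -1.0, 1.0))    # evaluate at x = A z
    best_f = min(best_f, f)
print("best value found:", best_f)
```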
arXiv Detail & Related papers (2022-11-02T08:11:10Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objectives.
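LaptSNE itself folds this spectral information into the t-SNE objective; the hedged sketch below shows only the ingredient the summary describes, building a kNN graph Laplacian and inspecting its smallest eigenvalues, whose near-zero count indicates how many clusters the embedding should preserve (data and neighborhood size are illustrative):
```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
# Three well-separated Gaussian clusters in 5 dimensions.
X = np.vstack([rng.normal(c, 0.3, size=(50, 5)) for c in (0.0, 3.0, 6.0)])

W = kneighbors_graph(X, n_neighbors=10, mode='connectivity')
W = 0.5 * (W + W.T)                    # symmetrize the adjacency
L = laplacian(W, normed=True)          # normalized graph Laplacian

vals = eigsh(L, k=6, which='SM', return_eigenvectors=False)
print(np.sort(vals))                   # ~3 near-zero eigenvalues -> 3 clusters
```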
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- Efficient Multidimensional Functional Data Analysis Using Marginal Product Basis Systems [2.4554686192257424]
We propose a framework for learning continuous representations from a sample of multidimensional functional data.
We show that the resulting estimation problem can be solved efficiently via tensor decomposition.
We conclude with a real data application in neuroimaging.
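The marginal product basis construction is not spelled out in this summary; as a loose stand-in, the sketch below compresses a tensor of discretized multidimensional functional observations with a CP decomposition using TensorLy (the tensor shape and rank are arbitrary assumptions):
```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
# Subjects x grid x grid: each slice is one discretized 2-D functional datum.
T = tl.tensor(rng.standard_normal((20, 30, 30)))

# Rank-4 CP decomposition: a sum of products of per-mode factors.
weights, factors = parafac(T, rank=4)
print([f.shape for f in factors])      # [(20, 4), (30, 4), (30, 4)]
```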
arXiv Detail & Related papers (2021-07-30T16:02:15Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build the neighborhood similarity graph of the data matrix and define the context of its data points.
Experiments on data classification and clustering with eight real datasets show that Vec2vec outperforms several classical dimensionality reduction methods under statistical hypothesis testing.
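The full Vec2vec model is a neural network; the hedged fragment below reproduces only the graph-construction step named above, taking each point's context to be its nearest neighbors in a kNN similarity graph (the data and neighborhood size are illustrative):
```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))              # rows of the data matrix

# kNN similarity graph; each row's nonzeros are that point's neighbors.
G = kneighbors_graph(X, n_neighbors=5, mode='distance')
contexts = {i: G[i].indices.tolist() for i in range(X.shape[0])}
print("context of point 0:", contexts[0])
```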
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
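The paper's optimization algorithm is not reproduced here; as a crude, hedged proxy for what the $l_{2,p}$ penalty drives toward, the sketch below scores features by the l2 norms of the rows of an ordinary PCA loading matrix and keeps the largest (a genuinely row-sparse solution would set the remaining rows to zero):
```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
X[:, :5] += 3.0 * rng.standard_normal((200, 1))  # plant 5 informative features

W = PCA(n_components=5).fit(X).components_.T     # features x components
row_norms = np.linalg.norm(W, axis=1)            # one score per feature
selected = np.argsort(row_norms)[::-1][:5]
print("selected features:", np.sort(selected))   # expect [0 1 2 3 4]
```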
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes a drawback of existing methods, which seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.