On Probabilistic Embeddings in Optimal Dimension Reduction
- URL: http://arxiv.org/abs/2408.02433v1
- Date: Mon, 5 Aug 2024 12:46:21 GMT
- Title: On Probabilistic Embeddings in Optimal Dimension Reduction
- Authors: Ryan Murray, Adam Pickarski,
- Abstract summary: Dimension reduction algorithms are a crucial part of many data science pipelines.
Despite their wide utilization, many non-linear dimension reduction algorithms are poorly understood from a theoretical perspective.
- Score: 1.2085509610251701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dimension reduction algorithms are a crucial part of many data science pipelines, including data exploration, feature creation and selection, and denoising. Despite their wide utilization, many non-linear dimension reduction algorithms are poorly understood from a theoretical perspective. In this work we consider a generalized version of multidimensional scaling, which is posed as an optimization problem in which a mapping from a high-dimensional feature space to a lower-dimensional embedding space seeks to preserve either inner products or norms of the distribution in feature space, and which encompasses many commonly used dimension reduction algorithms. We analytically investigate the variational properties of this problem, leading to the following insights: 1) Solutions found using standard particle descent methods may lead to non-deterministic embeddings, 2) A relaxed or probabilistic formulation of the problem admits solutions with easily interpretable necessary conditions, 3) The globally optimal solutions to the relaxed problem actually must give a deterministic embedding. This progression of results mirrors the classical development of optimal transportation, and in a case relating to the Gromov-Wasserstein distance actually gives explicit insight into the structure of the optimal embeddings, which are parametrically determined and discontinuous. Finally, we illustrate that a standard computational implementation of this task does not learn deterministic embeddings, which means that it learns sub-optimal mappings, and that the embeddings learned in that context have highly misleading clustering structure, underscoring the delicate nature of solving this problem computationally.
Related papers
- Gauge-optimal approximate learning for small data classification
problems [0.0]
Small data learning problems are characterized by a discrepancy between the limited amount of response variable observations and the large feature space dimension.
We propose the Gauge- Optimal Approximate Learning (GOAL) algorithm, which provides an analytically tractable joint solution to the reduction dimension, feature segmentation and classification problems.
GOAL has been compared to other state-of-the-art machine learning (ML) tools on both synthetic data and challenging real-world applications from climate science and bioinformatics.
arXiv Detail & Related papers (2023-10-29T16:46:05Z) - Prescriptive PCA: Dimensionality Reduction for Two-stage Stochastic
Optimization [1.1612308609123565]
We develop a prescriptive dimensionality reduction framework that aims to minimize the degree of suboptimality in the optimization phase.
For the case where the downstream optimization problem has an expected value objective, we show that prescriptive dimensionality reduction can be performed via solving a distributionally-robust optimization problem.
Our approach significantly outperforms principal component analysis with real and synthetic data sets.
arXiv Detail & Related papers (2023-06-04T00:50:35Z) - Interpretable Linear Dimensionality Reduction based on Bias-Variance
Analysis [45.3190496371625]
We propose a principled dimensionality reduction approach that maintains the interpretability of the resulting features.
In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved.
arXiv Detail & Related papers (2023-03-26T14:30:38Z) - Multistage Stochastic Optimization via Kernels [3.7565501074323224]
We develop a non-parametric, data-driven, tractable approach for solving multistage optimization problems.
We show that the proposed method produces decision rules with near-optimal average performance.
arXiv Detail & Related papers (2023-03-11T23:19:32Z) - Adaptive Stochastic Optimisation of Nonconvex Composite Objectives [2.1700203922407493]
We propose and analyse a family of generalised composite mirror descent algorithms.
With adaptive step sizes, the proposed algorithms converge without requiring prior knowledge of the problem.
We exploit the low-dimensional structure of the decision sets for high-dimensional problems.
arXiv Detail & Related papers (2022-11-21T18:31:43Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector
Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously.
We then apply our algorithms to solve the sparse principle component analysis problem for a wide range of test datasets.
arXiv Detail & Related papers (2021-03-24T23:11:32Z) - Offline Model-Based Optimization via Normalized Maximum Likelihood
Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.