Dataset-Adaptive Dimensionality Reduction
- URL: http://arxiv.org/abs/2507.11984v1
- Date: Wed, 16 Jul 2025 07:32:08 GMT
- Title: Dataset-Adaptive Dimensionality Reduction
- Authors: Hyeon Jeon, Jeongin Park, Soohyun Lee, Dae Hyun Kim, Sungbok Shin, Jinwook Seo
- Abstract summary: We propose a dataset-adaptive approach to dimensionality reduction (DR) optimization guided by structural complexity metrics. These metrics quantify the intrinsic complexity of a dataset, predicting whether higher-dimensional spaces are necessary to represent it accurately. We empirically show that our dataset-adaptive workflow significantly enhances the efficiency of DR optimization without compromising accuracy.
- Score: 11.180683480772373
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Selecting the appropriate dimensionality reduction (DR) technique and determining its optimal hyperparameter settings that maximize the accuracy of the output projections typically involves extensive trial and error, often resulting in unnecessary computational overhead. To address this challenge, we propose a dataset-adaptive approach to DR optimization guided by structural complexity metrics. These metrics quantify the intrinsic complexity of a dataset, predicting whether higher-dimensional spaces are necessary to represent it accurately. Since complex datasets are often inaccurately represented in two-dimensional projections, leveraging these metrics enables us to predict the maximum achievable accuracy of DR techniques for a given dataset, eliminating redundant trials in optimizing DR. We introduce the design and theoretical foundations of these structural complexity metrics. We quantitatively verify that our metrics effectively approximate the ground truth complexity of datasets and confirm their suitability for guiding dataset-adaptive DR workflow. Finally, we empirically show that our dataset-adaptive workflow significantly enhances the efficiency of DR optimization without compromising accuracy.
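The idea in the abstract of using a predicted maximum achievable accuracy to cut redundant trials can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `search_with_predicted_ceiling`, the `tolerance` parameter, and the toy scores are all hypothetical, and the predicted ceiling is simply passed in as a number rather than computed from a structural complexity metric.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

P = TypeVar("P")


def search_with_predicted_ceiling(
    candidates: Iterable[P],
    evaluate: Callable[[P], float],
    predicted_max: float,
    tolerance: float = 0.01,
) -> Tuple[Optional[P], float, int]:
    """Evaluate candidates in order and stop early once the best score
    is within `tolerance` of the predicted maximum achievable accuracy.

    `predicted_max` stands in for a value that a structural complexity
    metric would supply (an assumption of this sketch).
    """
    best_params: Optional[P] = None
    best_score = float("-inf")
    trials = 0
    for params in candidates:
        score = evaluate(params)
        trials += 1
        if score > best_score:
            best_params, best_score = params, score
        # Further trials are unlikely to help once we are near the ceiling.
        if best_score >= predicted_max - tolerance:
            break
    return best_params, best_score, trials


# Toy usage: made-up accuracy scores for five hypothetical hyperparameter values.
toy_scores = {5: 0.60, 15: 0.71, 30: 0.74, 50: 0.73, 100: 0.70}
best, score, trials = search_with_predicted_ceiling(
    candidates=toy_scores.keys(),
    evaluate=toy_scores.__getitem__,
    predicted_max=0.75,
    tolerance=0.02,
)
print(best, score, trials)
```

In this toy run the search stops after three of the five candidates, because the third score already lies within the tolerance of the assumed ceiling. How well such a stopping rule works in practice depends entirely on how accurate the predicted ceiling is.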
Related papers
- MOE-Enhanced Explanable Deep Manifold Transformation for Complex Data Embedding and Visualization [47.4136073281818]
Dimensionality reduction (DR) plays a crucial role in various fields, including data engineering and visualization. DR methods face a trade-off between precision and transparency, where optimizing for performance can lead to reduced explainability. This work introduces the MOE-based Explainable Deep Manifold Transformation (DMT-ME).
arXiv Detail & Related papers (2024-10-25T12:11:32Z) - Efficient adjustment for complex covariates: Gaining efficiency with DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estimator (DOPE) for efficient estimation of the average treatment effect (ATE).
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - A Bayesian Gaussian Process-Based Latent Discriminative Generative Decoder (LDGD) Model for High-Dimensional Data [0.41942958779358674]
The latent discriminative generative decoder (LDGD) employs both the data and associated labels in the manifold discovery process.
We show that LDGD can robustly infer the manifold and precisely predict labels in scenarios where data size is limited.
arXiv Detail & Related papers (2024-01-29T19:11:03Z) - Integer Optimization of CT Trajectories using a Discrete Data Completeness Formulation [3.924235219960689]
X-ray computed tomography plays a key role in digitizing three-dimensional structures for a wide range of medical and industrial applications.
Traditional CT systems often rely on standard circular and helical scan trajectories, which may not be optimal for challenging scenarios involving large objects, complex structures, or resource constraints.
We are exploring the potential of twin robotic CT systems, which offer the flexibility to acquire projections from arbitrary views around the object of interest.
arXiv Detail & Related papers (2024-01-29T10:38:58Z) - Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z) - Optimizer's Information Criterion: Dissecting and Correcting Bias in Data-Driven Optimization [16.57676001669012]
In data-driven optimization, the sample performance of the obtained decision typically incurs an optimistic bias against the true performance.
Common techniques to correct this bias, such as cross-validation, require repeatedly solving additional optimization problems and are therefore expensive.
We develop a general bias correction approach that directly approximates the first-order bias and does not require solving any additional optimization problems.
arXiv Detail & Related papers (2023-06-16T07:07:58Z) - Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning [2.4578723416255754]
Distributionally robust optimization (DRO) has become a powerful framework for estimation under uncertainty. We propose a DRO-based method for linear regression and address a central question: how to optimally choose the robustness radius. We show that our method achieves the same effect as cross-validation, but at a fraction of the computational cost.
arXiv Detail & Related papers (2022-06-27T13:02:59Z) - Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Approximate Dynamics Lead to More Optimal Control: Efficient Exact Derivatives [0.0]
We show here that the computational feasibility of meeting this accuracy requirement depends on the choice of propagation scheme and problem representation.
This methodology allows numerically efficient optimization of very high-dimensional dynamics.
arXiv Detail & Related papers (2020-05-20T10:02:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.