Principal component analysis balancing prediction and approximation accuracy for spatial data
- URL: http://arxiv.org/abs/2408.01662v2
- Date: Mon, 9 Sep 2024 01:28:51 GMT
- Title: Principal component analysis balancing prediction and approximation accuracy for spatial data
- Authors: Si Cheng, Magali N. Blanco, Timothy V. Larson, Lianne Sheppard, Adam Szpiro, Ali Shojaie
- Abstract summary: We formalize the closeness of approximation to the original data and the utility of lower-dimensional scores for downstream modeling.
We propose a flexible dimension reduction algorithm that achieves the optimal trade-off.
- Score: 2.4849437811455797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dimension reduction is often the first step in statistical modeling or prediction of multivariate spatial data. However, most existing dimension reduction techniques do not account for the spatial correlation between observations and do not take the downstream modeling task into consideration when finding the lower-dimensional representation. We formalize the closeness of approximation to the original data and the utility of lower-dimensional scores for downstream modeling as two complementary, sometimes conflicting, metrics for dimension reduction. We illustrate how existing methodologies fall into this framework and propose a flexible dimension reduction algorithm that achieves the optimal trade-off. We derive a computationally simple form for our algorithm and illustrate its performance through simulation studies, as well as two applications in air pollution modeling and spatial transcriptomics.
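The two metrics named in the abstract can be made concrete with a small sketch. It is not the authors' algorithm: it only computes, for ordinary PCA, a reconstruction error (closeness of approximation) and an out-of-sample error when the scores are predicted from covariates. The function names, the covariate matrix `Z`, and the use of ordinary least squares as the downstream model are all assumptions made for illustration.

```python
import numpy as np

def pca_scores(X, k):
    """Center X and return (Xc, V, U): centered data, top-k loadings, scores."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T          # p x k loadings
    return Xc, V, Xc @ V  # scores U = Xc V

def approximation_error(Xc, V):
    """Mean squared error of the rank-k reconstruction Xc V V^T."""
    return np.mean((Xc - Xc @ V @ V.T) ** 2)

def score_prediction_error(Z_train, U_train, Z_test, U_test):
    """Held-out mean squared error when scores are regressed on covariates.

    Ordinary least squares is a stand-in (assumption) for whatever
    downstream spatial model would be used in practice.
    """
    A_train = np.column_stack([np.ones(len(Z_train)), Z_train])
    A_test = np.column_stack([np.ones(len(Z_test)), Z_test])
    beta, *_ = np.linalg.lstsq(A_train, U_train, rcond=None)
    return np.mean((U_test - A_test @ beta) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(200, 3))                      # hypothetical covariates
    X = Z @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(200, 6))
    for k in (1, 2, 3):
        Xc, V, U = pca_scores(X, k)
        print(k,
              approximation_error(Xc, V),
              score_prediction_error(Z[:150], U[:150], Z[150:], U[150:]))
```

Plain PCA minimizes only the first quantity. A method in the spirit of the abstract would choose the loadings to trade the two off against each other.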
Related papers
- Probabilistic Reduced-Dimensional Vector Autoregressive Modeling with Oblique Projections [0.7614628596146602]
We propose a reduced-dimensional vector autoregressive model to extract low-dimensional dynamics from noisy data.
An optimal oblique decomposition is derived for the best predictability regarding prediction error covariance.
The superior performance and efficiency of the proposed approach are demonstrated using data sets from a synthesized Lorenz system and an industrial process from Eastman Chemical.
arXiv Detail & Related papers (2024-01-14T05:38:10Z)
- Symplectic model reduction of Hamiltonian systems using data-driven quadratic manifolds [0.559239450391449]
We present two novel approaches for the symplectic model reduction of high-dimensional Hamiltonian systems.
The addition of quadratic terms to the state approximation, which sits at the heart of the proposed methodologies, enables us to better represent intrinsic low-dimensionality.
arXiv Detail & Related papers (2023-05-24T18:23:25Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective.
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction [69.81996031777717]
The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data.
The application of this approach becomes problematic if the available data is incomplete because some smaller-scale dimensions are either missing or unmeasured.
We consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem and solve it with the gradient-based optimization method.
arXiv Detail & Related papers (2022-02-23T11:23:59Z)
- A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z)
- Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data [0.0]
We consider sparse estimation of a class of high-dimensional models.
We estimate the relationships governing both the spatial and temporal dependence in a fully data-driven way by penalizing a set of Yule-Walker equations.
A satellite simulation exercise shows strong finite sample performance compared to competing procedures.
arXiv Detail & Related papers (2021-08-05T21:51:45Z)
- Manifold learning-based polynomial chaos expansions for high-dimensional surrogate models [0.0]
We introduce a manifold learning-based method for uncertainty quantification (UQ) in systems describing complex spatiotemporal processes.
The proposed method achieves highly accurate approximations, which ultimately lead to a significant acceleration of UQ tasks.
arXiv Detail & Related papers (2021-07-21T00:24:15Z)
- MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes a drawback of existing methods, which seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.