SO(3)-invariant PCA with application to molecular data
- URL: http://arxiv.org/abs/2510.18827v1
- Date: Tue, 21 Oct 2025 17:23:17 GMT
- Title: SO(3)-invariant PCA with application to molecular data
- Authors: Michael Fraiman, Paulina Hoyos, Tamir Bendory, Joe Kileel, Oscar Mickelin, Nir Sharon, Amit Singer,
- Abstract summary: A naive approach requires augmenting the dataset with many rotated copies of each sample, incurring prohibitive computational costs.<n>We develop an efficient and principled framework for SO(3)-invariant PCA that implicitly accounts for all rotations without explicit data augmentation.<n>We validate the method on real-world molecular datasets, demonstrating its effectiveness and opening up new possibilities for large-scale, high-dimensional reconstruction problems.
- Score: 6.809315864458212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Principal component analysis (PCA) is a fundamental technique for dimensionality reduction and denoising; however, its application to three-dimensional data with arbitrary orientations -- common in structural biology -- presents significant challenges. A naive approach requires augmenting the dataset with many rotated copies of each sample, incurring prohibitive computational costs. In this paper, we extend PCA to 3D volumetric datasets with unknown orientations by developing an efficient and principled framework for SO(3)-invariant PCA that implicitly accounts for all rotations without explicit data augmentation. By exploiting underlying algebraic structure, we demonstrate that the computation involves only the square root of the total number of covariance entries, resulting in a substantial reduction in complexity. We validate the method on real-world molecular datasets, demonstrating its effectiveness and opening up new possibilities for large-scale, high-dimensional reconstruction problems.
Related papers
- Multi-Dimensional Visual Data Recovery: Scale-Aware Tensor Modeling and Accelerated Randomized Computation [51.65236537605077]
We propose a new type of network compression optimization technique, fully randomized tensor network compression (FCTN)<n>FCTN has significant advantages in correlation characterization and transpositional in algebra, and has notable achievements in multi-dimensional data processing and analysis.<n>We derive efficient algorithms with guarantees to solve the formulated models.
arXiv Detail & Related papers (2026-02-13T14:56:37Z) - Rotational Sampling: A Plug-and-Play Encoder for Rotation-Invariant 3D Molecular GNNs [5.558678875187018]
Graph neural networks (GNNs) have achieved remarkable success in molecular property prediction.<n>Traditional graph representations struggle to effectively encode the inherent 3D spatial structures of molecules.<n>This paper proposes a novel plug-and-play 3D encoding module leveraging rotational sampling.
arXiv Detail & Related papers (2025-07-01T08:58:12Z) - Solve sparse PCA problem by employing Hamiltonian system and leapfrog method [0.0]
We propose a novel sparse PCA algorithm that imposes sparsity through a smooth L1 penalty.<n> Experimental evaluations on a face recognition dataset-using both k-nearest neighbor and kernel ridge regressions-demonstrate that the proposed sparse PCA methods consistently achieve higher classification accuracy than conventional PCA.
arXiv Detail & Related papers (2025-03-30T06:39:11Z) - Adversarial Dependence Minimization [78.36795688238155]
This work provides a differentiable and scalable algorithm for dependence minimization that goes beyond linear pairwise decorrelation.<n>We demonstrate its utility in three applications: extending PCA to nonlinear decorrelation, improving the generalization of image classification methods, and preventing dimensional collapse in self-supervised representation learning.
arXiv Detail & Related papers (2025-02-05T14:43:40Z) - Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets.<n>In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem.<n>This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Local manifold learning and its link to domain-based physics knowledge [53.15471241298841]
In many reacting flow systems, the thermo-chemical state-space is assumed to evolve close to a low-dimensional manifold (LDM)
We show that PCA applied in local clusters of data (local PCA) is capable of detecting the intrinsic parameterization of the thermo-chemical state-space.
arXiv Detail & Related papers (2022-07-01T09:06:25Z) - PCA-Boosted Autoencoders for Nonlinear Dimensionality Reduction in Low
Data Regimes [0.2925461470287228]
We propose a technique that harnesses the best of both worlds: an autoencoder that leverages PCA to perform well on scarce nonlinear data.
A synthetic example is presented first to study the effects of data nonlinearity and size on the performance of the proposed method.
arXiv Detail & Related papers (2022-05-23T23:46:52Z) - Relative Pose from SIFT Features [50.81749304115036]
We derive a new linear constraint relating the unknown elements of the fundamental matrix and the orientation and scale.
The proposed constraint is tested on a number of problems in a synthetic environment and on publicly available real-world datasets on more than 80000 image pairs.
arXiv Detail & Related papers (2022-03-15T14:16:39Z) - A Linearly Convergent Algorithm for Distributed Principal Component
Analysis [12.91948651812873]
This paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA)
The proposed algorithm is shown to converge linearly to a neighborhood of the true solution.
arXiv Detail & Related papers (2021-01-05T00:51:14Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension
reduction & clustering [9.042239247913642]
This article focuses on improving upon PCA and k-means, by allowing nonlinear relations in the data and more flexible cluster shapes.
The key contribution is a new framework for Principal Analysis (PEA), defining a simple and computationally efficient alternative to PCA.
In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
arXiv Detail & Related papers (2020-08-17T06:25:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.