Derivation of Coupled PCA and SVD Learning Rules from a Newton
Zero-Finding Framework
- URL: http://arxiv.org/abs/2003.11456v1
- Date: Wed, 25 Mar 2020 15:49:55 GMT
- Title: Derivation of Coupled PCA and SVD Learning Rules from a Newton
Zero-Finding Framework
- Authors: Ralf Möller
- Abstract summary: A method to derive coupled learning rules from information criteria by Newton optimization is known.
Here we describe an alternative approach where coupled PCA and SVD learning rules can systematically be derived from a Newton zero-finding framework.
To demonstrate the framework, we derive PCA and SVD learning rules with constant Euclidean length or constant sum of the vector estimates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In coupled learning rules for PCA (principal component analysis) and SVD
(singular value decomposition), the update of the estimates of eigenvectors or
singular vectors is influenced by the estimates of eigenvalues or singular
values, respectively. This coupled update mitigates the speed-stability problem
since the update equations converge from all directions with approximately the
same speed. A method to derive coupled learning rules from information criteria
by Newton optimization is known. However, these information criteria have to be
designed, offer no explanatory value, and can only impose Euclidean constraints
on the vector estimates. Here we describe an alternative approach where coupled
PCA and SVD learning rules can systematically be derived from a Newton
zero-finding framework. The derivation starts from an objective function,
combines the equations for its extrema with arbitrary constraints on the vector
estimates, and solves the resulting vector zero-point equation using Newton's
zero-finding method. To demonstrate the framework, we derive PCA and SVD
learning rules with constant Euclidean length or constant sum of the vector
estimates.
Related papers
- Refined Risk Bounds for Unbounded Losses via Transductive Priors
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression.
Our key tools are based on the exponential weights algorithm with carefully chosen transductive priors.
arXiv Detail & Related papers (2024-10-29T00:01:04Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
Current fine-tuning methods build adapters largely agnostic of the context of the downstream task to learn, or of the context of the important knowledge to maintain.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable task-aware adapters.
Our method enables two options: knowledge-preserved adaptation and instruction-previewed adaptation.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles
Gradient boosting of prediction rules is an efficient approach to learn potentially interpretable yet accurate probabilistic models.
We show how a new objective function measures the angle between the risk gradient vector and the projection of the condition output vector onto the complement of the already selected conditions.
This approach correctly approximates the ideal update of adding the risk gradient itself to the model and favours the inclusion of more general and thus shorter rules.
arXiv Detail & Related papers (2024-02-24T02:29:10Z)
- Large-Scale OD Matrix Estimation with A Deep Learning Method
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z)
- Context-Aware Ensemble Learning for Time Series
We introduce a new approach in which a meta learner effectively combines the base model predictions by using, as its input, a superset of the features, namely the union of the base models' feature vectors, instead of the predictions themselves.
Our model does not use the predictions of the base models as inputs to a machine learning algorithm, but chooses the best possible combination at each time step based on the state of the problem.
arXiv Detail & Related papers (2022-11-30T10:36:13Z)
- Derivation of Learning Rules for Coupled Principal Component Analysis in a Lagrange-Newton Framework
We describe a Lagrange-Newton framework for the derivation of learning rules with desirable convergence properties.
A Newton descent is applied to an extended variable vector which also includes Lagrange multipliers introduced with constraints.
The framework produces "coupled" PCA learning rules which simultaneously estimate an eigenvector and the corresponding eigenvalue in cross-coupled differential equations.
arXiv Detail & Related papers (2022-04-28T12:50:11Z)
- Learning to Estimate Without Bias
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z)
- Accurate and fast matrix factorization for low-rank learning
We tackle two important challenges related to the accurate partial singular value decomposition (SVD) and numerical rank estimation of a huge matrix.
We use the concepts of Krylov subspaces such as the Golub-Kahan bidiagonalization process as well as Ritz vectors to achieve these goals.
arXiv Detail & Related papers (2021-04-21T22:35:02Z)
- Derivation of Symmetric PCA Learning Rules from a Novel Objective Function
Neural learning rules for principal component / subspace analysis can be derived by maximizing an objective function.
For a subspace with a single axis, the optimization produces the principal eigenvector of the data covariance matrix.
For a subspace with multiple axes, the optimization leads to PSA learning rules which only converge to axes spanning the principal subspace but not to the principal eigenvectors.
arXiv Detail & Related papers (2020-05-24T08:57:54Z)
- Progressive Identification of True Labels for Partial-Label Learning
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel framework that is flexible in the choice of model and optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)