Related papers: ONG: Orthogonal Natural Gradient Descent

URL: http://arxiv.org/abs/2508.17169v2
Date: Sun, 31 Aug 2025 04:34:08 GMT
Title: ONG: Orthogonal Natural Gradient Descent
Authors: Yajat Yadav, Patrick Mendoza, Jathin Korrapati
Abstract summary: We introduce the Orthogonal Natural Gradient Descent (ONG) algorithm. ONG preconditions each new task-specific gradient with an efficient EKFAC approximation of the inverse Fisher information matrix. To preserve performance on previously learned tasks, ONG projects these natural gradients onto the complement of prior tasks' gradients.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Orthogonal Gradient Descent (OGD) has emerged as a powerful method for continual learning. However, its Euclidean projections do not leverage the underlying information-geometric structure of the problem, which can lead to suboptimal convergence in learning tasks. To address this, we propose incorporating the natural gradient into OGD and present ONG (Orthogonal Natural Gradient Descent). ONG preconditions each new task-specific gradient with an efficient EKFAC approximation of the inverse Fisher information matrix, yielding updates that follow the steepest descent direction under a Riemannian metric. To preserve performance on previously learned tasks, ONG projects these natural gradients onto the orthogonal complement of prior tasks' gradients. We provide an initial theoretical justification for this procedure, introduce the Orthogonal Natural Gradient Descent (ONG) algorithm, and present preliminary results on the Permuted and Rotated MNIST benchmarks. Our preliminary results, however, indicate that a naive combination of natural gradients and orthogonal projections can have potential issues. This finding motivates continued future work focused on robustly reconciling these geometric perspectives to develop a continual learning method, establishing a more rigorous theoretical foundation with formal convergence guarantees, and extending empirical validation to large-scale continual learning benchmarks. The anonymized version of our code can be found as the zip file here: https://drive.google.com/drive/folders/11PyU6M8pNgOUB5pwdGORtbnMtD8Shiw_?usp=sharing.
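
The two steps the abstract describes (precondition the new-task gradient, then project it onto the orthogonal complement of prior tasks' gradients) can be sketched on a toy problem as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a dense, damped empirical Fisher matrix stands in for the EKFAC approximation, and all function names, the damping value, and the learning rate are hypothetical.

```python
import numpy as np

def natural_gradient(grad, per_sample_grads, damping=1e-3):
    """Precondition grad with the inverse of a damped empirical Fisher.

    Assumption: a dense empirical Fisher replaces the EKFAC approximation
    named in the abstract; this is only feasible for tiny parameter counts.
    """
    n, d = per_sample_grads.shape
    fisher = per_sample_grads.T @ per_sample_grads / n + damping * np.eye(d)
    return np.linalg.solve(fisher, grad)

def orthonormal_basis(prior_grads, tol=1e-10):
    """Orthonormal basis (columns) for the span of stored prior-task gradients."""
    u, s, _ = np.linalg.svd(prior_grads.T, full_matrices=False)
    return u[:, s > tol]

def project_out(vec, basis):
    """Remove the component of vec that lies in the span of the basis columns."""
    return vec - basis @ (basis.T @ vec)

def ong_step(params, grad, per_sample_grads, basis, lr=0.1):
    """One update: natural gradient first, Euclidean projection second."""
    nat = natural_gradient(grad, per_sample_grads)
    return params - lr * project_out(nat, basis)

# Toy usage with random data (5 parameters, 2 stored prior-task gradients).
rng = np.random.default_rng(0)
prior_grads = rng.normal(size=(2, 5))
per_sample_grads = rng.normal(size=(20, 5))
params = np.zeros(5)
new_params = ong_step(params, per_sample_grads.mean(axis=0),
                      per_sample_grads, orthonormal_basis(prior_grads))
```

By construction the resulting step has zero Euclidean inner product with every stored prior gradient. Note that the projection here is Euclidean while the preconditioning is Fisher-based, which is the kind of naive combination the abstract cautions may have issues.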

Related papers

On Multi-Step Theorem Prediction via Non-Parametric Structural Priors [50.16583672681106]
In this work, we explore training-free theorem prediction through the lens of in-context learning (ICL). We propose Theorem Precedence Graphs, which encode temporal dependencies from historical solution traces as directed graphs, and impose explicit topological constraints that effectively prune the search space during inference. Experiments on the FormalGeo7k benchmark show that our method achieves 89.29% accuracy, substantially outperforming ICL baselines and matching state-of-the-art supervised models.
arXiv Detail & Related papers (2026-03-05T06:08:50Z)
Fisher-Orthogonal Projected Natural Gradient Descent for Continual Learning [0.6999740786886536]
We propose the Fisher-Orthogonal Projected Natural Gradient Descent (FOPNG). FOPNG enforces Fisher-orthogonal constraints on parameter updates to preserve old task performance while learning new tasks. We provide theoretical analysis deriving the projected update and describe efficient and practical implementations using the diagonal Fisher.
arXiv Detail & Related papers (2026-01-19T08:23:12Z)
Stochastic Orthogonal Regularization for deep projective priors [2.990411348977783]
In this paper, we focus on generalized projected gradient descent (GPGD) algorithms. Neural networks allow for projections onto unknown low-dimensional sets that model complex data, such as images.
arXiv Detail & Related papers (2025-05-19T13:12:01Z)
Visual Prompt Tuning in Null Space for Continual Learning [51.96411454304625]
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL). This paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features. In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient projection.
arXiv Detail & Related papers (2024-06-09T05:57:40Z)
Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method [31.891933360081342]
We prove that DoWG is optimally efficient -- matching the convergence rate of tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters, and universal. DoWG maintains a new distance-based weighted version of the running average, which is crucial to achieve the desired properties. To complement our theory, we show empirically that DoWG trains at the edge of stability, and validate its effectiveness on practical machine learning tasks.
arXiv Detail & Related papers (2023-05-25T17:40:43Z)
Achieving High Accuracy with PINNs via Energy Natural Gradients [0.0]
We show that the update direction in function space resulting from the energy natural gradient corresponds to the Newton direction modulo a projection onto the model's tangent space. We demonstrate experimentally that energy natural gradient descent yields highly accurate solutions with errors several orders of magnitude smaller than what is obtained when training PINNs with standard gradient descent or Adam.
arXiv Detail & Related papers (2023-02-25T21:17:19Z)
RawlsGCN: Towards Rawlsian Difference Principle on Graph Convolutional Network [102.27090022283208]
Graph Convolutional Network (GCN) plays pivotal roles in many real-world applications. GCN often exhibits performance disparity with respect to node degrees, resulting in worse predictive accuracy for low-degree nodes. We formulate the problem of mitigating the degree-related performance disparity in GCN from the perspective of the Rawlsian difference principle.
arXiv Detail & Related papers (2022-02-28T05:07:57Z)
Natural continual learning: success is a journey, not (just) a destination [9.462808515258464]
Natural Continual Learning (NCL) is a new method that unifies weight regularization and projected gradient descent. Our method outperforms both standard weight regularization techniques and projection based approaches when applied to continual learning problems in RNNs. The trained networks evolve task-specific dynamics that are strongly preserved as new tasks are learned, similar to experimental findings in biological circuits.
arXiv Detail & Related papers (2021-06-15T12:24:53Z)
Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and Non-uniform Łojasiewicz inequality (NL). New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit extrapolational (SIVI). Our method maps SIVI's evidence to a rigorous inference of lower gradient values.
arXiv Detail & Related papers (2021-01-15T11:39:09Z)
Sinkhorn Natural Gradient for Generative Models [125.89871274202439]
We propose a novel Sinkhorn Natural Gradient (SiNG) algorithm which acts as a steepest descent method on the probability space endowed with the Sinkhorn divergence. We show that the Sinkhorn information matrix (SIM), a key component of SiNG, has an explicit expression and can be evaluated accurately in complexity that scales logarithmically. In our experiments, we quantitatively compare SiNG with state-of-the-art SGD-type solvers on generative tasks to demonstrate the efficiency and efficacy of our method.
arXiv Detail & Related papers (2020-11-09T02:51:17Z)
Two-Level K-FAC Preconditioning for Deep Learning [7.699428789159717]
In the context of deep learning, many optimization methods use gradient covariance information in order to accelerate the convergence of Gradient Descent. In particular, starting with Adagrad, a seemingly endless line of research advocates the use of diagonal approximations of the so-called empirical Fisher matrix. One particularly successful variant of such methods is the so-called K-FAC, which uses a Kronecker-factored block-diagonal preconditioner.
arXiv Detail & Related papers (2020-11-01T17:54:21Z)
Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent [81.29979864862081]
In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting. We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel regime.
arXiv Detail & Related papers (2020-06-21T23:49:57Z)
Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem. We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent. Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems. Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.