Neural Networks with Orthogonal Jacobian
- URL: http://arxiv.org/abs/2508.02882v1
- Date: Mon, 04 Aug 2025 20:24:49 GMT
- Title: Neural Networks with Orthogonal Jacobian
- Authors: Alex Massucco, Davide Murari, Carola-Bibiane Schönlieb,
- Abstract summary: Very deep neural networks achieve state-of-the-art performance by extracting rich, hierarchical features.<n>Training them via backpropagation is often hindered by vanishing or exploding gradients.<n>We introduce a unified mathematical framework that describes a broad class of nonlinear feedforward and residual networks.
- Score: 10.111901389604423
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Very deep neural networks achieve state-of-the-art performance by extracting rich, hierarchical features. Yet, training them via backpropagation is often hindered by vanishing or exploding gradients. Existing remedies, such as orthogonal or variance-preserving initialisation and residual architectures, allow for a more stable gradient propagation and the training of deeper models. In this work, we introduce a unified mathematical framework that describes a broad class of nonlinear feedforward and residual networks, whose input-to-output Jacobian matrices are exactly orthogonal almost everywhere. Such a constraint forces the resulting networks to achieve perfect dynamical isometry and train efficiently despite being very deep. Our formulation not only recovers standard architectures as particular cases but also yields new designs that match the trainability of residual networks without relying on conventional skip connections. We provide experimental evidence that perfect Jacobian orthogonality at initialisation is sufficient to stabilise training and achieve competitive performance. We compare this strategy to networks regularised to maintain the Jacobian orthogonality and obtain comparable results. We further extend our analysis to a class of networks well-approximated by those with orthogonal Jacobians and introduce networks with Jacobians representing partial isometries. These generalized models are then showed to maintain the favourable trainability properties.
Related papers
- GrokAlign: Geometric Characterisation and Acceleration of Grokking [14.286905882211896]
We show that aligning a network's Jacobians with the training data ensures grokking under a low-rank Jacobian assumption.<n>Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks.<n>We introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment.
arXiv Detail & Related papers (2025-06-14T00:16:11Z) - Hierarchical Feature-level Reverse Propagation for Post-Training Neural Networks [24.442592456755698]
End-to-end autonomous driving has emerged as a dominant paradigm, yet its highly entangled black-box models pose challenges in terms of interpretability and safety assurance.<n>This paper proposes a hierarchical and decoupled post-training framework tailored for pretrained neural networks.
arXiv Detail & Related papers (2025-06-08T15:19:03Z) - Lattice-Based Pruning in Recurrent Neural Networks via Poset Modeling [0.0]
Recurrent neural networks (RNNs) are central to sequence modeling tasks, yet their high computational complexity poses challenges for scalability and real-time deployment.<n>We introduce a novel framework that models RNNs as partially ordered sets (posets) and constructs corresponding dependency lattices.<n>By identifying meet irreducible neurons, our lattice-based pruning algorithm selectively retains critical connections while eliminating redundant ones.
arXiv Detail & Related papers (2025-02-23T10:11:38Z) - Convergence Analysis for Learning Orthonormal Deep Linear Neural
Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z) - Entangled Residual Mappings [59.02488598557491]
We introduce entangled residual mappings to generalize the structure of the residual connections.
An entangled residual mapping replaces the identity skip connections with specialized entangled mappings.
We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks.
arXiv Detail & Related papers (2022-06-02T19:36:03Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata viaak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Stabilizing Equilibrium Models by Jacobian Regularization [151.78151873928027]
Deep equilibrium networks (DEQs) are a new class of models that eschews traditional depth in favor of finding the fixed point of a single nonlinear layer.
We propose a regularization scheme for DEQ models that explicitly regularizes the Jacobian of the fixed-point update equations to stabilize the learning of equilibrium models.
We show that this regularization adds only minimal computational cost, significantly stabilizes the fixed-point convergence in both forward and backward passes, and scales well to high-dimensional, realistic domains.
arXiv Detail & Related papers (2021-06-28T00:14:11Z) - Deep Networks from the Principle of Rate Reduction [32.87280757001462]
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification.
We show that the basic iterative ascent gradient scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer.
All components of this "white box" network have precise optimization, statistical, and geometric interpretation.
arXiv Detail & Related papers (2020-10-27T06:01:43Z) - Neural Subdivision [58.97214948753937]
This paper introduces Neural Subdivision, a novel framework for data-driven coarseto-fine geometry modeling.
We optimize for the same set of network weights across all local mesh patches, thus providing an architecture that is not constrained to a specific input mesh, fixed genus, or category.
We demonstrate that even when trained on a single high-resolution mesh our method generates reasonable subdivisions for novel shapes.
arXiv Detail & Related papers (2020-05-04T20:03:21Z) - Dynamic Hierarchical Mimicking Towards Consistent Optimization
Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.