Entropic Regularization in the Deep Linear Network
- URL: http://arxiv.org/abs/2512.06137v1
- Date: Fri, 05 Dec 2025 20:36:13 GMT
- Title: Entropic Regularization in the Deep Linear Network
- Authors: Alan Chen, Tejas Kotwal, Govind Menon
- Abstract summary: We study regularization for the deep linear network (DLN) using the entropy formula introduced in arXiv:2509.09088. The equilibria and gradient flow of the free energy are characterized for energies that depend on the singular values of the end-to-end matrix.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study regularization for the deep linear network (DLN) using the entropy formula introduced in arXiv:2509.09088. The equilibria and gradient flow of the free energy on the Riemannian manifold of end-to-end maps of the DLN are characterized for energies that depend symmetrically on the singular values of the end-to-end matrix. The only equilibria are minimizers and the set of minimizers is an orbit of the orthogonal group. In contrast with random matrix theory there is no singular value repulsion. The corresponding gradient flow reduces to a one-dimensional ordinary differential equation whose solution gives explicit relaxation rates toward the minimizers. We also study the concavity of the entropy in the chamber of singular values. The entropy is shown to be strictly concave in the Euclidean geometry on the chamber but not in the Riemannian geometry defined by the DLN metric.
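For orientation, the following sketch records the free-energy setup schematically; the notation ($\tau$ for temperature, $E$ for the energy, $\sigma_i$ for the singular values) is assumed here and may differ from the paper's.

\[
F_\tau(W) \;=\; E(\sigma_1,\dots,\sigma_d) \;-\; \tau\, S(\sigma_1,\dots,\sigma_d),
\qquad
\dot W \;=\; -\operatorname{grad}_{g_{\mathrm{DLN}}} F_\tau(W),
\]

with $E$ symmetric in the singular values $\sigma_i$ of the end-to-end matrix $W$, $S$ the DLN entropy of arXiv:2509.09088, and the gradient taken in the Riemannian (DLN) metric; by symmetry the flow closes on the singular values and reduces to a one-dimensional ODE.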
Related papers
- Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme [51.56484100374058]
Low-dimensional structure in real-world data plays an important role in the success of generative models. We prove convergence of numerical schemes for manifold-valued differential equations.
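As a concrete illustration of a geometric Euler-Maruyama step, here is a minimal sketch for Langevin dynamics on the unit sphere with a projection retraction; this is a generic example, not the paper's specific scheme or manifold.

```python
import numpy as np

def sphere_euler_maruyama_step(x, grad_V, dt, rng):
    """One geometric Euler-Maruyama step for Langevin dynamics on S^{d-1}:
    project drift and noise to the tangent space at x, take a Euclidean
    step, then retract onto the sphere by normalization."""
    P = np.eye(x.size) - np.outer(x, x)        # tangent-space projector at x
    drift = -P @ grad_V(x)                     # projected negative gradient
    noise = P @ rng.standard_normal(x.size)    # projected isotropic noise
    y = x + dt * drift + np.sqrt(2.0 * dt) * noise
    return y / np.linalg.norm(y)               # projection retraction

# Usage: approximately sample exp(-V) on S^2 with V(x) = x[2].
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
for _ in range(10_000):
    x = sphere_euler_maruyama_step(x, lambda z: np.array([0.0, 0.0, 1.0]), 1e-3, rng)
```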
arXiv Detail & Related papers (2026-03-04T01:29:35Z) - Explicit Discovery of Nonlinear Symmetries from Dynamic Data [50.20526548924647]
LieNLSD is the first method capable of determining the number of infinitesimal generators with nonlinear terms and their explicit expressions. LieNLSD shows qualitative advantages over existing methods and improves the long rollout accuracy of neural PDE solvers by over 20%.
arXiv Detail & Related papers (2025-10-02T09:54:08Z) - Noninvertible Symmetry-Resolved Affleck-Ludwig-Cardy Formula and Entanglement Entropy from the Boundary Tube Algebra [0.0]
We derive a refined version of the Affleck-Ludwig-Cardy formula for a 1+1d conformal field theory. We use this to determine the universal leading and sub-leading contributions to the noninvertible symmetry-resolved entanglement entropy of a single interval.
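For context, the unrefined Affleck-Ludwig-Cardy asymptotics for an interval of length $L$ with boundary conditions $a, b$ at inverse temperature $\beta$ reads (standard form, stated here for orientation; the paper derives a symmetry-resolved refinement):

\[
\ln Z_{ab} \;\simeq\; \frac{\pi c}{6}\,\frac{L}{\beta} \;+\; \ln g_a \;+\; \ln g_b,
\qquad L/\beta \to \infty,
\]

where $c$ is the central charge and $g_{a,b}$ are the Affleck-Ludwig boundary entropies.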
arXiv Detail & Related papers (2024-09-04T15:25:05Z) - The geometry of the Hermitian matrix space and the Schrieffer--Wolff transformation [0.0]
In quantum mechanics, the Schrieffer--Wolff (SW) transformation is known as an approximate method for reducing the dimension of a perturbed Hamiltonian.
We prove that it induces a local coordinate in the space of Hermitian matrices near a $k$-fold degeneracy submanifold.
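For reference, the standard first-order Schrieffer--Wolff construction (generic form, notation ours): with $H = H_0 + V$ and $V = V_{\mathrm{d}} + V_{\mathrm{od}}$ split into block-diagonal and block-off-diagonal parts, one takes

\[
H_{\mathrm{eff}} \;=\; e^{S} H e^{-S}, \qquad S^\dagger = -S, \qquad [S, H_0] = -V_{\mathrm{od}},
\]

which cancels $V_{\mathrm{od}}$ at first order and yields $H_{\mathrm{eff}} = H_0 + V_{\mathrm{d}} + \tfrac{1}{2}[S, V_{\mathrm{od}}] + O(V^3)$.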
arXiv Detail & Related papers (2024-07-15T07:05:39Z) - Symmetry-resolved Entanglement Entropy, Spectra & Boundary Conformal Field Theory [0.0]
We perform a comprehensive analysis of the symmetry-resolved entanglement entropy (SREE) for a single interval in the ground state of a $1+1$D conformal field theory (CFT).
We utilize the boundary CFT approach to study the total entanglement entropy, which enables us to find the universal leading-order behavior of the SREE.
We derive the symmetry-resolved entanglement spectra for a CFT invariant under a finite symmetry group.
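The SREE referred to here is defined in the standard way by projecting the reduced density matrix $\rho_A$ of the interval onto a fixed charge sector $q$:

\[
\rho_A(q) \;=\; \frac{\Pi_q\, \rho_A\, \Pi_q}{\operatorname{Tr}[\Pi_q \rho_A]},
\qquad
S(q) \;=\; -\operatorname{Tr}\!\big[\rho_A(q)\ln\rho_A(q)\big],
\]

with $\Pi_q$ the projector onto the charge-$q$ eigenspace of the symmetry restricted to the interval.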
arXiv Detail & Related papers (2023-09-06T18:03:14Z) - Renormalization group and spectra of the generalized Pöschl-Teller potential [0.0]
We study the Pöschl-Teller potential $V(x) = \alpha^2 g_s \sinh^{-2}(\alpha x) + \alpha^2 g_c \cosh^{-2}(\alpha x)$ for every value of the dimensionless parameters $g_s$ and $g_c$, including values for which the potential is singular.
We show that supersymmetry of the potential, when present, is also spontaneously broken, along with conformal symmetry.
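The supersymmetry mentioned here is that of supersymmetric quantum mechanics; in the standard factorization (generic background, notation ours),

\[
A = \frac{d}{dx} + W(x), \qquad H_- = A^\dagger A, \quad H_+ = A A^\dagger, \qquad V_\pm(x) = W(x)^2 \pm W'(x),
\]

SUSY is unbroken exactly when one of the candidate zero modes $\psi_0 \propto e^{\mp\int W\,dx}$ is normalizable; spontaneous breaking, as found here, means neither is.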
arXiv Detail & Related papers (2023-08-08T21:44:55Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesically strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond the Euclidean setting.
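Schematically, last-iterate linear convergence means the Riemannian distance to the equilibrium contracts geometrically (generic statement; the constant and its problem dependence are assumptions here):

\[
d(z_{t+1}, z^\star) \;\le\; (1-\rho)\, d(z_t, z^\star), \qquad \rho \in (0,1),
\]

with $z_t$ the RGD iterate and $z^\star$ the equilibrium; "geometry-agnostic" is read here as the rate $\rho$ not carrying curvature-dependent factors.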
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of minimum-Hessian-trace solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix.
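For reference, the Schatten 1-norm (nuclear norm) of the end-to-end matrix is the sum of its singular values,

\[
\|W\|_{\mathcal{S}_1} \;=\; \sum_i \sigma_i(W),
\]

so under RIP the flatness (Hessian-trace) regularizer acts as an implicit low-rank bias on the end-to-end map.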
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Learning Discretized Neural Networks under Ricci Flow [48.47315844022283]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations. DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z) - The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization [13.872374586700767]
Recent work has shown that shaping the activation function is necessary as network depth grows large.
We identify the precise scaling of the activation function necessary to arrive at a nontrivial limit.
We recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
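A minimal sketch of activation shaping in this sense: the nonlinearity is flattened toward the identity as depth grows. The $1/\sqrt{\text{depth}}$ rate below is an assumption for illustration, not necessarily the paper's precise scaling.

```python
import numpy as np

def shaped_leaky_relu(x, depth, c=1.0):
    """Leaky ReLU whose negative-side slope tends to 1 as depth grows
    (assumed rate c/sqrt(depth)), so each layer is a vanishing perturbation
    of the identity and layerwise covariance updates can accumulate into a
    nondegenerate infinite-depth limit. Illustrative sketch only."""
    slope = 1.0 - c / np.sqrt(depth)   # -> 1 as depth -> infinity
    return np.where(x >= 0.0, x, slope * x)
```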
arXiv Detail & Related papers (2022-06-06T17:45:07Z) - Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
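A minimal sketch of the pipeline, assuming a connected graph with no isolated nodes: embed with the leading eigenvectors of the random-walk transition matrix $P = D^{-1}A$ (equivalently, of $L_{\mathrm{rw}} = I - P$), then cluster.

```python
import numpy as np

def random_walk_spectral_embedding(A, k):
    """Embed nodes via the top-k eigenvectors of P = D^{-1} A.
    A: dense symmetric adjacency matrix with no zero-degree nodes.
    Returns an (n, k) real embedding, one row per node."""
    degrees = A.sum(axis=1)
    P = A / degrees[:, None]            # random-walk transition matrix
    evals, evecs = np.linalg.eig(P)     # P is not symmetric in general
    order = np.argsort(-evals.real)     # largest eigenvalues first
    return evecs[:, order[:k]].real

# Usage: cluster the rows of the embedding, e.g. with k-means.
```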
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
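The CLT scaling referred to here is the standard mean-field fluctuation ansatz (notation ours): writing the empirical measure of the $N$ neurons as

\[
\mu_t^N \;=\; \bar\mu_t \;+\; \frac{1}{\sqrt{N}}\,\nu_t^N,
\]

with $\bar\mu_t$ the mean-field limit, the results say the fluctuation $\nu_t^N$ stays bounded in mean square during training and vanishes asymptotically when $\bar\mu_t$ interpolates the data.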
arXiv Detail & Related papers (2020-08-21T18:00:50Z)