The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks
- URL: http://arxiv.org/abs/2507.06367v1
- Date: Tue, 08 Jul 2025 20:04:00 GMT
- Title: The Riemannian Geometry associated to Gradient Flows of Linear Convolutional Networks
- Authors: El Mehdi Achour, Kathlén Kohn, Holger Rauhut
- Abstract summary: We study geometric properties of the gradient flow for learning deep linear convolutional networks. The gradient flow on parameter space can be written as a Riemannian gradient flow on function space regardless of the initialization; this holds for $D$-dimensional convolutions with $D \geq 2$, and for $D = 1$ it holds if all so-called strides of the convolutions are greater than one.
- Score: 4.898188452239539
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study geometric properties of the gradient flow for learning deep linear convolutional networks. For linear fully connected networks, it has been shown recently that the corresponding gradient flow on parameter space can be written as a Riemannian gradient flow on function space (i.e., on the product of weight matrices) if the initialization satisfies a so-called balancedness condition. We establish that the gradient flow on parameter space for learning linear convolutional networks can be written as a Riemannian gradient flow on function space regardless of the initialization. This result holds for $D$-dimensional convolutions with $D \geq 2$, and for $D =1$ it holds if all so-called strides of the convolutions are greater than one. The corresponding Riemannian metric depends on the initialization.
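As a rough sketch of the setting (notation chosen here for illustration and not taken from the paper): write $W(\theta)$ for the end-to-end linear map given by the product of weight matrices with parameters $\theta$, and $L$ for the loss viewed as a function on function space. The Euclidean gradient flow on parameter space, $\dot{\theta}(t) = -\nabla_\theta L(W(\theta(t)))$, induces a curve $t \mapsto W(\theta(t))$ on function space. Saying that this curve is a Riemannian gradient flow means that there exists a Riemannian metric $g$ on (a subset of) function space such that
\[
\frac{\mathrm{d}}{\mathrm{d}t}\, W(\theta(t)) = -\operatorname{grad}_g L\bigl(W(\theta(t))\bigr).
\]
For fully connected linear networks such a closed description is known to require a balancedness condition on the initialization; the result summarized above states that for the convolutional parametrizations considered (under the stated conditions on $D$ and the strides) no such condition is needed, while the metric $g$ itself still depends on the initialization.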
Related papers
- Gradient flow in parameter space is equivalent to linear interpolation in output space [1.189367612437469]
We prove that the standard gradient flow in parameter space that underlies many training algorithms in deep learning can be continuously deformed into an adapted gradient flow. For the $L^2$ loss, if the Jacobian of the outputs with respect to the parameters is full rank, then the time variable can be reparametrized so that the resulting flow is simply linear interpolation in output space. For the cross-entropy loss, under the same rank condition and assuming the labels have positive components, we derive an explicit formula for the unique global minimum.
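One illustrative way to picture the claimed output-space behavior (a sketch, not the paper's construction): an adapted flow of the form $\dot{u}(t) = -(u(t) - y)$ in output space has solution $u(t) = y + e^{-t}(u(0) - y)$, so reparametrizing time via $s = 1 - e^{-t}$ gives
\[
u(s) = (1 - s)\, u(0) + s\, y, \qquad s \in [0, 1),
\]
i.e., straight-line interpolation from the initial output toward the target.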
arXiv Detail & Related papers (2024-08-02T18:23:17Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning [1.4050802766699084]
We consider the scenario of supervised learning in Deep Learning (DL) networks. We choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network.
arXiv Detail & Related papers (2023-11-27T02:12:02Z) - On Learning Gaussian Multi-index Models with Gradient Flow [57.170617397894404]
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data.
We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection.
arXiv Detail & Related papers (2023-10-30T17:55:28Z) - Deep Linear Networks for Matrix Completion -- An Infinite Depth Limit [10.64241024049424]
The deep linear network (DLN) is a model for implicit regularization in gradient-based optimization of overparametrized learning architectures.
We investigate the link between the underlying geometry and the training dynamics for matrix completion with rigorous analysis and numerics.
We propose that implicit regularization is a result of bias towards high state space volume.
arXiv Detail & Related papers (2022-10-22T17:03:10Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Magnitude and Angle Dynamics in Training Single ReLU Neurons [45.886537625951256]
We decompose the gradient flow of $w(t)$ into magnitude $\|w(t)\|$ and angle $\phi(t) := \pi - \theta(t)$ components.
We find that small scale initialization induces slow convergence speed for deep single ReLU neurons.
arXiv Detail & Related papers (2022-09-27T13:58:46Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\, u_i$, with linear dependence in the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
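To make the approximation statement concrete (a sketch of the usual optimal-control formulation, not necessarily the paper's exact functional): given a target diffeomorphism $\Psi$ and ensemble points $x_1, \dots, x_N$, one seeks controls $u = (u_1, \dots, u_l)$ such that the time-one flow map $\Phi_u^1$ of $\dot{x} = \sum_{i=1}^{l} F_i(x)\, u_i(t)$ reproduces $\Psi$ on the ensemble, for instance by minimizing
\[
\sum_{j=1}^{N} \bigl\| \Phi_u^{1}(x_j) - \Psi(x_j) \bigr\|^2 \; + \; \lambda \int_0^1 \| u(t) \|^2 \, \mathrm{d}t
\]
over admissible controls, with $\lambda > 0$ a regularization weight.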
arXiv Detail & Related papers (2021-10-24T08:57:46Z) - Learning Linearized Assignment Flows for Image Labeling [70.540936204654]
We introduce a novel algorithm for estimating optimal parameters of linearized assignment flows for image labeling.
We derive an exact formula for the parameter gradient and show how to efficiently evaluate it using a Krylov subspace and a low-rank approximation.
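For context on the Krylov step (a standard construction, stated here for illustration rather than as the paper's exact scheme): the action of a matrix exponential on a vector, $\exp(A)\,v$, which arises when integrating linear(ized) ODE flows, can be approximated in the Krylov subspace $\mathcal{K}_m(A, v) = \operatorname{span}\{v, Av, \dots, A^{m-1}v\}$ by
\[
\exp(A)\, v \;\approx\; \|v\|\; V_m \exp(H_m)\, e_1,
\]
where $V_m$ is an orthonormal basis of $\mathcal{K}_m(A, v)$ computed by the Arnoldi iteration, $H_m = V_m^{\top} A V_m$, and $e_1$ is the first standard basis vector; the dimension $m$ is typically much smaller than that of $A$.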
arXiv Detail & Related papers (2021-08-02T13:38:09Z) - A Unifying View on Implicit Bias in Training Linear Neural Networks [31.65006970108761]
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training.
We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and convolutional networks as special cases.
arXiv Detail & Related papers (2020-10-06T06:08:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.