Implicit Bias in Deep Linear Discriminant Analysis
- URL: http://arxiv.org/abs/2603.02622v1
- Date: Tue, 03 Mar 2026 05:49:24 GMT
- Title: Implicit Bias in Deep Linear Discriminant Analysis
- Authors: Jiawen Li
- Abstract summary: This paper presents an initial theoretical analysis of the implicit regularization induced by Deep LDA. By analyzing the gradient flow of the loss on an L-layer diagonal linear network, we prove that under balanced initialization, the network architecture transforms standard additive gradient updates into multiplicative weight updates.
- Score: 2.3304389517825808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the implicit bias (or implicit regularization) of standard loss functions has been studied, the optimization geometry induced by discriminative metric-learning objectives remains largely unexplored. To the best of our knowledge, this paper presents an initial theoretical analysis of the implicit regularization induced by Deep LDA, a scale-invariant objective designed to minimize intraclass variance and maximize interclass distance. By analyzing the gradient flow of the loss on an L-layer diagonal linear network, we prove that under balanced initialization, the network architecture transforms standard additive gradient updates into multiplicative weight updates, which demonstrates an automatic conservation of the (2/L) quasi-norm.
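The mechanism in the abstract can be sketched numerically. The following is a hypothetical illustration, not the paper's code: it uses a depth-L diagonal parametrization beta = u**L (all layers balanced, hence sharing the vector u) together with a generic scale-invariant surrogate loss f(beta) = -<a, beta>/||beta||; the target direction `a`, the depth L = 3, and the particular loss are assumptions made for the sketch. Because f is scale-invariant, its gradient is orthogonal to beta, so the additive gradient step on u acts multiplicatively on beta and the sum of |beta_j|^(2/L) (the 2/L quasi-norm) is conserved along the flow.

```python
import numpy as np

# Hypothetical sketch (not the paper's code): depth-L diagonal linear
# parametrization beta = u**L, trained on an assumed scale-invariant
# surrogate loss f(beta) = -<a, beta> / ||beta||_2.
rng = np.random.default_rng(0)
L = 3                    # odd depth, so beta = u**L preserves sign
a = rng.normal(size=5)   # assumed target direction
u = rng.normal(size=5)   # balanced init: every layer shares the vector u

def grad_beta(beta):
    # d/dbeta of -<a, beta>/||beta||; note <grad, beta> = 0 (scale invariance)
    n = np.linalg.norm(beta)
    return -a / n + (a @ beta) / n**3 * beta

eta = 1e-4
qnorm0 = np.sum(np.abs(u**L) ** (2.0 / L))   # equals sum(u**2) at init
for _ in range(20000):
    beta = u**L
    # chain rule: the additive step on u carries a factor u**(L-1),
    # so the induced update on beta is multiplicative in |beta|
    u = u - eta * grad_beta(beta) * L * u ** (L - 1)

qnorm = np.sum(np.abs(u**L) ** (2.0 / L))
print(qnorm0, qnorm)
```

Since the surrogate loss is invariant to rescaling of beta, Euler's identity gives <grad, beta> = 0, so the flow conserves ||u||^2 = sum_j |beta_j|^(2/L) exactly; the small drift in the printout is the O(eta) discretization error of replacing gradient flow with gradient descent.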
Related papers
- High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator [7.837881800517111]
Quantized neural network training optimizes a discrete, non-differentiable objective. The straight-through estimator (STE) enables backpropagation through surrogate gradients. We theoretically show that in the high-dimensional limit, STE dynamics converge to a deterministic ordinary differential equation.
arXiv Detail & Related papers (2025-10-12T16:43:46Z) - A Simplified Analysis of SGD for Linear Regression with Weight Averaging [64.2393952273612]
Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression with a constant learning rate. We provide a simplified analysis recovering the same bias and variance bounds given in Zou et al. (2021), based on simple linear algebra tools. We believe our work makes the analysis of SGD for linear regression very accessible and will be helpful in further analyzing mini-batching and learning rate scheduling.
arXiv Detail & Related papers (2025-06-18T15:10:38Z) - Error Bound Analysis for the Regularized Loss of Deep Linear Neural Networks [8.46000784973359]
In this work, we study the local geometric landscape of the regularized squared loss of deep linear networks. We establish an error bound for the regularized loss under mild conditions on network width and regularization parameters. To support our theoretical findings, we conduct numerical experiments and demonstrate that gradient descent converges linearly to a critical point.
arXiv Detail & Related papers (2025-02-16T14:53:52Z) - Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks [0.0]
We show sharp results for the implicit regularization imposed by the gradient flow of Diagonal Linear Networks.
We link this to the phenomenon of phase transitions in generalized hardness of approximation.
Non-sharpness of our results would imply that the GHA phenomenon would not occur for the basis pursuit optimization problem.
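The diagonal-linear-network setting referenced above has a well-known connection to basis pursuit: gradient descent started at a small initialization scale approximately finds the minimum-l1 interpolator. The sketch below is a hypothetical illustration of that known connection, not code from the paper; the problem sizes, step size, and initialization scale are assumptions chosen for the demo.

```python
import numpy as np

# Hypothetical sketch (not from the paper): gradient descent on a depth-2
# diagonal linear network beta = u*u - v*v, started at small scale alpha,
# approximately solves basis pursuit (min ||beta||_1 s.t. X beta = y).
rng = np.random.default_rng(0)
n, d, k = 25, 50, 3
X = rng.normal(size=(n, d)) / np.sqrt(n)
beta_star = np.zeros(d)
beta_star[:k] = 1.0            # assumed sparse planted vector
y = X @ beta_star

alpha, lr = 1e-2, 2e-2
u = np.full(d, alpha)
v = np.full(d, alpha)
for _ in range(40000):
    beta = u * u - v * v
    g = X.T @ (X @ beta - y)   # gradient of the squared loss w.r.t. beta
    u -= lr * 2.0 * u * g      # chain rule: additive in u and v,
    v += lr * 2.0 * v * g      # multiplicative in beta

beta = u * u - v * v
loss = 0.5 * np.sum((X @ beta - y) ** 2)
l1 = np.sum(np.abs(beta))
print(loss, l1, np.sum(np.abs(beta_star)))
```

For small alpha the learned interpolator's l1 norm should be close to that of the sparse planted vector, reflecting the basis-pursuit bias that the paper's sharp results concern; at larger initialization scales the known implicit bias moves toward the l2-type regime.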
arXiv Detail & Related papers (2023-07-13T13:27:51Z) - The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - Robust Implicit Regularization via Weight Normalization [5.37610807422229]
We show that weight normalization enables a robust implicit bias that persists even when the weights are at a practically large scale. Experiments suggest that both the convergence speed and the robustness of the implicit bias improve dramatically with weight normalization.
arXiv Detail & Related papers (2023-05-09T13:38:55Z) - Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z) - Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Inductive Bias of Gradient Descent for Exponentially Weight Normalized Smooth Homogeneous Neural Nets [1.7259824817932292]
We analyze the inductive bias of gradient descent for weight normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss.
This paper shows that the gradient flow path with exponential weight normalization (EWN) is equivalent to gradient flow on standard networks with an adaptive learning rate.
arXiv Detail & Related papers (2020-10-24T14:34:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.