On the Dynamics Under the Unhinged Loss and Beyond
- URL: http://arxiv.org/abs/2312.07841v1
- Date: Wed, 13 Dec 2023 02:11:07 GMT
- Title: On the Dynamics Under the Unhinged Loss and Beyond
- Authors: Xiong Zhou, Xianming Liu, Hanzhang Wang, Deming Zhai, Junjun Jiang,
Xiangyang Ji
- Abstract summary: We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
- Score: 104.49565602940699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works have studied implicit biases in deep learning, especially the
behavior of last-layer features and classifier weights. However, they usually
need to simplify the intermediate dynamics under gradient flow or gradient
descent due to the intractability of loss functions and model architectures. In
this paper, we introduce the unhinged loss, a concise loss function that
offers more mathematical opportunities to analyze the closed-form dynamics
while requiring as few simplifications or assumptions as possible. The unhinged
loss allows for considering more practical techniques, such as time-varying
learning rates and feature normalization. Based on the layer-peeled model that
views last-layer features as free optimization variables, we conduct a thorough
analysis in the unconstrained, regularized, and spherical constrained cases, as
well as the case where the neural tangent kernel remains invariant. To bridge
the performance of the unhinged loss to that of Cross-Entropy (CE), we
investigate the scenario of fixing classifier weights with a specific
structure (e.g., a simplex equiangular tight frame). Our analysis shows that
these dynamics converge exponentially fast to a solution depending on the
initialization of features and classifier weights. These theoretical results
not only offer valuable insights, including explicit feature regularization and
rescaled learning rates for enhancing practical training with the unhinged
loss, but also extend their applicability to other loss functions. Finally, we
empirically demonstrate these theoretical results and insights through
extensive experiments.
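To make the paper's two central constructions concrete, below is a minimal numpy sketch of (i) the unhinged loss, taken here in a simple linear one-vs-mean margin form, and (ii) a simplex equiangular tight frame (ETF) used as a fixed classifier. The margin convention, function names, and dimensions are illustrative assumptions, not the paper's verbatim definitions.

```python
import numpy as np

def unhinged_loss(logits, labels):
    """Mean unhinged loss 1 - margin, where the margin is taken here as the
    true-class logit minus the mean logit (one common multiclass convention;
    the paper's exact form may differ)."""
    n = logits.shape[0]
    margin = logits[np.arange(n), labels] - logits.mean(axis=1)
    return np.mean(1.0 - margin)

def simplex_etf(num_classes, dim, seed=0):
    """K equiangular unit rows in R^dim (requires dim >= K): the standard
    construction sqrt(K/(K-1)) * U @ (I - ones/K) with U column-orthonormal."""
    rng = np.random.default_rng(seed)
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    center = np.eye(k) - np.ones((k, k)) / k        # remove the common mean direction
    return (np.sqrt(k / (k - 1)) * (u @ center)).T  # one classifier row per class

# The defining ETF property: unit norms and pairwise cosines of -1/(K-1).
W = simplex_etf(num_classes=4, dim=16)
print(np.round(W @ W.T, 3))  # 1.0 on the diagonal, -0.333 off it
```

Fixing the classifier weights to such an ETF is the specific structure the abstract refers to when bridging the unhinged loss to cross-entropy.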
Related papers
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment
for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
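For context on what logit-adjustment looks like in practice, here is a hedged numpy sketch of the standard prior-adjusted cross-entropy; it illustrates the generic technique this entry analyzes, not the paper's data-dependent contraction bound, and the temperature tau is an illustrative choice.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, class_priors, tau=1.0):
    """Cross-entropy computed on logits shifted by tau * log(prior), the
    usual logit-adjustment recipe for class-imbalanced training."""
    adj = logits + tau * np.log(class_priors)  # broadcast over the batch
    adj -= adj.max(axis=1, keepdims=True)      # numerically stable softmax
    log_prob = adj - np.log(np.exp(adj).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()
```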
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
- On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural
Networks with Linear Activations [0.0]
We investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation.
We show that the general overparametrized formulation introduces a set of spurious equilibria which lie outside the set where the loss function is minimized.
arXiv Detail & Related papers (2023-05-17T02:26:34Z)
- Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Regression as Classification: Influence of Task Formulation on Neural
Network Features [16.239708754973865]
Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss.
However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross-entropy loss results in better performance.
By focusing on two-layer ReLU networks, we explore how the implicit bias induced by gradient-based optimization could partly explain the phenomenon.
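As a minimal sketch of the regression-to-classification reformulation this entry studies (the bin count, target range, and helper name below are illustrative assumptions):

```python
import numpy as np

def targets_to_classes(y, num_bins=20, lo=0.0, hi=1.0):
    """Discretize continuous targets into bins so the problem can be trained
    with cross-entropy; bin midpoints map predicted classes back to values."""
    edges = np.linspace(lo, hi, num_bins + 1)
    classes = np.clip(np.digitize(y, edges) - 1, 0, num_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return classes, centers

classes, centers = targets_to_classes(np.array([0.1, 0.45, 0.72, 0.99]), num_bins=4)
print(classes, centers[classes])  # [0 1 2 3] and the corresponding bin midpoints
```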
arXiv Detail & Related papers (2022-11-10T15:13:23Z)
- Perturbation Analysis of Neural Collapse [24.94449183555951]
Training deep neural networks for classification often includes minimizing the training loss beyond the zero training error point.
Recent works analyze this behavior via idealized unconstrained features models where all the minimizers exhibit exact collapse.
We propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix.
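One simple way to realize such a vicinity restriction is a penalized objective, sketched below; the paper may instead use a hard norm-ball constraint, so the squared Frobenius penalty and its weight are only illustrative assumptions.

```python
import numpy as np

def vicinity_penalized_loss(base_loss, features, anchor, lam=0.1):
    """Base classification loss plus a penalty keeping the (n x d) feature
    matrix near a predefined anchor matrix, a soft version of the
    'vicinity' restriction described above."""
    return base_loss + lam * np.sum((features - anchor) ** 2)
```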
arXiv Detail & Related papers (2022-10-29T17:46:03Z)
- Neural Collapse with Normalized Features: A Geometric Analysis over the
Riemannian Manifold [30.3185037354742]
When training deep networks with normalized features for classification tasks, the learned features exhibit a so-called "neural collapse" phenomenon.
We show that better representations can be learned faster via feature normalization.
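Feature normalization here amounts to constraining last-layer features to a sphere, matching the spherically constrained case in the main paper's analysis; a minimal sketch follows (the radius and epsilon guard are illustrative choices).

```python
import numpy as np

def normalize_features(h, radius=1.0, eps=1e-12):
    """Project each feature row onto the sphere of the given radius."""
    norms = np.linalg.norm(h, axis=1, keepdims=True)
    return radius * h / np.maximum(norms, eps)

h = np.random.default_rng(0).standard_normal((5, 8))
print(np.linalg.norm(normalize_features(h), axis=1))  # all ~1.0
```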
arXiv Detail & Related papers (2022-09-19T17:26:32Z)
- On Convergence of Training Loss Without Reaching Stationary Points [62.41370821014218]
We show that neural network weight variables do not converge to stationary points where the gradient of the loss function vanishes.
We propose a new perspective based on the ergodic theory of dynamical systems.
arXiv Detail & Related papers (2021-10-12T18:12:23Z)
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks
Trained with the Logistic Loss [0.0]
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks.
We analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations.
arXiv Detail & Related papers (2020-02-11T15:42:09Z)