Tilting the playing field: Dynamical loss functions for machine learning
- URL: http://arxiv.org/abs/2102.03793v1
- Date: Sun, 7 Feb 2021 13:15:08 GMT
- Title: Tilting the playing field: Dynamical loss functions for machine learning
- Authors: Miguel Ruiz-Garcia, Ge Zhang, Samuel S. Schoenholz, Andrea J. Liu
- Abstract summary: We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time.
Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss.
- Score: 18.831125493827766
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We show that learning can be improved by using loss functions that evolve
cyclically during training to emphasize one class at a time. In
underparameterized networks, such dynamical loss functions can lead to
successful training for networks that fail to find deep minima of the
standard cross-entropy loss. In overparameterized networks, dynamical loss
functions can lead to better generalization. Improvement arises from the
interplay of the changing loss landscape with the dynamics of the system as it
evolves to minimize the loss. In particular, as the loss function oscillates,
instabilities develop in the form of bifurcation cascades, which we study using
the Hessian and Neural Tangent Kernel. Valleys in the landscape widen and
deepen, and then narrow and rise as the loss landscape changes during a cycle.
As the landscape narrows, the learning rate becomes too large and the network
becomes unstable and bounces around the valley. This process ultimately pushes
the system into deeper and wider regions of the loss landscape and is
characterized by decreasing eigenvalues of the Hessian. This results in better
regularized models with improved generalization performance.
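The abstract describes the core mechanism only at a high level: the per-class contribution to the loss oscillates so that one class is emphasized at a time. The sketch below is a minimal illustration of that idea, not the authors' implementation; the class-weight schedule, period, and amplitude are assumptions chosen for readability, and the function names (dynamical_class_weights, dynamical_cross_entropy) are hypothetical.

```python
# Minimal sketch (PyTorch) of a dynamical cross-entropy loss that cyclically
# emphasizes one class at a time. The schedule below (one class boosted by a
# factor `amplitude` for `period` steps, cycling through all classes) is an
# illustrative assumption; the abstract only states that the emphasis evolves
# cyclically during training.
import torch
import torch.nn.functional as F


def dynamical_class_weights(step: int, num_classes: int,
                            period: int = 500,
                            amplitude: float = 4.0) -> torch.Tensor:
    """Per-class weights that boost one class at a time, cycling through all classes."""
    emphasized = (step // period) % num_classes
    weights = torch.ones(num_classes)
    weights[emphasized] = amplitude                # tilt the loss toward one class
    return weights * num_classes / weights.sum()   # keep the mean weight at 1


def dynamical_cross_entropy(logits: torch.Tensor, targets: torch.Tensor,
                            step: int) -> torch.Tensor:
    """Cross-entropy whose per-class weighting changes with the training step."""
    weights = dynamical_class_weights(step, logits.shape[1]).to(logits.device)
    return F.cross_entropy(logits, targets, weight=weights)


# Usage inside an ordinary training loop (model, loader, optimizer assumed):
# for step, (x, y) in enumerate(loader):
#     loss = dynamical_cross_entropy(model(x), y, step)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Normalizing the weights so that their mean stays at 1 keeps the overall loss scale roughly constant, so only the relative emphasis between classes oscillates during a cycle.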
Related papers
- Dynamical loss functions shape landscape topography and improve learning in artificial neural networks [0.9208007322096533]
We show how to transform cross-entropy and mean squared error into dynamical loss functions.
We show how they significantly improve validation accuracy for networks of varying sizes.
arXiv Detail & Related papers (2024-10-14T16:27:03Z)
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations [49.22640185566807]
We show that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs.
We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs.
We find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.
arXiv Detail & Related papers (2023-12-23T17:54:03Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- The instabilities of large learning rate training: a loss landscape view [2.4366811507669124]
We study the loss landscape by considering the Hessian matrix during network training with large learning rates.
We characterise the instabilities of gradient descent, and we observe the striking phenomena of landscape flattening and landscape shift.
arXiv Detail & Related papers (2023-07-22T00:07:49Z)
- Online Loss Function Learning [13.744076477599707]
Loss function learning aims to automate the task of designing a loss function for a machine learning model.
We propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters.
arXiv Detail & Related papers (2023-01-30T19:22:46Z)
- Critical Investigation of Failure Modes in Physics-informed Neural Networks [0.9137554315375919]
We show that a physics-informed neural network with a composite formulation produces highly non-convex loss surfaces that are difficult to optimize.
We also assess the training of both approaches on two elliptic problems with increasingly complex target solutions.
arXiv Detail & Related papers (2022-06-20T18:43:35Z)
- Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross-entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization goal helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
arXiv Detail & Related papers (2021-09-12T23:14:06Z)
- Anomalous diffusion dynamics of learning in deep neural networks [0.0]
Learning in deep neural networks (DNNs) is implemented through minimizing a highly non-equilibrium loss function.
We present a novel account of how such effective deep learning emerges through the interaction of the training dynamics with the fractal-like structure of the loss landscape.
arXiv Detail & Related papers (2020-09-22T14:57:59Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of a "break-even" point on the optimization trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)