Tilting the playing field: Dynamical loss functions for machine learning
- URL: http://arxiv.org/abs/2102.03793v1
- Date: Sun, 7 Feb 2021 13:15:08 GMT
- Title: Tilting the playing field: Dynamical loss functions for machine learning
- Authors: Miguel Ruiz-Garcia, Ge Zhang, Samuel S. Schoenholz, Andrea J. Liu
- Abstract summary: We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time.
Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss.
- Score: 18.831125493827766
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We show that learning can be improved by using loss functions that evolve
cyclically during training to emphasize one class at a time. In
underparameterized networks, such dynamical loss functions can lead to
successful training for networks that fail to find deep minima of the
standard cross-entropy loss. In overparameterized networks, dynamical loss
functions can lead to better generalization. Improvement arises from the
interplay of the changing loss landscape with the dynamics of the system as it
evolves to minimize the loss. In particular, as the loss function oscillates,
instabilities develop in the form of bifurcation cascades, which we study using
the Hessian and Neural Tangent Kernel. Valleys in the landscape widen and
deepen, and then narrow and rise as the loss landscape changes during a cycle.
As the landscape narrows, the learning rate becomes too large and the network
becomes unstable and bounces around the valley. This process ultimately pushes
the system into deeper and wider regions of the loss landscape and is
characterized by decreasing eigenvalues of the Hessian. This results in better
regularized models with improved generalization performance.
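The abstract describes the core mechanism only at a high level: the per-class contribution to the loss oscillates so that one class is emphasized at a time. The sketch below is a minimal illustration of that idea, not the authors' implementation; the class-weight schedule, period, and amplitude are assumptions chosen for readability, and the function names (dynamical_class_weights, dynamical_cross_entropy) are hypothetical.

```python
# Minimal sketch (PyTorch) of a dynamical cross-entropy loss that cyclically
# emphasizes one class at a time. The schedule below (one class boosted by a
# factor `amplitude` for `period` steps, cycling through all classes) is an
# illustrative assumption; the abstract only states that the emphasis evolves
# cyclically during training.
import torch
import torch.nn.functional as F


def dynamical_class_weights(step: int, num_classes: int,
                            period: int = 500,
                            amplitude: float = 4.0) -> torch.Tensor:
    """Per-class weights that boost one class at a time, cycling through all classes."""
    emphasized = (step // period) % num_classes
    weights = torch.ones(num_classes)
    weights[emphasized] = amplitude                # tilt the loss toward one class
    return weights * num_classes / weights.sum()   # keep the mean weight at 1


def dynamical_cross_entropy(logits: torch.Tensor, targets: torch.Tensor,
                            step: int) -> torch.Tensor:
    """Cross-entropy whose per-class weighting changes with the training step."""
    weights = dynamical_class_weights(step, logits.shape[1]).to(logits.device)
    return F.cross_entropy(logits, targets, weight=weights)


# Usage inside an ordinary training loop (model, loader, optimizer assumed):
# for step, (x, y) in enumerate(loader):
#     loss = dynamical_cross_entropy(model(x), y, step)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```

Normalizing the weights so that their mean stays at 1 keeps the overall loss scale roughly constant, so only the relative emphasis between classes oscillates during a cycle.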
Related papers
- Dynamical loss functions shape landscape topography and improve learning in artificial neural networks [0.9208007322096533]
We show how to transform cross-entropy and mean squared error into dynamical loss functions.
We show how they significantly improve validation accuracy for networks of varying sizes.
arXiv Detail & Related papers (2024-10-14T16:27:03Z)
- Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms.
We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
- Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z)
- Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations [49.22640185566807]
We show that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs.
We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs.
We find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.
arXiv Detail & Related papers (2023-12-23T17:54:03Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- The instabilities of large learning rate training: a loss landscape view [2.4366811507669124]
We study the loss landscape by considering the Hessian matrix during network training with large learning rates.
We characterise the instabilities of gradient descent, and we observe the striking phenomena of landscape flattening and landscape shift.
arXiv Detail & Related papers (2023-07-22T00:07:49Z)
- Online Loss Function Learning [13.744076477599707]
Loss function learning aims to automate the task of designing a loss function for a machine learning model.
We propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters.
arXiv Detail & Related papers (2023-01-30T19:22:46Z)
- Critical Investigation of Failure Modes in Physics-informed Neural Networks [0.9137554315375919]
We show that a physics-informed neural network with a composite formulation produces highly non-convex loss surfaces that are difficult to optimize.
We also assess the training of both approaches on two elliptic problems with increasingly complex target solutions.
arXiv Detail & Related papers (2022-06-20T18:43:35Z)
- Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross-entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization goal helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
arXiv Detail & Related papers (2021-09-12T23:14:06Z)
- Anomalous diffusion dynamics of learning in deep neural networks [0.0]
Learning in deep neural networks (DNNs) is implemented through minimizing a highly non-equilibrium loss function.
We present a novel account of how such effective deep learning emerges through the interaction of the training dynamics with the fractal-like structure of the loss landscape.
arXiv Detail & Related papers (2020-09-22T14:57:59Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of a "break-even" point on the optimization trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)