Related papers: Dynamical loss functions shape landscape topography and improve learning in artificial neural networks

URL: http://arxiv.org/abs/2410.10690v2
Date: Wed, 30 Oct 2024 12:47:04 GMT
Title: Dynamical loss functions shape landscape topography and improve learning in artificial neural networks
Authors: Eduardo Lavin, Miguel Ruiz-Garcia
Abstract summary: We show how to transform cross-entropy and mean squared error into dynamical loss functions. We show how they significantly improve validation accuracy for networks of varying sizes.
Score: 0.9208007322096533
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Dynamical loss functions are derived from standard loss functions used in supervised classification tasks, but they are modified such that the contribution from each class periodically increases and decreases. These oscillations globally alter the loss landscape without affecting the global minima. In this paper, we demonstrate how to transform cross-entropy and mean squared error into dynamical loss functions. We begin by discussing the impact of increasing the size of the neural network or the learning rate on the learning process. Building on this intuition, we propose several versions of dynamical loss functions and show how they significantly improve validation accuracy for networks of varying sizes. Finally, we explore how the landscape of these dynamical loss functions evolves during training, highlighting the emergence of instabilities that may be linked to edge-of-instability minimization.
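
The idea in the abstract, a per-class contribution that rises and falls periodically, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' formulation: the triangular schedule, the one-class-at-a-time rotation, the normalization, and the names `class_weights`, `dynamical_cross_entropy`, `period`, and `amplitude` are all hypothetical choices made here.

```python
import numpy as np


def class_weights(step, num_classes, period, amplitude):
    """Hypothetical oscillating class weights (not the paper's exact schedule).

    One class at a time is emphasized; its weight follows a triangular
    wave from 1 up to `amplitude` and back over `period` steps.
    """
    # Class whose turn it is during the current period.
    active = (step // period) % num_classes
    # Triangular wave in [0, 1]: 0 at the period boundaries, 1 at mid-period.
    phase = (step % period) / period
    tri = 1.0 - abs(2.0 * phase - 1.0)
    weights = np.ones(num_classes)
    weights[active] = 1.0 + (amplitude - 1.0) * tri
    # Normalize so the weights always sum to num_classes.
    return weights * num_classes / weights.sum()


def dynamical_cross_entropy(logits, labels, weights):
    """Cross-entropy in which each sample is scaled by its class weight."""
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class for each sample.
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(weights[labels] * per_sample))
```

In a training loop one would recompute `class_weights(step, ...)` at every step and pass the result to the loss. Because the weights stay strictly positive, a parameter setting that drives every per-sample loss toward zero still does so under any weighting, which is one informal way to read the abstract's claim that the oscillations leave the global minima unchanged.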

Related papers

The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions [51.68215326304272]
We show that even small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
arXiv Detail & Related papers (2025-06-16T08:35:16Z)
Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes [0.0]
We theoretically analyze the convergence of the loss landscape in a fully connected neural network and derive upper bounds for the difference in loss function values when adding a new object to the sample. Our empirical study confirms these results on various datasets, demonstrating the convergence of the loss function surface for image classification tasks.
arXiv Detail & Related papers (2024-09-18T14:04:15Z)
Disentangling the Causes of Plasticity Loss in Neural Networks [55.23250269007988]
We show that loss of plasticity can be decomposed into multiple independent mechanisms. We show that a combination of layer normalization and weight decay is highly effective at maintaining plasticity in a variety of synthetic nonstationary learning tasks.
arXiv Detail & Related papers (2024-02-29T00:02:33Z)
Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations [49.22640185566807]
We show that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs. We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs. We find that ViTs perform considerably worse compared to CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.
arXiv Detail & Related papers (2023-12-23T17:54:03Z)
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics. The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks [0.0]
T-batching is a valuable technique for training dynamic network models. We have identified a limitation in the training loss function used with t-batching. We propose two alternative loss functions that overcome these issues, resulting in enhanced training performance.
arXiv Detail & Related papers (2023-08-13T23:34:36Z)
Online Loss Function Learning [13.744076477599707]
Loss function learning aims to automate the task of designing a loss function for a machine learning model. We propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters.
arXiv Detail & Related papers (2023-01-30T19:22:46Z)
Critical Investigation of Failure Modes in Physics-informed Neural Networks [0.9137554315375919]
We show that a physics-informed neural network with a composite formulation produces highly non-convex loss surfaces that are difficult to optimize. We also assess the training of both approaches on two elliptic problems with increasingly complex target solutions.
arXiv Detail & Related papers (2022-06-20T18:43:35Z)
Tilting the playing field: Dynamical loss functions for machine learning [18.831125493827766]
We show that learning can be improved by using loss functions that evolve cyclically during training to emphasize one class at a time. Improvement arises from the interplay of the changing loss landscape with the dynamics of the system as it evolves to minimize the loss.
arXiv Detail & Related papers (2021-02-07T13:15:08Z)
Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss function gradient flow. We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory. We show that using a large learning rate in the initial phase of training reduces the variance of the gradient. We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.