Effective Regularization Through Loss-Function Metalearning
- URL: http://arxiv.org/abs/2010.00788v5
- Date: Wed, 11 Jun 2025 03:52:09 GMT
- Title: Effective Regularization Through Loss-Function Metalearning
- Authors: Santiago Gonzalez, Xin Qiu, Risto Miikkulainen,
- Abstract summary: Evolutionary computation can be used to optimize several different aspects of neural network architectures.<n>This paper demonstrates theoretically that such functions discourage overfitting, leading to effective regularization.<n>The analysis in this paper constitutes a first step towards understanding regularization, and demonstrates the power of evolutionary neural architecture search in general.
- Score: 13.093684676332911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evolutionary computation can be used to optimize several different aspects of neural network architectures. For instance, the TaylorGLO method discovers novel, customized loss functions, resulting in improved performance, faster training, and improved data utilization. A likely reason is that such functions discourage overfitting, leading to effective regularization. This paper demonstrates theoretically that this is indeed the case for TaylorGLO. Learning rule decomposition reveals that evolved loss functions balance two factors: the pull toward zero error, and a push away from it to avoid overfitting. This is a general principle that may be used to understand other regularization techniques as well (as demonstrated in this paper for label smoothing). The theoretical analysis leads to a constraint that can be utilized to find more effective loss functions in practice; the mechanism also results in networks that are more robust (as demonstrated in this paper with adversarial inputs). The analysis in this paper thus constitutes a first step towards understanding regularization, and demonstrates the power of evolutionary neural architecture search in general.
Related papers
- Dynamical loss functions shape landscape topography and improve learning in artificial neural networks [0.9208007322096533]
We show how to transform cross-entropy and mean squared error into dynamical loss functions.
We show how they significantly improve validation accuracy for networks of varying sizes.
arXiv Detail & Related papers (2024-10-14T16:27:03Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Accelerated Neural Network Training with Rooted Logistic Objectives [13.400503928962756]
We derive a novel sequence of em strictly convex functions that are at least as strict as logistic loss.
Our results illustrate that training with rooted loss function is converged faster and gains performance improvements.
arXiv Detail & Related papers (2023-10-05T20:49:48Z) - Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Outlier-Robust Neural Network Training: Efficient Optimization of Transformed Trimmed Loss with Variation Regularization [2.5628953713168685]
We consider outlier-robust predictive modeling using highly-expressive neural networks.
We employ (1) a transformed trimmed loss (TTL), which is a computationally feasible variant of the classical trimmed loss, and (2) a higher-order variation regularization (HOVR) of the prediction model.
arXiv Detail & Related papers (2023-08-04T12:57:13Z) - Theoretical Characterization of the Generalization Performance of
Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that training speed and final accuracy of neural networks can significantly depend on the loss function used to train neural networks.
Two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks are proposed.
arXiv Detail & Related papers (2023-03-17T12:52:06Z) - Online Loss Function Learning [13.744076477599707]
Loss function learning aims to automate the task of designing a loss function for a machine learning model.
We propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters.
arXiv Detail & Related papers (2023-01-30T19:22:46Z) - Theoretical Characterization of How Neural Network Pruning Affects its
Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z) - Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z) - Reduced Implication-bias Logic Loss for Neuro-Symbolic Learning [11.343715006460577]
Differentiable operators could bring a significant bias during backpropagation and degrade the performance of Neuro-Symbolic learning.
We propose a simple yet effective method to transform the biased loss functions into textitReduced Implication-bias Logic Loss.
Empirical study shows that RILL can achieve significant improvements compared with the biased logic loss functions.
arXiv Detail & Related papers (2022-08-14T11:57:46Z) - Adaptive Self-supervision Algorithms for Physics-informed Neural
Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z) - Refining neural network predictions using background knowledge [68.35246878394702]
We show we can use logical background knowledge in learning system to compensate for a lack of labeled training data.
We introduce differentiable refinement functions that find a corrected prediction close to the original prediction.
This algorithm finds optimal refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent can not.
arXiv Detail & Related papers (2022-06-10T10:17:59Z) - Towards Scaling Difference Target Propagation by Learning Backprop
Targets [64.90165892557776]
Difference Target Propagation is a biologically-plausible learning algorithm with close relation with Gauss-Newton (GN) optimization.
We propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored.
We report the best performance ever achieved by DTP on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2022-01-31T18:20:43Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Optimization and Generalization of Regularization-Based Continual
Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z) - Optimizing Loss Functions Through Multivariate Taylor Polynomial
Parameterization [16.8615211682877]
Loss functions are a type of metaknowledge that is crucial to effective training of deep neural network (DNN) architectures.
This paper proposes continuous CMA-ES optimization of Taylor parameterizations.
In MNIST, CIFAR-10, and SVHN benchmark tasks, TaylorGLO finds new loss functions that outperform functions previously discovered through GP.
arXiv Detail & Related papers (2020-01-31T21:25:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.