Optimizing Loss Functions Through Multivariate Taylor Polynomial
Parameterization
- URL: http://arxiv.org/abs/2002.00059v4
- Date: Fri, 2 Oct 2020 05:29:18 GMT
- Title: Optimizing Loss Functions Through Multivariate Taylor Polynomial
Parameterization
- Authors: Santiago Gonzalez and Risto Miikkulainen
- Abstract summary: Loss functions are a type of metaknowledge that is crucial to effective training of deep neural network (DNN) architectures.
This paper proposes continuous CMA-ES optimization of Taylor parameterizations.
In MNIST, CIFAR-10, and SVHN benchmark tasks, TaylorGLO finds new loss functions that outperform functions previously discovered through GP.
- Score: 16.8615211682877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metalearning of deep neural network (DNN) architectures and hyperparameters
has become an increasingly important area of research. Loss functions are a
type of metaknowledge that is crucial to effective training of DNNs; however,
their potential role in metalearning has not yet been fully explored. Whereas
early work focused on genetic programming (GP) on tree representations, this
paper proposes continuous CMA-ES optimization of multivariate Taylor polynomial
parameterizations. This approach, TaylorGLO, makes it possible to represent and
search useful loss functions more effectively. In MNIST, CIFAR-10, and SVHN
benchmark tasks, TaylorGLO finds new loss functions that outperform functions
previously discovered through GP, as well as the standard cross-entropy loss,
in fewer generations. These functions serve to regularize the learning task by
discouraging overfitting to the labels, which is particularly useful in tasks
where limited training data is available. The results thus demonstrate that
loss function optimization is a productive new avenue for metalearning.
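As a rough illustration of the approach (not the authors' code), the sketch below parameterizes a per-sample loss as a low-order multivariate Taylor polynomial in the one-hot label and the predicted probability, and searches the coefficient vector with CMA-ES as a black box. The expansion order, expansion point, and fitness used by TaylorGLO may differ; the `toy_fitness` function is only a stand-in for the paper's actual fitness (validation performance of a DNN trained with the candidate loss), and the third-party `cma` package is assumed to be installed.

```python
# Minimal sketch, assuming a 2nd-order bivariate Taylor parameterization and the
# `cma` package; not the authors' implementation.
import numpy as np
import cma  # pip install cma

def taylor_loss(theta, y_true, y_pred, eps=1e-7):
    """Candidate loss: a 2nd-order bivariate polynomial in (label t, probability p),
    summed over classes and averaged over the batch."""
    a, b, c, d, e, f = theta
    p = np.clip(y_pred, eps, 1.0)   # predicted class probabilities
    t = y_true                      # one-hot labels
    terms = a + b * t + c * p + d * t * p + e * t ** 2 + f * p ** 2
    return float(np.mean(np.sum(terms, axis=-1)))

def toy_fitness(theta, rng):
    """Stand-in fitness so the sketch runs end to end: score how closely the
    candidate loss tracks cross-entropy on random labels and probability vectors.
    The paper instead trains a DNN with the candidate loss and uses validation
    performance."""
    t = np.eye(10)[rng.integers(0, 10, size=256)]
    p = rng.dirichlet(np.ones(10), size=256)
    ce = -np.mean(np.sum(t * np.log(p + 1e-7), axis=-1))
    return (taylor_loss(theta, t, p) - ce) ** 2

rng = np.random.default_rng(0)
es = cma.CMAEvolutionStrategy(6 * [0.0], 0.5, {"maxiter": 50})
while not es.stop():
    candidates = es.ask()  # sample coefficient vectors from the search distribution
    es.tell(candidates, [toy_fitness(np.asarray(c), rng) for c in candidates])
print(es.result.xbest)     # best loss coefficients found by the toy search
```

The key design point is that the Taylor parameterization turns loss-function search into a fixed-length continuous optimization problem, which is exactly the setting CMA-ES handles well, in contrast to the tree representations used by GP.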
Related papers
- Fast and Efficient Local Search for Genetic Programming Based Loss
Function Learning [12.581217671500887]
We propose a new meta-learning framework for task- and model-agnostic loss function learning via a hybrid search approach.
Results show that the learned loss functions bring improved convergence, sample efficiency, and inference performance on tabulated, computer vision, and natural language processing problems.
arXiv Detail & Related papers (2024-03-01T02:20:04Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- GIF: A General Graph Unlearning Strategy via Influence Function [63.52038638220563]
Graph Influence Function (GIF) is a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data.
We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify GIF's superiority in terms of unlearning efficacy, model utility, and unlearning efficiency.
arXiv Detail & Related papers (2023-04-06T03:02:54Z)
- Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that the training speed and final accuracy of neural networks can depend significantly on the loss function used to train them.
It proposes two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks.
arXiv Detail & Related papers (2023-03-17T12:52:06Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Meta-learning PINN loss functions [5.543220407902113]
We propose a meta-learning technique for offline discovery of physics-informed neural network (PINN) loss functions.
We develop a gradient-based meta-learning algorithm for addressing diverse task distributions based on parametrized partial differential equations (PDEs).
Our results indicate that significant performance improvement can be achieved by using a shared-among-tasks offline-learned loss function.
arXiv Detail & Related papers (2021-07-12T16:13:39Z)
- Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search [101.73248560009124]
We propose an effective convergence-simulation driven evolutionary search algorithm, CSE-Autoloss, for speeding up the search progress.
We conduct extensive evaluations of loss function search on popular detectors and validate the good generalization capability of searched losses.
Our experiments show that the best-discovered loss function combinations outperform default combinations by 1.1% and 0.8% in terms of mAP for two-stage and one-stage detectors.
arXiv Detail & Related papers (2021-02-09T08:34:52Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Effective Regularization Through Loss-Function Metalearning [16.8615211682877]
We show that loss functions evolved by TaylorGLO balance a pull toward zero error with a push away from it to avoid overfitting.
Loss-function evolution can thus be seen as a well-founded new aspect of metalearning in neural networks.
arXiv Detail & Related papers (2020-10-02T05:22:21Z)
- Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU).
Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
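For a rough sense of the kind of search the last related paper describes (not its implementation), the hedged sketch below evolves activation functions represented as compositions of two unary primitives, using truncation selection and point mutation. The paper evaluates candidates by training full networks on image benchmarks; the random-feature regression used as `fitness` here, the primitive set, and the two-primitive genome are simplifying assumptions so the sketch stays small and runnable.

```python
# Hedged sketch of evolutionary activation-function search; the real method
# likely uses richer function representations and full network training.
import numpy as np

PRIMITIVES = {
    "relu":  lambda x: np.maximum(x, 0.0),
    "tanh":  np.tanh,
    "sin":   np.sin,
    "swish": lambda x: x / (1.0 + np.exp(-x)),  # x * sigmoid(x)
    "abs":   np.abs,
}

def make_activation(genome):
    """A genome is a pair of primitive names; the activation is their composition."""
    f, g = PRIMITIVES[genome[0]], PRIMITIVES[genome[1]]
    return lambda x: f(g(x))

def fitness(genome, data):
    """Toy stand-in fitness: validation MSE of a random-feature model with a
    closed-form ridge readout, using the candidate activation (lower is better)."""
    X, y, W = data
    H = make_activation(genome)(X @ W)
    Xtr, Xva, ytr, yva = H[:300], H[300:], y[:300], y[300:]
    beta = np.linalg.solve(Xtr.T @ Xtr + 1e-3 * np.eye(H.shape[1]), Xtr.T @ ytr)
    return float(np.mean((Xva @ beta - yva) ** 2))

def evolve(generations=20, pop_size=16, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(400, 8))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2    # arbitrary toy regression target
    data = (X, y, rng.normal(size=(8, 64)))     # fixed random hidden-layer weights
    names = list(PRIMITIVES)
    pop = [tuple(rng.choice(names, size=2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data))
        parents = pop[: pop_size // 4]          # truncation selection
        pop = list(parents)
        while len(pop) < pop_size:              # point-mutate a random parent
            child = list(parents[rng.integers(len(parents))])
            child[rng.integers(2)] = names[rng.integers(len(names))]
            pop.append(tuple(child))
    return min(pop, key=lambda g: fitness(g, data))

print(evolve())  # e.g. ('tanh', 'relu'); the winner depends on the toy task and seed
```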