Optimizing Loss Functions Through Multivariate Taylor Polynomial
Parameterization
- URL: http://arxiv.org/abs/2002.00059v4
- Date: Fri, 2 Oct 2020 05:29:18 GMT
- Title: Optimizing Loss Functions Through Multivariate Taylor Polynomial
Parameterization
- Authors: Santiago Gonzalez and Risto Miikkulainen
- Abstract summary: Loss functions are a type of metaknowledge that is crucial to effective training of deep neural network (DNN) architectures.
This paper proposes continuous CMA-ES optimization of Taylor parameterizations.
In MNIST, CIFAR-10, and SVHN benchmark tasks, TaylorGLO finds new loss functions that outperform functions previously discovered through GP.
- Score: 16.8615211682877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metalearning of deep neural network (DNN) architectures and hyperparameters
has become an increasingly important area of research. Loss functions are a
type of metaknowledge that is crucial to effective training of DNNs; however,
their potential role in metalearning has not yet been fully explored. Whereas
early work focused on genetic programming (GP) on tree representations, this
paper proposes continuous CMA-ES optimization of multivariate Taylor polynomial
parameterizations. This approach, TaylorGLO, makes it possible to represent and
search useful loss functions more effectively. In MNIST, CIFAR-10, and SVHN
benchmark tasks, TaylorGLO finds new loss functions that outperform functions
previously discovered through GP, as well as the standard cross-entropy loss,
in fewer generations. These functions serve to regularize the learning task by
discouraging overfitting to the labels, which is particularly useful in tasks
where limited training data is available. The results thus demonstrate that
loss function optimization is a productive new avenue for metalearning.
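As a rough illustration of the approach (not the authors' code), the sketch below parameterizes a per-sample loss as a low-order multivariate Taylor polynomial in the one-hot label and the predicted probability, and searches the coefficient vector with CMA-ES as a black box. The expansion order, expansion point, and fitness used by TaylorGLO may differ; the `toy_fitness` function is only a stand-in for the paper's actual fitness (validation performance of a DNN trained with the candidate loss), and the third-party `cma` package is assumed to be installed.

```python
# Minimal sketch, assuming a 2nd-order bivariate Taylor parameterization and the
# `cma` package; not the authors' implementation.
import numpy as np
import cma  # pip install cma

def taylor_loss(theta, y_true, y_pred, eps=1e-7):
    """Candidate loss: a 2nd-order bivariate polynomial in (label t, probability p),
    summed over classes and averaged over the batch."""
    a, b, c, d, e, f = theta
    p = np.clip(y_pred, eps, 1.0)   # predicted class probabilities
    t = y_true                      # one-hot labels
    terms = a + b * t + c * p + d * t * p + e * t ** 2 + f * p ** 2
    return float(np.mean(np.sum(terms, axis=-1)))

def toy_fitness(theta, rng):
    """Stand-in fitness so the sketch runs end to end: score how closely the
    candidate loss tracks cross-entropy on random labels and probability vectors.
    The paper instead trains a DNN with the candidate loss and uses validation
    performance."""
    t = np.eye(10)[rng.integers(0, 10, size=256)]
    p = rng.dirichlet(np.ones(10), size=256)
    ce = -np.mean(np.sum(t * np.log(p + 1e-7), axis=-1))
    return (taylor_loss(theta, t, p) - ce) ** 2

rng = np.random.default_rng(0)
es = cma.CMAEvolutionStrategy(6 * [0.0], 0.5, {"maxiter": 50})
while not es.stop():
    candidates = es.ask()  # sample coefficient vectors from the search distribution
    es.tell(candidates, [toy_fitness(np.asarray(c), rng) for c in candidates])
print(es.result.xbest)     # best loss coefficients found by the toy search
```

The key design point is that the Taylor parameterization turns loss-function search into a fixed-length continuous optimization problem, which is exactly the setting CMA-ES handles well, in contrast to the tree representations used by GP.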
Related papers
- Fast and Efficient Local Search for Genetic Programming Based Loss
Function Learning [12.581217671500887]
We propose a new meta-learning framework for task- and model-agnostic loss function learning via a hybrid search approach.
Results show that the learned loss functions bring improved convergence, sample efficiency, and inference performance on tabulated, computer vision, and natural language processing problems.
arXiv Detail & Related papers (2024-03-01T02:20:04Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- GIF: A General Graph Unlearning Strategy via Influence Function [63.52038638220563]
Graph Influence Function (GIF) is a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data.
We conduct extensive experiments on four representative GNN models and three benchmark datasets to justify GIF's superiority in terms of unlearning efficacy, model utility, and unlearning efficiency.
arXiv Detail & Related papers (2023-04-06T03:02:54Z)
- Alternate Loss Functions for Classification and Robust Regression Can Improve the Accuracy of Artificial Neural Networks [6.452225158891343]
This paper shows that the training speed and final accuracy of neural networks can depend significantly on the loss function used to train them.
It proposes two new classification loss functions that significantly improve performance on a wide variety of benchmark tasks.
arXiv Detail & Related papers (2023-03-17T12:52:06Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Meta-learning PINN loss functions [5.543220407902113]
We propose a meta-learning technique for offline discovery of physics-informed neural network (PINN) loss functions.
We develop a gradient-based meta-learning algorithm for addressing diverse task distributions based on parametrized partial differential equations (PDEs).
Our results indicate that significant performance improvement can be achieved by using a shared-among-tasks offline-learned loss function.
arXiv Detail & Related papers (2021-07-12T16:13:39Z)
- Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search [101.73248560009124]
We propose an effective convergence-simulation driven evolutionary search algorithm, CSE-Autoloss, for speeding up the search progress.
We conduct extensive evaluations of loss function search on popular detectors and validate the good generalization capability of searched losses.
Our experiments show that the best-discovered loss function combinations outperform default combinations by 1.1% and 0.8% in terms of mAP for two-stage and one-stage detectors.
arXiv Detail & Related papers (2021-02-09T08:34:52Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Effective Regularization Through Loss-Function Metalearning [16.8615211682877]
We show that loss functions evolved by TaylorGLO balance a pull toward zero error with a push away from it to avoid overfitting.
Loss-function evolution can thus be seen as a well-founded new aspect of metalearning in neural networks.
arXiv Detail & Related papers (2020-10-02T05:22:21Z)
- Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU).
Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
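For a rough sense of the kind of search the last related paper describes (not its implementation), the hedged sketch below evolves activation functions represented as compositions of two unary primitives, using truncation selection and point mutation. The paper evaluates candidates by training full networks on image benchmarks; the random-feature regression used as `fitness` here, the primitive set, and the two-primitive genome are simplifying assumptions so the sketch stays small and runnable.

```python
# Hedged sketch of evolutionary activation-function search; the real method
# likely uses richer function representations and full network training.
import numpy as np

PRIMITIVES = {
    "relu":  lambda x: np.maximum(x, 0.0),
    "tanh":  np.tanh,
    "sin":   np.sin,
    "swish": lambda x: x / (1.0 + np.exp(-x)),  # x * sigmoid(x)
    "abs":   np.abs,
}

def make_activation(genome):
    """A genome is a pair of primitive names; the activation is their composition."""
    f, g = PRIMITIVES[genome[0]], PRIMITIVES[genome[1]]
    return lambda x: f(g(x))

def fitness(genome, data):
    """Toy stand-in fitness: validation MSE of a random-feature model with a
    closed-form ridge readout, using the candidate activation (lower is better)."""
    X, y, W = data
    H = make_activation(genome)(X @ W)
    Xtr, Xva, ytr, yva = H[:300], H[300:], y[:300], y[300:]
    beta = np.linalg.solve(Xtr.T @ Xtr + 1e-3 * np.eye(H.shape[1]), Xtr.T @ ytr)
    return float(np.mean((Xva @ beta - yva) ** 2))

def evolve(generations=20, pop_size=16, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(400, 8))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2    # arbitrary toy regression target
    data = (X, y, rng.normal(size=(8, 64)))     # fixed random hidden-layer weights
    names = list(PRIMITIVES)
    pop = [tuple(rng.choice(names, size=2)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data))
        parents = pop[: pop_size // 4]          # truncation selection
        pop = list(parents)
        while len(pop) < pop_size:              # point-mutate a random parent
            child = list(parents[rng.integers(len(parents))])
            child[rng.integers(2)] = names[rng.integers(len(names))]
            pop.append(tuple(child))
    return min(pop, key=lambda g: fitness(g, data))

print(evolve())  # e.g. ('tanh', 'relu'); the winner depends on the toy task and seed
```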