Lifted Bregman Training of Neural Networks
- URL: http://arxiv.org/abs/2208.08772v1
- Date: Thu, 18 Aug 2022 11:12:52 GMT
- Title: Lifted Bregman Training of Neural Networks
- Authors: Xiaoyu Wang, Martin Benning
- Abstract summary: We introduce a novel mathematical formulation for the training of feed-forward neural networks with (potentially non-smooth) proximal maps as activation functions.
This formulation is based on Bregman distances, and a key advantage is that its partial derivatives with respect to the network's parameters do not require the computation of derivatives of the network's activation functions.
We present several numerical results that demonstrate that these training approaches can be equally well or even better suited for the training of neural network-based classifiers and (denoising) autoencoders with sparse coding.
- Score: 28.03724379169264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce a novel mathematical formulation for the training of
feed-forward neural networks with (potentially non-smooth) proximal maps as
activation functions. This formulation is based on Bregman distances and a key
advantage is that its partial derivatives with respect to the network's
parameters do not require the computation of derivatives of the network's
activation functions. Instead of estimating the parameters with a combination
of a first-order optimisation method and back-propagation (as is the
state-of-the-art), we propose the use of non-smooth first-order optimisation
methods that exploit the specific structure of the novel formulation. We
present several numerical results that demonstrate that these training
approaches can be equally well or even better suited for the training of neural
network-based classifiers and (denoising) autoencoders with sparse coding
compared to more conventional training frameworks.
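The claim that the partial derivatives avoid derivatives of the activation function can be illustrated with a minimal sketch. It assumes ReLU as the activation, viewed as the proximal map of the indicator function of the non-negative orthant, for which a Bregman-type loss takes the form B(x, z) = 1/2 ||x||^2 + 1/2 ||max(z, 0)||^2 - <x, z> when x >= 0 and +infinity otherwise. The function names (`relu`, `bregman_relu`, `grad_z`, `finite_difference`) and the sample values are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math


def relu(z):
    """Elementwise ReLU; the proximal map of the indicator of {x >= 0}."""
    return [max(zi, 0.0) for zi in z]


def bregman_relu(x, z):
    """Assumed ReLU Bregman loss: 1/2||x||^2 + 1/2||relu(z)||^2 - <x, z>.

    Returns +inf if x leaves the non-negative orthant.
    """
    if any(xi < 0.0 for xi in x):
        return math.inf
    p = relu(z)
    return (0.5 * sum(xi * xi for xi in x)
            + 0.5 * sum(pi * pi for pi in p)
            - sum(xi * zi for xi, zi in zip(x, z)))


def grad_z(x, z):
    """Gradient with respect to the pre-activation z: relu(z) - x.

    Only the activation itself is evaluated, never its derivative.
    """
    return [pi - xi for pi, xi in zip(relu(z), x)]


def finite_difference(x, z, i, h=1e-6):
    """Central difference in coordinate i, used only to check grad_z."""
    z_plus, z_minus = list(z), list(z)
    z_plus[i] += h
    z_minus[i] -= h
    return (bregman_relu(x, z_plus) - bregman_relu(x, z_minus)) / (2.0 * h)


z = [1.5, -0.7, 0.3]   # hypothetical pre-activations
x = [1.0, 0.2, 0.0]    # hypothetical auxiliary (lifted) activations

loss_at_activation = bregman_relu(relu(z), z)  # vanishes when x = relu(z)
loss_elsewhere = bregman_relu(x, z)            # positive penalty otherwise
g = grad_z(x, z)
```

Under these assumptions the loss is non-negative and vanishes exactly when x equals relu(z). If z is an affine function of the previous layer's output, the chain rule passes `relu(z) - x` on to the weights and biases, so no derivative of ReLU is ever needed.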
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural Networks [34.263013539187355]
We propose a new algorithm for training deep neural networks (DNNs) with binary weights.
Experimental results demonstrate that our new algorithm offers favorable performance compared to existing approaches.
arXiv Detail & Related papers (2021-12-06T09:12:15Z)
- Tensor-based framework for training flexible neural networks [9.176056742068813]
We propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem.
The proposed algorithm can handle different basis decompositions.
The goal of this method is to compress large pretrained NN models by replacing subnetworks, i.e., one or multiple layers of the original network, with a new flexible layer.
arXiv Detail & Related papers (2021-06-25T10:26:48Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- Hyperparameter Optimization in Binary Communication Networks for Neuromorphic Deployment [4.280642750854163]
Training neural networks for neuromorphic deployment is non-trivial.
We introduce a Bayesian approach for optimizing the hyperparameters of an algorithm for training binary communication networks that can be deployed to neuromorphic hardware.
We show that by optimizing the hyperparameters of this algorithm for each dataset, we can achieve improvements in accuracy over the previous state-of-the-art for this algorithm on each dataset.
arXiv Detail & Related papers (2020-04-21T01:15:45Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.