Spline parameterization of neural network controls for deep learning
- URL: http://arxiv.org/abs/2103.00301v1
- Date: Sat, 27 Feb 2021 19:35:45 GMT
- Title: Spline parameterization of neural network controls for deep learning
- Authors: Stefanie Günther, Will Pazner, Dongping Qi
- Abstract summary: We choose a fixed number of B-spline basis functions whose coefficients are the trainable parameters of the neural network.
We numerically show that the spline-based neural network increases the robustness of the learning problem with respect to hyperparameters.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Based on the continuous interpretation of deep learning cast as an optimal
control problem, this paper investigates the benefits of employing B-spline
basis functions to parameterize neural network controls across the layers.
Rather than equipping each layer of a discretized ODE-network with a set of
trainable weights, we choose a fixed number of B-spline basis functions whose
coefficients are the trainable parameters of the neural network. Decoupling the
trainable parameters from the layers of the neural network enables us to
investigate and adapt the accuracy of the network propagation separately from
the optimization problem. We numerically show that the spline-based
neural network increases the robustness of the learning problem with respect
to hyperparameters, owing to the improved stability and accuracy of the network
propagation. Further, training on B-spline coefficients rather than directly on
the layer weights reduces the number of trainable parameters.
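The sketch below (not the authors' code) illustrates the idea described in the abstract: a forward-Euler discretized residual network whose layer weights W(t) and biases b(t) are evaluated from a fixed set of B-spline basis functions, so that only the spline coefficients are trained. The class name SplineResNet, the tanh residual block, and parameters such as n_basis and degree are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn


def bspline_basis(t, knots, degree, i):
    """Cox-de Boor recursion: value of the i-th B-spline basis function of the given degree at t."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + degree] > knots[i]:
        left = (t - knots[i]) / (knots[i + degree] - knots[i]) \
            * bspline_basis(t, knots, degree - 1, i)
    if knots[i + degree + 1] > knots[i + 1]:
        right = (knots[i + degree + 1] - t) / (knots[i + degree + 1] - knots[i + 1]) \
            * bspline_basis(t, knots, degree - 1, i + 1)
    return left + right


class SplineResNet(nn.Module):
    """Forward-Euler ResNet x_{k+1} = x_k + h * tanh(W(t_k) x_k + b(t_k)),
    where W(t) and b(t) are B-spline combinations of trainable coefficients."""

    def __init__(self, width, n_layers, n_basis=8, degree=2, T=1.0):
        super().__init__()
        self.n_layers, self.h = n_layers, T / n_layers
        self.n_basis, self.degree, self.T = n_basis, degree, T
        # Trainable parameters: one coefficient matrix/vector per basis function,
        # NOT one per layer -- this is the decoupling described in the abstract.
        self.W = nn.Parameter(0.01 * torch.randn(n_basis, width, width))
        self.b = nn.Parameter(torch.zeros(n_basis, width))
        # Clamped (open) uniform knot vector on [0, T].
        interior = torch.linspace(0.0, T, n_basis - degree + 1)
        self.knots = ([0.0] * degree) + interior.tolist() + ([T] * degree)

    def layer_params(self, t):
        # Evaluate W(t) and b(t) by combining the trainable coefficients with the basis values at t.
        basis = torch.tensor([bspline_basis(min(t, self.T - 1e-9), self.knots, self.degree, i)
                              for i in range(self.n_basis)])
        return torch.einsum('k,kij->ij', basis, self.W), torch.einsum('k,ki->i', basis, self.b)

    def forward(self, x):
        for k in range(self.n_layers):
            W_t, b_t = self.layer_params(k * self.h)
            x = x + self.h * torch.tanh(x @ W_t.T + b_t)  # one forward-Euler layer
        return x


# Example: a 64-layer network of width 32 trains only 8 coefficient matrices and biases.
model = SplineResNet(width=32, n_layers=64, n_basis=8, degree=2)
out = model(torch.randn(16, 32))
print(out.shape, sum(p.numel() for p in model.parameters()))
```

In this setup the number of trainable values is n_basis * (width^2 + width) = 8,448, independent of the 64 layers; equipping each layer with its own weights would instead require 67,584, which illustrates the parameter reduction claimed in the abstract.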
Related papers
- Using Cooperative Game Theory to Prune Neural Networks [7.3959659158152355]
We show how solution concepts from cooperative game theory can be used to tackle the problem of pruning neural networks.
We introduce a method called Game Theory Assisted Pruning (GTAP), which reduces the neural network's size while preserving its predictive accuracy.
arXiv Detail & Related papers (2023-11-17T11:48:10Z)
- Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks [0.16385815610837165]
Sparse deep learning addresses challenges by recovering a sparse representation of the underlying target function.
Deep neural architectures compressed via structured sparsity provide low latency inference, higher data throughput, and reduced energy consumption.
We propose structurally sparse Bayesian neural networks which prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors.
arXiv Detail & Related papers (2023-08-17T17:14:18Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions [4.932130498861987]
We propose a new type of neural network, Kronecker neural networks (KNNs), which form a general framework for neural networks with adaptive activation functions.
Under suitable conditions, KNNs induce a faster decay of the loss than standard feed-forward networks.
arXiv Detail & Related papers (2021-05-20T04:54:57Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose a new approach to regularizing neural networks, based on the local Rademacher complexity, called LocalDrop.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)
- Continual Learning with Extended Kronecker-factored Approximate Curvature [33.44290346786496]
We propose a quadratic penalty method for continual learning of neural networks that contain batch normalization layers.
Kronecker-factored approximate curvature (K-FAC) is widely used to approximate the Hessian of a neural network in practice.
We extend the K-FAC method so that the inter-example relations are taken into account and the Hessian of deep neural networks can be properly approximated.
arXiv Detail & Related papers (2020-04-16T07:58:47Z)
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The effectiveness of the proposed feature map distortion in producing deep neural networks with higher test performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.