A Kernel-Expanded Stochastic Neural Network
- URL: http://arxiv.org/abs/2201.05319v1
- Date: Fri, 14 Jan 2022 06:42:42 GMT
- Title: A Kernel-Expanded Stochastic Neural Network
- Authors: Yan Sun, Faming Liang
- Abstract summary: The deep neural network often gets trapped in a local minimum during training.
The new kernel-expanded stochastic neural network (K-StoNet) model reformulates the network as a latent variable model.
The model can be easily trained using the imputation-regularized optimization (IRO) algorithm.
- Score: 10.837308632004644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The deep neural network suffers from many fundamental issues in machine learning. For example, it often gets trapped in a local minimum during training, and its prediction uncertainty is hard to assess. To address these issues, we propose the so-called kernel-expanded stochastic neural network (K-StoNet) model, which incorporates support vector regression (SVR) as the first hidden layer and reformulates the neural network as a latent variable model. The former maps the input vector into an infinite-dimensional feature space via a radial basis function (RBF) kernel, ensuring the absence of local minima on its training loss surface. The latter breaks the high-dimensional nonconvex neural network training problem into a series of low-dimensional convex optimization problems and enables its prediction uncertainty to be easily assessed. The K-StoNet can be easily trained using the imputation-regularized optimization (IRO) algorithm. Compared to traditional deep neural networks, K-StoNet possesses a theoretical guarantee to asymptotically converge to the global optimum and enables the prediction uncertainty to be easily assessed. The performance of the new model in training, prediction, and uncertainty quantification is illustrated with simulated and real data examples.
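To make the architecture concrete, here is a minimal sketch of the K-StoNet layout: an RBF-kernel SVR layer maps inputs into a kernel feature space, and a convex regression head is fitted on top. This is an illustrative simplification, not the authors' IRO implementation; the noisy pseudo-targets below stand in for the imputed latent variables, and all function names and hyperparameters are hypothetical.
```python
# Minimal sketch (not the authors' IRO implementation) of the K-StoNet layout:
# an RBF-kernel SVR "first hidden layer" followed by a convex regression head.
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import Ridge

def fit_kstonet_like(X, y, n_hidden=5, seed=0):
    rng = np.random.default_rng(seed)
    svrs, H = [], np.zeros((X.shape[0], n_hidden))
    for j in range(n_hidden):
        # Noisy pseudo-target: a crude stand-in for the imputed latent variable.
        pseudo_target = y + 0.1 * rng.standard_normal(y.shape)
        m = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, pseudo_target)
        svrs.append(m)
        H[:, j] = m.predict(X)
    head = Ridge(alpha=1.0).fit(H, y)  # the output-layer fit is a convex problem
    return svrs, head

def predict_kstonet_like(svrs, head, X):
    H = np.column_stack([m.predict(X) for m in svrs])
    return head.predict(H)
```
In the paper's IRO algorithm the latent hidden values are re-imputed and the parameters re-estimated at each iteration, rather than being fixed up front as in this sketch.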
Related papers
- SGD method for entropy error function with smoothing l0 regularization for neural networks [3.108634881604788]
The entropy error function has been widely used in neural networks.
We propose a novel entropy function with smoothing l0 regularization for feed-forward neural networks.
Our work is novel as it enables neural networks to learn effectively, producing more accurate predictions.
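The summary does not spell out the penalty, so the sketch below is only a hedged illustration: a cross-entropy (entropy error) term combined with one common smooth surrogate of the l0 norm, sum_i (1 - exp(-w_i^2 / sigma^2)); the exact functional form, names, and hyperparameters in the cited paper may differ.
```python
# Hedged sketch: entropy (cross-entropy) error plus a smooth l0 surrogate.
import torch

def entropy_error_with_smooth_l0(logits, targets, params, lam=1e-3, sigma=0.1):
    # Cross-entropy plays the role of the entropy error term here.
    ce = torch.nn.functional.cross_entropy(logits, targets)
    # Smooth l0 surrogate: each weight contributes ~1 when |w| >> sigma, ~0 near zero.
    l0_smooth = sum((1.0 - torch.exp(-(p ** 2) / sigma ** 2)).sum() for p in params)
    return ce + lam * l0_smooth
```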
arXiv Detail & Related papers (2024-05-28T19:54:26Z) - Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth, and topology.
We also present an exact, novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks [0.5827521884806072]
Large neural networks trained on large datasets have become the dominant paradigm in machine learning.
This thesis develops scalable methods to equip neural networks with model uncertainty.
arXiv Detail & Related papers (2024-04-29T23:38:58Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness [33.09831377640498]
We study approaches to improve the uncertainty properties of a single network, based on a single, deterministic representation.
We propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs.
On a suite of vision and language understanding benchmarks, SNGP outperforms other single-model approaches in prediction, calibration and out-of-domain detection.
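A hedged sketch of the two ingredients SNGP is built from: spectral normalization on the hidden layers (to preserve input distances) and a random-feature approximation of a Gaussian process output layer. The Laplace-approximation covariance update used for predictive uncertainty is omitted, and the class and parameter names below are assumptions rather than the reference implementation.
```python
# Hedged SNGP-style sketch: spectrally normalized body + random-feature GP head.
import math
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class TinySNGP(nn.Module):
    def __init__(self, d_in, d_hidden=64, n_rff=256, n_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            spectral_norm(nn.Linear(d_in, d_hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(d_hidden, d_hidden)), nn.ReLU(),
        )
        # Fixed random Fourier features approximate an RBF-kernel GP output layer.
        self.register_buffer("W", torch.randn(d_hidden, n_rff))
        self.register_buffer("b", 2 * math.pi * torch.rand(n_rff))
        self.out = nn.Linear(n_rff, n_classes)

    def forward(self, x):
        h = self.body(x)
        phi = math.sqrt(2.0 / self.W.shape[1]) * torch.cos(h @ self.W + self.b)
        return self.out(phi)
```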
arXiv Detail & Related papers (2022-05-01T05:46:13Z) - Imbedding Deep Neural Networks [0.0]
Continuous depth neural networks, such as Neural ODEs, have refashioned the understanding of residual neural networks in terms of non-linear vector-valued optimal control problems.
We propose a new approach which explicates the network's depth as a fundamental variable, thus reducing the problem to a system of forward-facing initial value problems.
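The continuous-depth view can be made concrete with a small sketch: the hidden state evolves under a depth-parameterized vector field and is integrated with forward Euler, so depth appears as an explicit variable. This illustrates the general Neural-ODE perspective rather than the specific imbedding construction of the cited paper; the module layout and step count are assumptions.
```python
# Hedged continuous-depth sketch: forward-Euler integration of h'(t) = f(h, t).
import torch
import torch.nn as nn

class ContinuousDepthBlock(nn.Module):
    def __init__(self, dim, n_steps=10):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim + 1, dim), nn.Tanh())
        self.n_steps = n_steps

    def forward(self, h):
        dt = 1.0 / self.n_steps
        for k in range(self.n_steps):
            t = torch.full_like(h[:, :1], k * dt)          # depth enters as an explicit input
            h = h + dt * self.f(torch.cat([h, t], dim=1))  # forward Euler step
        return h
```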
arXiv Detail & Related papers (2022-01-31T22:00:41Z) - Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Reduced-Order Neural Network Synthesis with Robustness Guarantees [0.0]
Machine learning algorithms are being adapted to run locally on on-board, potentially hardware-limited devices to improve user privacy, reduce latency, and be more energy efficient.
To address this issue, a method is introduced to automatically synthesize reduced-order neural networks (having fewer neurons) that approximate the input/output mapping of a larger one.
Worst-case bounds for this approximation error are obtained, and the approach can be applied to a wide variety of neural network architectures.
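A hedged sketch of the general idea only (not the paper's synthesis procedure or its certified bounds): fit a smaller "student" network to the input/output map of a larger one on sampled inputs and report an empirical worst-case gap. The teacher is assumed to map a d-dimensional input to a scalar; all names and hyperparameters are illustrative.
```python
# Hedged sketch: approximate a larger network with a reduced-order one.
import torch
import torch.nn as nn

def reduce_order(teacher, d_in, width=16, n_samples=2048, epochs=200, lr=1e-2):
    student = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(), nn.Linear(width, 1))
    x = torch.rand(n_samples, d_in) * 2 - 1     # sample the operating region [-1, 1]^d
    with torch.no_grad():
        y = teacher(x)                          # target input/output mapping
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((student(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        empirical_gap = (student(x) - y).abs().max().item()  # empirical, not certified
    return student, empirical_gap
```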
arXiv Detail & Related papers (2021-02-18T12:03:57Z) - Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
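For concreteness, a hedged sketch of the training rule analyzed here: a plain gradient step on a two-layer network with weight decay and an injected Gaussian perturbation. The step size, decay, and noise scale are illustrative assumptions, not the paper's settings.
```python
# Hedged sketch of noisy gradient descent with weight decay.
import torch
import torch.nn as nn

def noisy_gd_step(model, loss_fn, x, y, lr=1e-2, weight_decay=1e-4, noise_std=1e-3):
    loss = loss_fn(model(x), y)
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * (p.grad + weight_decay * p)   # gradient step with weight decay
            p += noise_std * torch.randn_like(p)    # injected Gaussian noise
    return loss.item()

# Two-layer network of the kind analyzed in the paper (widths are illustrative).
two_layer = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 1))
```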
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.