Bayesian Nested Neural Networks for Uncertainty Calibration and Adaptive
Compression
- URL: http://arxiv.org/abs/2101.11353v1
- Date: Wed, 27 Jan 2021 12:34:58 GMT
- Title: Bayesian Nested Neural Networks for Uncertainty Calibration and Adaptive
Compression
- Authors: Yufei Cui, Ziquan Liu, Qiao Li, Yu Mao, Antoni B. Chan, Chun Jason Xue
- Abstract summary: Nested networks or slimmable networks are neural networks whose architectures can be adjusted instantly at test time.
Recent studies have focused on a "nested dropout" layer, which is able to order the nodes of a layer by importance during training.
- Score: 40.35734017517066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nested networks or slimmable networks are neural networks whose architectures
can be adjusted instantly at test time, e.g., based on computational
constraints. Recent studies have focused on a "nested dropout" layer, which is
able to order the nodes of a layer by importance during training, thus
generating a nested set of sub-networks that are optimal for different
configurations of resources. However, the dropout rate is fixed as a
hyper-parameter over different layers during the whole training process.
Therefore, when nodes are removed, the performance decays in a human-specified
trajectory rather than in a trajectory learned from data. Another drawback is
that the generated sub-networks are deterministic networks without well-calibrated
uncertainty. To address these two problems, we develop a Bayesian approach to
nested neural networks. We propose a variational ordering unit that draws
samples for nested dropout at a low cost from a proposed Downhill
distribution, which provides useful gradients to the parameters of nested
dropout. Based on this approach, we design a Bayesian nested neural network
that learns the ordering of the node distributions. In experiments, we
show that the proposed approach outperforms the nested network in terms of
accuracy, calibration, and out-of-domain detection in classification tasks. It
also outperforms the related approach on uncertainty-critical tasks in computer
vision.
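For intuition, the following is a minimal sketch of a differentiable ordered-dropout mask, assuming (as the abstract's description suggests) that the Downhill distribution can be built from a Gumbel-softmax relaxation whose reversed cumulative sum yields a soft mask of the form (1, ..., 1, 0, ..., 0). The function name, temperature, and toy shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def downhill_mask(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    # Draw a relaxed one-hot vector over the K node positions with the
    # Gumbel-softmax trick; its reversed cumulative sum is ~1 up to the
    # sampled cutoff and ~0 after it, i.e. a soft nested-dropout mask
    # that is differentiable in `logits`.
    v = F.gumbel_softmax(logits, tau=tau)
    return torch.flip(torch.cumsum(torch.flip(v, [-1]), -1), [-1])

# Toy usage: order the 8 output nodes of a linear layer by importance.
K = 8
logits = torch.zeros(K, requires_grad=True)  # learnable ordering parameters
layer = torch.nn.Linear(16, K)
h = layer(torch.randn(4, 16)) * downhill_mask(logits)  # later nodes drop first
h.sum().backward()
print(logits.grad)  # gradients reach the ordering parameters
```

Because the mask is a smooth function of `logits`, the parameters that govern where nodes are cut off can be learned from data rather than fixed as a hyper-parameter, which is the gap the paper targets.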
Related papers
- On the Convergence of Locally Adaptive and Scalable Diffusion-Based Sampling Methods for Deep Bayesian Neural Network Posteriors [2.3265565167163906]
Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks.
However, generating samples from the posterior distribution of neural networks remains a major challenge.
One advance in that direction is the incorporation of adaptive step sizes into Markov chain Monte Carlo sampling algorithms.
In this paper, we demonstrate that these methods can have a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size.
arXiv Detail & Related papers (2024-03-13T15:21:14Z)
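For reference, a minimal sketch of the kind of Langevin-based sampler the entry above discusses, here stochastic-gradient Langevin dynamics (SGLD) with a fixed step size; the toy Gaussian target is an assumption for illustration, and the adaptive per-coordinate step sizes whose bias the paper analyzes are deliberately not implemented.

```python
import torch

def sgld_step(theta, grad_log_post, eps):
    # theta <- theta + (eps / 2) * grad log p(theta | data) + N(0, eps * I).
    # Making eps adaptive (state-dependent) without a matching drift
    # correction can bias the stationary distribution, per the paper.
    return theta + 0.5 * eps * grad_log_post(theta) + eps ** 0.5 * torch.randn_like(theta)

# Toy posterior: standard Gaussian, so grad log p(theta) = -theta.
theta = torch.zeros(2)
samples = []
for _ in range(5000):
    theta = sgld_step(theta, lambda t: -t, eps=1e-2)
    samples.append(theta)
print(torch.stack(samples).std(0))  # should approach 1 per coordinate
```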
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy, which quantifies the flow of information through each layer of a neural network.
We show that we can train a "vanilla" fully connected network and a convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
arXiv Detail & Related papers (2022-08-01T20:31:58Z)
- Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
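One plausible minimal form of such an error-adaptive scheme, sketched below: score a pool of candidate collocation points by the magnitude of the PDE residual and keep the highest-error ones. The pool size, top-k selection rule, and stand-in residual are assumptions for illustration, not the paper's exact algorithm.

```python
import torch

def resample_collocation(residual_fn, pool, n_keep):
    # Evaluate the PDE residual on a candidate pool and keep the points
    # where the current model errs most, concentrating training there.
    with torch.no_grad():
        scores = residual_fn(pool).abs().reshape(len(pool), -1).sum(-1)
    return pool[torch.topk(scores, n_keep).indices]

pool = torch.rand(1024, 1)                # candidate points in [0, 1]
residual_fn = lambda x: torch.sin(8 * x)  # stand-in for the true PDE residual
points = resample_collocation(residual_fn, pool, n_keep=128)
print(points.shape)  # torch.Size([128, 1])
```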
- Variational Neural Networks [88.24021148516319]
We propose a method for uncertainty estimation in neural networks called the Variational Neural Network (VNN).
VNN generates parameters for the output distribution of a layer by transforming its inputs with learnable sub-layers.
In uncertainty quality estimation experiments, we show that VNNs achieve better uncertainty quality than Monte Carlo Dropout or Bayes by Backpropagation methods.
arXiv Detail & Related papers (2022-07-04T15:41:02Z)
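A minimal sketch of the layer described above, assuming the sub-layers are two linear maps that produce the mean and log-variance of a Gaussian over the layer output, sampled with the reparameterization trick; the class name and architecture are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class VariationalLayer(nn.Module):
    # Learnable sub-layers map the input to the parameters of the output
    # distribution; the forward pass returns a reparameterized sample.
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Linear(d_in, d_out)       # sub-layer for the mean
        self.log_var = nn.Linear(d_in, d_out)  # sub-layer for the log-variance
    def forward(self, x):
        std = torch.exp(0.5 * self.log_var(x))
        return self.mu(x) + std * torch.randn_like(std)

layer = VariationalLayer(16, 8)
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 8])
```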
- Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details [0.5156484100374059]
Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies.
We propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for node selection during training.
We establish the fundamental result of variational posterior consistency together with the characterization of prior parameters.
arXiv Detail & Related papers (2021-08-25T00:48:07Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural networks (DNNs) generally take thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
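A minimal sketch of a kernel built from the network Jacobian, as the entry above describes: k(x, x') is the inner product of the gradients of a scalar network output with respect to all parameters (a finite-width NTK-style kernel). The helper names and toy network are illustrative assumptions.

```python
import torch

def jacobian_kernel(model, x1, x2):
    # k(x1, x2) = <J(x1), J(x2)>, where J(x) stacks the gradients of the
    # scalar network output at x with respect to every parameter.
    def grad_vec(x):
        out = model(x.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])
    return grad_vec(x1) @ grad_vec(x2)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
x1, x2 = torch.randn(3), torch.randn(3)
print(jacobian_kernel(model, x1, x2))  # scalar kernel value
```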
- Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural networks with higher test performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z)