On Power Laws in Deep Ensembles
- URL: http://arxiv.org/abs/2007.08483v2
- Date: Mon, 28 Jun 2021 13:19:55 GMT
- Title: On Power Laws in Deep Ensembles
- Authors: Ekaterina Lobacheva, Nadezhda Chirkova, Maxim Kodryan, Dmitry Vetrov
- Abstract summary: We show that one large network may perform worse than an ensemble of several medium-size networks with the same total number of parameters.
Using the detected power law-like dependencies, we can predict the possible gain from ensembling networks with a given structure.
- Score: 12.739425443572202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensembles of deep neural networks are known to achieve state-of-the-art
performance in uncertainty estimation and lead to accuracy improvement. In this
work, we focus on a classification problem and investigate the behavior of both
non-calibrated and calibrated negative log-likelihood (CNLL) of a deep ensemble
as a function of the ensemble size and the member network size. We indicate the
conditions under which CNLL follows a power law w.r.t. ensemble size or member
network size, and analyze the dynamics of the parameters of the discovered
power laws. Our important practical finding is that one large network may
perform worse than an ensemble of several medium-size networks with the same
total number of parameters (we call this ensemble a memory split). Using the
detected power law-like dependencies, we can predict (1) the possible gain from
ensembling networks with a given structure, and (2) the optimal memory split
given a memory budget, based on a relatively small number of trained networks.
We describe the memory split advantage effect in more detail in arXiv:2005.07292.
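As a rough illustration of how the detected power laws can be used in practice, the sketch below fits the dependence of CNLL on ensemble size from a handful of trained networks and extrapolates it to larger ensembles. This is a minimal sketch, not code from the paper: the functional form CNLL(n) ≈ c_inf + b·n^(-a) and all numbers are assumptions chosen for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical calibrated NLL (CNLL) measured for ensembles of 1..5 networks.
# In practice these values come from training and ensembling real networks.
ensemble_sizes = np.array([1, 2, 3, 4, 5], dtype=float)
measured_cnll = np.array([0.95, 0.82, 0.76, 0.73, 0.71])

# Assumed power-law form: CNLL(n) = c_inf + b * n**(-a),
# where c_inf is the CNLL an infinitely large ensemble would approach.
def power_law(n, c_inf, b, a):
    return c_inf + b * n ** (-a)

params, _ = curve_fit(power_law, ensemble_sizes, measured_cnll, p0=(0.6, 0.3, 1.0))
c_inf, b, a = params

# Predict the possible gain from ensembling more networks of the same structure.
for n in (10, 20, 50):
    print(f"predicted CNLL for an ensemble of {n}: {power_law(n, *params):.3f}")
print(f"predicted asymptotic gain over a single network: {measured_cnll[0] - c_inf:.3f}")
```

Fitting the analogous dependence on member network size would, in the same spirit, let one compare a single large network against a memory split of several medium-size networks under a fixed parameter budget.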
Related papers
- Make Interval Bound Propagation great again [7.121259735505479]
In various real-world scenarios, such as medical data analysis, autonomous driving, and adversarial training, robust deep networks are of particular interest.
This paper shows how to calculate the robustness of a given pre-trained network and how to construct robust networks.
We adapt two classical approaches designed for strict computations to mitigate the wrapping effect in neural networks (a generic sketch of interval bound propagation appears after this list).
arXiv Detail & Related papers (2024-10-04T12:39:46Z)
- Regressions on quantum neural networks at maximal expressivity [0.0]
We analyze the expressivity of a universal deep neural network that can be organized as a series of nested qubit rotations.
The maximal expressive power increases with the depth of the network and the number of qubits, but is fundamentally bounded by the data encoding mechanism.
arXiv Detail & Related papers (2023-11-10T14:43:24Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structure, including those with coupled channels (a toy sketch of a Fisher-style channel-importance estimate follows this list).
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using a fixed path through the network, DG-Net aggregates features dynamically at each node, which gives the network greater representational capacity.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks (a minimal sketch of learnable edge weights follows this list).
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Ensembled sparse-input hierarchical networks for high-dimensional datasets [8.629912408966145]
We show that dense neural networks can be a practical data analysis tool in settings with small sample sizes.
The proposed method, EASIER-net, prunes the network structure appropriately by tuning only two L1-penalty parameters.
On a collection of real-world datasets with different sizes, EASIER-net selected network architectures in a data-adaptive manner and achieved higher prediction accuracy than off-the-shelf methods on average.
arXiv Detail & Related papers (2020-05-11T02:08:53Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Splitting Convolutional Neural Network Structures for Efficient Inference [11.031841470875571]
A new technique is proposed to split the network structure into small parts that consume lower memory than the original network.
The split approach has been tested on two well-known network structures of VGG16 and ResNet18 for the classification of CIFAR10 images.
arXiv Detail & Related papers (2020-02-09T06:53:18Z)
- Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression (a toy sketch of such a bitwidth schedule follows this list).
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method could achieve better or comparable results.
arXiv Detail & Related papers (2019-12-29T14:11:33Z)
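The sketches below illustrate some of the techniques mentioned in the related papers; they are generic, hedged examples rather than the methods of those papers. First, for the interval-bound-propagation entry: a minimal IBP pass through a toy linear-ReLU-linear network using standard interval arithmetic for affine layers (the weights, input, and perturbation radius are made up).

```python
import numpy as np

def ibp_linear(lower, upper, W, b):
    """Propagate elementwise input bounds [lower, upper] through x -> W @ x + b."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    out_center = W @ center + b
    out_radius = np.abs(W) @ radius  # worst-case growth of the interval
    return out_center - out_radius, out_center + out_radius

def ibp_relu(lower, upper):
    """ReLU is monotone, so bounds pass through directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

# Toy two-layer network and an L_inf ball of radius eps around an input point.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x, eps = np.array([0.5, -0.2, 0.1]), 0.05

l, u = ibp_linear(x - eps, x + eps, W1, b1)
l, u = ibp_relu(l, u)
l, u = ibp_linear(l, u, W2, b2)
# If the lower bound of the true-class logit exceeds the upper bounds of all
# other logits, the prediction is certified robust on this L_inf ball.
print("output lower bounds:", l)
print("output upper bounds:", u)
```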
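For the Group Fisher Pruning entry, the sketch below computes a common Fisher-style channel-importance estimate (the squared activation-gradient product accumulated over a batch); this is a generic approximation, not necessarily the exact unified metric derived in that paper, and the toy model and data are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny conv block; we estimate the importance of each output channel of `conv`.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
head = nn.Linear(8, 10)

x = torch.randn(16, 3, 32, 32)        # a batch of toy images
y = torch.randint(0, 10, (16,))       # toy labels

feat = conv(x)
feat.retain_grad()                    # keep the gradient of this intermediate tensor
logits = head(feat.mean(dim=(2, 3)))  # global average pooling + linear head
loss = F.cross_entropy(logits, y)
loss.backward()

# Fisher-style importance: for each channel, accumulate over the batch the
# squared sum of (activation * gradient) across spatial positions.
g = (feat.detach() * feat.grad).sum(dim=(2, 3))  # shape (batch, channels)
importance = 0.5 * (g ** 2).mean(dim=0)          # higher = pruning this channel hurts more
print(importance)
```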
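For the connectivity-learning entries (DG-Net and the topological-perspective paper), the sketch below shows the basic idea of attaching a learnable, gated weight to each incoming edge of a node and training it jointly with the node's operation by ordinary backpropagation; the module name WeightedDAGNode and the sigmoid gating are illustrative assumptions, not the papers' architectures.

```python
import torch
import torch.nn as nn

class WeightedDAGNode(nn.Module):
    """Aggregate predecessor features with learnable, sigmoid-gated edge weights."""
    def __init__(self, num_inputs, channels):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(num_inputs))  # one logit per incoming edge
        self.op = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())

    def forward(self, inputs):  # inputs: list of predecessor feature maps
        gates = torch.sigmoid(self.edge_logits)
        aggregated = sum(g * x for g, x in zip(gates, inputs))
        return self.op(aggregated)

# Toy usage: a node with two predecessors; the edge gates receive gradients
# like any other parameter, so connectivity is learned differentiably.
node = WeightedDAGNode(num_inputs=2, channels=8)
a, b = torch.randn(1, 8, 16, 16), torch.randn(1, 8, 16, 16)
out = node([a, b])
print(out.shape, node.edge_logits.requires_grad)
```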
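For the mixed-precision entry, the sketch below quantizes each layer's weights uniformly with a bitwidth that decreases with depth; the schedule [8, 6, 4] and the symmetric quantizer are illustrative assumptions rather than the paper's scheme.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight tensor to the given bitwidth."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
layer_weights = [rng.normal(size=(64, 32)),
                 rng.normal(size=(128, 64)),
                 rng.normal(size=(256, 128))]

# Progressively decreasing bitwidth: earlier layers keep more precision.
bitwidths = [8, 6, 4]

for i, (w, bits) in enumerate(zip(layer_weights, bitwidths)):
    q = quantize_uniform(w, bits)
    print(f"layer {i}: {bits}-bit, mean abs quantization error {np.abs(w - q).mean():.4f}")
```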
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.