Collegial Ensembles
- URL: http://arxiv.org/abs/2006.07678v2
- Date: Wed, 17 Jun 2020 15:33:22 GMT
- Title: Collegial Ensembles
- Authors: Etai Littwin and Ben Myara and Sima Sabah and Joshua Susskind and
Shuangfei Zhai and Oren Golan
- Abstract summary: We show that collegial ensembles can be efficiently implemented in practical architectures using group convolutions and block diagonal layers.
We also show how our framework can be used to analytically derive optimal group convolution modules without having to train a single model.
- Score: 11.64359837358763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern neural network performance typically improves as model size increases.
A recent line of research on the Neural Tangent Kernel (NTK) of
over-parameterized networks indicates that the improvement with size increase
is a product of a better conditioned loss landscape. In this work, we
investigate a form of over-parameterization achieved through ensembling, where
we define collegial ensembles (CE) as the aggregation of multiple independent
models with identical architectures, trained as a single model. We show that
the optimization dynamics of CE simplify dramatically when the number of models
in the ensemble is large, resembling the dynamics of wide models, yet scale
much more favorably. We use recent theoretical results on the finite width
corrections of the NTK to perform efficient architecture search in a space of
finite width CE that aims to either minimize capacity, or maximize trainability
under a set of constraints. The resulting ensembles can be efficiently
implemented in practical architectures using group convolutions and block
diagonal layers. Finally, we show how our framework can be used to analytically
derive optimal group convolution modules originally found using expensive grid
searches, without having to train a single model.
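The grouped-convolution construction described in the abstract lends itself to a short illustration. Below is a minimal, hypothetical PyTorch sketch (not the authors' released code) of a collegial ensemble: m identical small CNN members are packed into one network with groups=m convolutions, so their weights stay block diagonal, and their logits are aggregated at the end. The member width, depth, and the exact aggregation normalization used in the paper may differ from this sketch.

```python
# Hypothetical sketch of a collegial ensemble (CE); assumptions: PyTorch,
# mean aggregation of member logits, two grouped conv layers per member.
import torch
import torch.nn as nn


class CollegialEnsembleCNN(nn.Module):
    """m identical CNN members packed into one network via grouped convolutions."""

    def __init__(self, in_channels: int = 3, width: int = 16,
                 num_classes: int = 10, m: int = 8):
        super().__init__()
        self.m = m
        # groups=m keeps the weight matrices block diagonal: no member sees
        # another member's channels, so each branch is an independent model.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels * m, width * m, 3, padding=1, groups=m),
            nn.ReLU(),
            nn.Conv2d(width * m, width * m, 3, padding=1, groups=m),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # One linear head per member, again block diagonal via a grouped 1x1 conv.
        self.heads = nn.Conv2d(width * m, num_classes * m, 1, groups=m)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.repeat(1, self.m, 1, 1)            # same input to every member
        logits = self.heads(self.features(x))    # (n, num_classes * m, 1, 1)
        logits = logits.flatten(1).view(x.size(0), self.m, -1)
        return logits.mean(dim=1)                # aggregate the members' logits


# Usage: 8 members of width 16, trained end to end with a single loss.
model = CollegialEnsembleCNN(m=8)
out = model(torch.randn(2, 3, 32, 32))           # -> shape (2, 10)
```

Because the grouped convolutions never mix channels across members, this executes as a single model while computing the same function as m separately parameterized copies of the base architecture.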
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is challenging due to the complexity of the loss landscape and the high computational cost of training.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Autoselection of the Ensemble of Convolutional Neural Networks with Second-Order Cone Programming [0.8029049649310213]
This study proposes a mathematical model that prunes the ensemble of Convolutional Neural Networks (CNNs).
The proposed model is tested on CIFAR-10, CIFAR-100 and MNIST data sets.
arXiv Detail & Related papers (2023-02-12T16:18:06Z)
- HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble [22.065904428696353]
Recent ensemble training methods explore different training algorithms or settings for multiple sub-models with the same model architecture.
We propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble from the pruned and quantized variants of a pretrained DNN model.
arXiv Detail & Related papers (2023-01-18T21:47:05Z)
- Sparsity-guided Network Design for Frame Interpolation [39.828644638174225]
We present a compression-driven network design for frame interpolation algorithms.
We leverage model pruning through sparsity-inducing optimization to greatly reduce the model size.
We achieve a considerable performance gain with a quarter of the size of the original AdaCoF.
arXiv Detail & Related papers (2022-09-09T23:13:25Z)
- Embedded Ensembles: Infinite Width Limit and Operating Regimes [15.940871041126453]
A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network.
We refer to this strategy as Embedded Ensembling (EE); particular examples include BatchEnsembles and Monte-Carlo dropout ensembles (a minimal weight-sharing sketch appears after this list).
arXiv Detail & Related papers (2022-02-24T18:55:41Z)
- Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach to summarizing large datasets is to operate on small subsets of the data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules that can be flexibly combined to solve the optimization models effectively.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that, with this regularization, CNNs maintain performance with a dramatic reduction in parameters and computation.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
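As noted in the Embedded Ensembles entry above, weight sharing is a complementary route to cheap ensembling: most parameters live in a single reference network and only small per-member factors differ. The sketch below is a hypothetical BatchEnsemble-style layer in PyTorch, written only to illustrate that idea; it is not code from any of the cited papers, and the per-member parameterization used in those works may differ.

```python
# Hypothetical sketch of a weight-sharing (embedded) ensemble layer; assumption:
# rank-1 per-member modulation of one shared weight matrix, BatchEnsemble style.
import torch
import torch.nn as nn


class SharedWeightEnsembleLinear(nn.Module):
    """One shared weight matrix plus tiny per-member rank-1 modulations."""

    def __init__(self, in_features: int, out_features: int, m: int):
        super().__init__()
        self.m = m
        self.shared = nn.Linear(in_features, out_features, bias=False)  # shared by all members
        self.r = nn.Parameter(torch.ones(m, in_features))   # per-member input scaling
        self.s = nn.Parameter(torch.ones(m, out_features))  # per-member output scaling

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        # Member k effectively uses W * outer(s_k, r_k), realized cheaply as
        # elementwise scaling before and after the shared matmul.
        return self.shared(x * self.r[member]) * self.s[member]


# Usage: the same shared weights serve all 4 members; only the scaling vectors differ.
layer = SharedWeightEnsembleLinear(32, 64, m=4)
x = torch.randn(8, 32)
outputs = torch.stack([layer(x, k) for k in range(layer.m)])  # -> shape (4, 8, 64)
```

In contrast to the block-diagonal collegial ensemble sketched earlier, here the members are not independent models: almost all capacity sits in the shared matrix, which is what makes this family of ensembles memory-efficient.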