Embedded Ensembles: Infinite Width Limit and Operating Regimes
- URL: http://arxiv.org/abs/2202.12297v1
- Date: Thu, 24 Feb 2022 18:55:41 GMT
- Title: Embedded Ensembles: Infinite Width Limit and Operating Regimes
- Authors: Maksim Velikanov, Roman Kail, Ivan Anokhin, Roman Vashurin, Maxim
Panov, Alexey Zaytsev, Dmitry Yarotsky
- Abstract summary: A memory efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network.
We refer to this strategy as Embedded Ensembling (EE); its particular examples are BatchEnsembles and Monte-Carlo dropout ensembles.
- Score: 15.940871041126453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A memory efficient approach to ensembling neural networks is to share most
weights among the ensembled models by means of a single reference network. We
refer to this strategy as Embedded Ensembling (EE); its particular examples are
BatchEnsembles and Monte-Carlo dropout ensembles. In this paper we perform a
systematic theoretical and empirical analysis of embedded ensembles with
different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based
approach to derive the wide network limit of the gradient descent dynamics. In
this limit, we identify two ensemble regimes - independent and collective -
depending on the architecture and initialization strategy of ensemble models.
We prove that in the independent regime the embedded ensemble behaves as an
ensemble of independent models. We confirm our theoretical prediction with a
wide range of experiments with finite networks, and further study empirically
various effects such as transition between the two regimes, scaling of ensemble
performance with the network width and number of models, and dependence of
performance on a number of architecture and hyperparameter choices.
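As a concrete illustration of the weight-sharing strategy described above, below is a minimal NumPy sketch of a BatchEnsemble-style embedded-ensemble linear layer, in which all members share one reference weight matrix and each member owns only cheap rank-1 modulation factors. The function names, shapes, and the two initialization options are illustrative assumptions for this note, not code from the paper; in particular, how a given initialization maps onto the paper's independent versus collective regimes is left as an assumption rather than a claim.

```python
# Sketch of a BatchEnsemble-style embedded-ensemble linear layer (illustrative only).
# Member m uses W_m = W * outer(r_m, s_m): the reference weights W are shared,
# while r_m and s_m are the only member-specific parameters.
import numpy as np

rng = np.random.default_rng(0)

def make_embedded_ensemble_layer(d_in, d_out, n_models, init="signs"):
    """Shared reference weights plus per-member rank-1 modulation factors."""
    W = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_out, d_in))  # shared by all members
    if init == "ones":
        # every member starts identical to the reference network
        r = np.ones((n_models, d_out))
        s = np.ones((n_models, d_in))
    else:
        # random +/-1 signs make the members differ from the start
        r = rng.choice([-1.0, 1.0], size=(n_models, d_out))
        s = rng.choice([-1.0, 1.0], size=(n_models, d_in))
    return W, r, s

def forward(x, W, r, s):
    """Apply every embedded member to the same batch x of shape (batch, d_in)."""
    # W_m @ x is computed cheaply as r_m * ((x * s_m) @ W.T), avoiding n_models copies of W
    outs = [(x * s[m]) @ W.T * r[m] for m in range(r.shape[0])]
    return np.stack(outs)  # shape (n_models, batch, d_out)

W, r, s = make_embedded_ensemble_layer(d_in=8, d_out=4, n_models=3)
x = rng.normal(size=(5, 8))
member_outputs = forward(x, W, r, s)          # (3, 5, 4)
ensemble_prediction = member_outputs.mean(0)  # average the embedded members: (5, 4)
print(member_outputs.shape, ensemble_prediction.shape)
```

Averaging the member outputs, as in the last lines, is the usual way such embedded members are combined; the memory overhead relative to a single network is only the n_models * (d_in + d_out) extra factors.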
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - Exploiting Temporal Structures of Cyclostationary Signals for
Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS).
We focus on cyclostationary signals, which are particularly well suited to a variety of application domains.
We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z) - A Coupled CP Decomposition for Principal Components Analysis of
Symmetric Networks [11.988825533369686]
We propose a principal components analysis (PCA) framework for sequence network data.
We derive efficient algorithms for computing our proposed "Coupled CP" decomposition.
We demonstrate the effectiveness of our proposal on simulated data and on examples from political science and financial economics.
arXiv Detail & Related papers (2022-02-09T20:52:19Z) - Multi-Scale Semantics-Guided Neural Networks for Efficient
Skeleton-Based Human Action Recognition [140.18376685167857]
A simple yet effective multi-scale semantics-guided neural network (MS-SGN) is proposed for skeleton-based action recognition.
MS-SGN achieves state-of-the-art performance on the NTU60, NTU120, and SYSU datasets.
arXiv Detail & Related papers (2021-11-07T03:50:50Z) - Connections between Numerical Algorithms for PDEs and Neural Networks [8.660429288575369]
We investigate numerous structural connections between numerical algorithms for partial differential equations (PDEs) and neural networks.
Our goal is to transfer the rich set of mathematical foundations from the world of PDEs to neural networks.
arXiv Detail & Related papers (2021-07-30T16:42:45Z) - Polynomial Networks in Deep Classifiers [55.90321402256631]
We cast the study of deep neural networks under a unifying framework.
Our framework provides insights on the inductive biases of each model.
The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.
arXiv Detail & Related papers (2021-04-16T06:41:20Z) - Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
arXiv Detail & Related papers (2021-03-05T04:42:32Z) - Collegial Ensembles [11.64359837358763]
We show that collegial ensembles can be efficiently implemented in practical architectures using group convolutions and block diagonal layers.
We also show how our framework can be used to analytically derive optimal group convolution modules without having to train a single model.
arXiv Detail & Related papers (2020-06-13T16:40:26Z) - Consistency of Spectral Clustering on Hierarchical Stochastic Block
Models [5.983753938303726]
We study the hierarchy of communities in real-world networks under a generic block model.
We prove the strong consistency of this method under a wide range of model parameters.
Unlike most existing work, our theory covers multiscale networks where the connection probabilities may differ by orders of magnitude.
arXiv Detail & Related papers (2020-04-30T01:08:59Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z) - Semi-Structured Distributional Regression -- Extending Structured
Additive Models by Arbitrary Deep Neural Networks and Data Modalities [0.0]
We propose a general framework to combine structured regression models and deep neural networks into a unifying network architecture.
We demonstrate the framework's efficacy in numerical experiments and illustrate its special merits in benchmarks and real-world applications.
arXiv Detail & Related papers (2020-02-13T21:01:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.