Inner Ensemble Networks: Average Ensemble as an Effective Regularizer
- URL: http://arxiv.org/abs/2006.08305v2
- Date: Fri, 9 Oct 2020 05:59:00 GMT
- Title: Inner Ensemble Networks: Average Ensemble as an Effective Regularizer
- Authors: Abduallah Mohamed, Muhammed Mohaimin Sadiq, Ehab AlBadawy, Mohamed
Elhoseiny, Christian Claudel
- Abstract summary: Inner Ensemble Networks (IENs) reduce the variance within the neural network itself without an increase in the model complexity.
IENs utilize ensemble parameters during the training phase to reduce the network variance.
- Score: 20.33062212014075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Inner Ensemble Networks (IENs) which reduce the variance within
the neural network itself without an increase in the model complexity. IENs
utilize ensemble parameters during the training phase to reduce the network
variance. In the testing phase, these parameters are removed without affecting
the enhanced performance. IENs reduce the variance of an ordinary
deep model by a factor of $1/m^{L-1}$, where $m$ is the number of inner
ensembles and $L$ is the depth of the model. Also, we show empirically and
theoretically that IENs lead to a greater variance reduction in comparison with
other similar approaches such as dropout and maxout. Our results show a
decrease of error rates between 1.7\% and 17.3\% in comparison with an ordinary
deep model. We also show that IEN was preferred by Neural Architecture Search
(NAS) methods over prior approaches. Code is available at
https://github.com/abduallahmohamed/inner_ensemble_nets.
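The mechanism described in the abstract (m parallel weight sets per layer, averaged during training and removable at test time) can be illustrated with a short sketch. Below is a minimal PyTorch example of one inner-ensemble linear layer, assuming the averaging is applied to pre-activation outputs; the class name, initialization, and ensemble size m=4 are illustrative choices, not the authors' released implementation (see the repository linked above).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InnerEnsembleLinear(nn.Module):
    """Sketch of an inner-ensemble linear layer: m weight sets are averaged
    during training; at test time they fold into a single weight matrix, so
    the deployed layer has the same parameter count as an ordinary one."""

    def __init__(self, in_features, out_features, m=4):
        super().__init__()
        self.m = m
        # m independent weight/bias sets (illustrative initialization).
        self.weights = nn.Parameter(0.01 * torch.randn(m, out_features, in_features))
        self.biases = nn.Parameter(torch.zeros(m, out_features))

    def forward(self, x):
        if self.training:
            # Average the pre-activation outputs of the m inner members.
            outs = torch.einsum('bi,moi->bmo', x, self.weights) + self.biases
            return outs.mean(dim=1)
        # Test time: fold the ensemble into one averaged weight and bias,
        # removing the extra parameters without changing the layer's output.
        return F.linear(x, self.weights.mean(dim=0), self.biases.mean(dim=0))

# Usage sketch:
# layer = InnerEnsembleLinear(128, 64, m=4)
# x = torch.randn(32, 128)
# layer.train(); y_train = layer(x)  # average over the m members
# layer.eval();  y_test = layer(x)   # single folded weight
```

Because the map is linear, averaging the m outputs equals applying a single layer whose weight is the average of the m weight sets, which is consistent with the abstract's claim that the ensemble parameters can be removed at test time without changing the enhanced performance.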
Related papers
- Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis [17.989809995141044]
We propose CCA Merge, which is based on Canonical Correlation Analysis.
We show that CCA Merge works significantly better than past methods when more than 2 models are merged.
arXiv Detail & Related papers (2024-07-07T14:21:04Z) - Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite dimensional linear regression setup.
We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$.
Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z) - On the Convergence of Federated Averaging under Partial Participation for Over-parameterized Neural Networks [13.2844023993979]
Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data.
In this paper, we show that FedAvg converges to a global minimum under partial participation.
arXiv Detail & Related papers (2023-10-09T07:56:56Z) - More Communication Does Not Result in Smaller Generalization Error in
Federated Learning [9.00236182523638]
We study the generalization error of statistical learning models in a Federated Learning setting.
We consider multiple (say $R \in \mathbb{N}^*$) rounds of model aggregation and study the effect of $R$ on the generalization error of the final aggregated model.
arXiv Detail & Related papers (2023-04-24T15:56:11Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets, on the order of $P^* \sim \sqrt{N}$, for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Bounding the Width of Neural Networks via Coupled Initialization -- A
Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z) - Model Architecture Adaption for Bayesian Neural Networks [9.978961706999833]
We present a novel neural architecture search (NAS) approach that optimizes BNNs for both accuracy and uncertainty.
In our experiments, the searched models show uncertainty estimation and accuracy comparable to the state-of-the-art (deep ensembles).
arXiv Detail & Related papers (2022-02-09T10:58:50Z) - R-Drop: Regularized Dropout for Neural Networks [99.42791938544012]
Dropout is a powerful and widely used technique to regularize the training of deep neural networks.
We introduce a simple regularization strategy built on dropout during model training, namely R-Drop, which forces the output distributions of different sub-models to be consistent with each other.
arXiv Detail & Related papers (2021-06-28T08:01:26Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)