Generalized Negative Correlation Learning for Deep Ensembling
- URL: http://arxiv.org/abs/2011.02952v2
- Date: Wed, 9 Dec 2020 08:19:49 GMT
- Title: Generalized Negative Correlation Learning for Deep Ensembling
- Authors: Sebastian Buschjäger, Lukas Pfahler, Katharina Morik
- Abstract summary: Ensemble algorithms offer state-of-the-art performance in many machine learning applications.
We formulate a generalized bias-variance decomposition for arbitrary twice differentiable loss functions.
We derive a Generalized Negative Correlation Learning algorithm which offers explicit control over the ensemble's diversity.
- Score: 7.569288952340753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensemble algorithms offer state-of-the-art performance in many machine
learning applications. A common explanation for their excellent performance is
the bias-variance decomposition of the mean squared error, which shows that the
algorithm's error can be decomposed into its bias and variance. The two
quantities are often opposed to each other, and ensembles offer an effective way
to manage them: they reduce the variance through a diverse set of base
learners while keeping the bias low at the same time. Even though there have
been numerous works on decomposing other loss functions, the exact mathematical
connection is rarely exploited explicitly for ensembling, but merely used as a
guiding principle. In this paper, we formulate a generalized bias-variance
decomposition for arbitrary twice differentiable loss functions and study it in
the context of Deep Learning. We use this decomposition to derive a Generalized
Negative Correlation Learning (GNCL) algorithm which offers explicit control
over the ensemble's diversity and smoothly interpolates between the two
extremes of independent training and the joint training of the ensemble. We
show how GNCL encapsulates many previous works and discuss under which
circumstances the training of an ensemble of Neural Networks might fail and
which ensembling method should be favored depending on the choice of the
individual networks. We make our code publicly available at
https://github.com/sbuschjaeger/gncl
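For the special case of the squared error, the interpolation described above can be made concrete: the classical ambiguity decomposition writes the loss of the ensemble mean as the average individual loss minus the spread of the members around that mean, so a single weight on the spread term moves between independent and joint training. The following PyTorch sketch illustrates such a GNCL-style objective for this MSE special case only; it is not the authors' reference implementation (see the repository linked above), and the helper name gncl_mse_loss and the weight lam are illustrative choices.

```python
# Minimal sketch of a GNCL-style objective for the squared-error case.
# Assumptions: PyTorch; member_outputs holds the M base learners' outputs.
# lam = 0.0 trains the members independently; lam = 1.0 is equivalent to
# training the ensemble mean jointly (via the ambiguity decomposition).
import torch

def gncl_mse_loss(member_outputs, target, lam=0.5):
    """member_outputs: list of M tensors of shape (batch, d_out);
    target: tensor of shape (batch, d_out); lam: diversity weight in [0, 1]."""
    stacked = torch.stack(member_outputs)                            # (M, batch, d_out)
    f_bar = stacked.mean(dim=0)                                      # ensemble prediction
    member_loss = ((stacked - target) ** 2).sum(dim=-1).mean(dim=0)  # avg. individual MSE
    ambiguity = ((stacked - f_bar) ** 2).sum(dim=-1).mean(dim=0)     # spread around the mean
    return (member_loss - lam * ambiguity).mean()
```

For losses other than the squared error the ambiguity term has no such closed form; the paper's generalized decomposition instead uses a term based on the second derivatives of the (twice differentiable) loss, which is what gives GNCL its explicit control over diversity.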
Related papers
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
However, directly minimizing the loss of the ensemble appears to be rarely applied in practice.
arXiv Detail & Related papers (2023-01-26T18:58:07Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC)
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - HyperInvariances: Amortizing Invariance Learning [10.189246340672245]
Invariance learning is expensive and data intensive for popular neural architectures.
We introduce the notion of amortizing invariance learning.
This framework can identify appropriate invariances in different downstream tasks and lead to comparable or better test performance.
arXiv Detail & Related papers (2022-07-17T21:40:37Z) - Combining Varied Learners for Binary Classification using Stacked
Generalization [3.1871776847712523]
This paper performs binary classification using Stacked Generalization on a high-dimensional Polycystic Ovary Syndrome dataset.
Various evaluation metrics are reported, and the paper points out a subtle issue found with the Receiver Operating Characteristic curve that was shown to be incorrect.
arXiv Detail & Related papers (2022-02-17T21:47:52Z) - Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization [118.50301177912381]
We show that Adam can converge to different solutions of the objective with provably different errors, even with weight decay regularization.
We show that if the objective is convex and weight decay regularization is employed, any optimization algorithm, including Adam, will converge to the same solution.
arXiv Detail & Related papers (2021-08-25T17:58:21Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization [44.73295800450414]
(Partial) ranking loss is a commonly used evaluation measure for multi-label classification.
There is a gap between existing theory and practice -- some pairwise losses can lead to promising performance but lack consistency.
arXiv Detail & Related papers (2021-05-10T09:23:27Z) - Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv Detail & Related papers (2020-10-05T01:58:32Z) - Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu \| \mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.