A Unified Theory of Diversity in Ensemble Learning
- URL: http://arxiv.org/abs/2301.03962v3
- Date: Wed, 7 Feb 2024 10:11:39 GMT
- Title: A Unified Theory of Diversity in Ensemble Learning
- Authors: Danny Wood, Tingting Mu, Andrew Webb, Henry Reeve, Mikel Luján, and Gavin Brown
- Abstract summary: We present a theory of ensemble diversity, explaining the nature of diversity for a wide range of supervised learning scenarios.
This challenge has been referred to as the holy grail of ensemble learning, an open research issue for over 30 years.
- Score: 4.773356856466191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a theory of ensemble diversity, explaining the nature of diversity
for a wide range of supervised learning scenarios. This challenge has been
referred to as the holy grail of ensemble learning, an open research issue for
over 30 years. Our framework reveals that diversity is in fact a hidden
dimension in the bias-variance decomposition of the ensemble loss. We prove a
family of exact bias-variance-diversity decompositions, for a wide range of
losses in both regression and classification, e.g., squared, cross-entropy, and
Poisson losses. For losses where an additive bias-variance decomposition is not
available (e.g., 0/1 loss) we present an alternative approach: quantifying the
effects of diversity, which turn out to be dependent on the label distribution.
Overall, we argue that diversity is a measure of model fit, in precisely the
same sense as bias and variance, but accounting for statistical dependencies
between ensemble members. Thus, we should not be maximising diversity as so
many works aim to do -- instead, we have a bias/variance/diversity trade-off to
manage.
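The abstract states an exact bias-variance-diversity decomposition for squared loss. A minimal numerical sketch of the squared-loss case for an arithmetic-mean ensemble (using the well-known ambiguity identity, of which the paper's decomposition is a generalisation; all prediction values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
target = 1.0
# Predictions of 5 ensemble members on one example (hypothetical values).
preds = rng.normal(loc=1.2, scale=0.5, size=5)

ens = preds.mean()                        # averaging ensemble's prediction
ens_err = (ens - target) ** 2             # ensemble squared error
avg_err = ((preds - target) ** 2).mean()  # average member squared error
diversity = ((preds - ens) ** 2).mean()   # spread of members around the ensemble

# Exact identity for squared loss: ensemble error = average error - diversity.
# Diversity is non-negative, so the ensemble never does worse than the average
# member -- but maximising diversity alone also inflates the other terms.
assert np.isclose(ens_err, avg_err - diversity)
```

The identity holds for any set of predictions, which illustrates the paper's point: diversity is not a free lunch to be maximised, but one term in a trade-off alongside bias and variance.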
Related papers
- Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing [55.791818510796645]
We aim to develop models that generalize well to any diverse test distribution, even if the latter deviates significantly from the training data.
Various approaches like domain adaptation, domain generalization, and robust optimization attempt to address the out-of-distribution challenge.
We adopt a more conservative perspective by accounting for the worst-case error across all sufficiently diverse test distributions within a known domain.
arXiv Detail & Related papers (2024-10-08T12:26:48Z)
- Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition [114.96385572118042]
We argue that the variation in test label distributions can be broken down hierarchically into global and local levels.
We propose a new MoE strategy, $\mathsf{DirMixE}$, which assigns experts to different Dirichlet meta-distributions of the label distribution.
We show that our proposed objective benefits from enhanced generalization by virtue of the variance-based regularization.
arXiv Detail & Related papers (2024-05-13T14:24:56Z)
- Results on Counterfactual Invariance [3.616948583169635]
We show that, whilst counterfactual invariance implies conditional independence, conditional independence carries no implications about the degree or likelihood of satisfying counterfactual invariance.
For discrete causal models counterfactually invariant functions are often constrained to be functions of particular variables, or even constant.
arXiv Detail & Related papers (2023-07-17T14:27:32Z)
- Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z)
- The Double-Edged Sword of Diversity: How Diversity, Conflict, and Psychological Safety Impact Software Teams [6.190511747986327]
Team diversity can be seen as a double-edged sword, bringing cognitive resources to teams at the risk of increased conflict.
This study views diversity through the lens of the categorization-elaboration model (CEM).
We investigated how diversity in gender, age, role, and cultural background impacts team effectiveness and conflict.
arXiv Detail & Related papers (2023-01-30T14:54:44Z)
- Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization? [90.35044668396591]
A recurring theme in machine learning is algorithmic monoculture: the same systems, or systems that share components, are deployed by multiple decision-makers.
We propose the component-sharing hypothesis: if decision-makers share components like training data or specific models, then they will produce more homogeneous outcomes.
We test this hypothesis on algorithmic fairness benchmarks, demonstrating that sharing training data reliably exacerbates homogenization.
We conclude with philosophical analyses of and societal challenges for outcome homogenization, with an eye towards implications for deployed machine learning systems.
arXiv Detail & Related papers (2022-11-25T09:33:11Z)
- Diverse Weight Averaging for Out-of-Distribution Generalization [100.22155775568761]
We propose Diverse Weight Averaging (DiWA) to average weights obtained from several independent training runs rather than from a single run.
DiWA consistently improves the state of the art on the competitive DomainBed benchmark without inference overhead.
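DiWA's core operation is an elementwise average of parameters from independent training runs. A minimal sketch, assuming each run's parameters are stored as NumPy arrays keyed by layer name (the function `average_weights` and the parameter names are hypothetical, not the paper's API):

```python
import numpy as np

def average_weights(runs):
    """Elementwise average of parameter dicts from independent runs.

    Each run is a dict mapping parameter names to arrays of identical
    shape; this is a minimal stand-in for weight averaging a la DiWA.
    """
    keys = runs[0].keys()
    return {k: np.mean([r[k] for r in runs], axis=0) for k in keys}

# Two hypothetical runs with matching parameter shapes.
run_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
run_b = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}
avg = average_weights([run_a, run_b])
# avg["w"] -> [2.0, 3.0], avg["b"] -> [1.0]
```

Because the averaged model is a single network, this incurs no inference overhead, unlike prediction-level ensembling, which must run every member.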
arXiv Detail & Related papers (2022-05-19T17:44:22Z)
- Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization [77.24152933825238]
We show that for linear classification tasks we need stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible.
We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not.
arXiv Detail & Related papers (2021-06-11T20:42:27Z)
- Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games [44.30509625560908]
In open-ended learning algorithms, there are no widely accepted definitions for diversity, making it hard to construct and evaluate the diverse policies.
We propose a unified measure of diversity in multi-agent open-ended learning based on both Behavioral Diversity (BD) and Response Diversity (RD).
We show that many current diversity measures fall in one of the categories of BD or RD but not both.
With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.
arXiv Detail & Related papers (2021-06-09T10:11:06Z)
- Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition [39.108491135488286]
We decompose the test risk into its bias and variance components.
We find that the bias increases monotonically with perturbation size and is the dominant term in the risk.
We show that popular explanations for the generalization gap instead predict the variance to be monotonic.
arXiv Detail & Related papers (2021-03-17T23:30:00Z)
- Diverse Instances-Weighting Ensemble based on Region Drift Disagreement for Concept Drift Adaptation [40.77597229122878]
We propose a diversity measurement based on whether the ensemble members agree on the probability of a regional distribution change.
An instance-based ensemble learning algorithm, called the diverse instance weighting ensemble (DiwE) is developed to address concept drift for data stream classification problems.
arXiv Detail & Related papers (2020-04-13T07:59:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.