Rethinking Fano's Inequality in Ensemble Learning
- URL: http://arxiv.org/abs/2205.12683v2
- Date: Thu, 16 Nov 2023 09:43:51 GMT
- Title: Rethinking Fano's Inequality in Ensemble Learning
- Authors: Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga
- Abstract summary: We argue that previous studies did not take into account the information lost when multiple model predictions are combined into a final prediction.
We empirically validate and demonstrate the proposed theory through extensive experiments on actual systems.
- Score: 17.948799609068214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a fundamental theory on ensemble learning that answers the central
question: what factors make an ensemble system good or bad? Previous studies
used a variant of Fano's inequality of information theory and derived a lower
bound of the classification error rate on the basis of the $\textit{accuracy}$
and $\textit{diversity}$ of models. We revisit the original Fano's inequality
and argue that the studies did not take into account the information lost when
multiple model predictions are combined into a final prediction. To address
this issue, we generalize the previous theory to incorporate the information
loss, which we name $\textit{combination loss}$. Further, we empirically
validate and demonstrate the proposed theory through extensive experiments on
actual systems. The theory reveals the strengths and weaknesses of systems on
each metric, which will push the theoretical understanding of ensemble learning
and give us insights into designing systems.
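For concreteness, here is a hedged sketch, in our own notation, of the Fano-type argument the abstract describes: $Y$ is the true label over a label set $\mathcal{Y}$, $\hat{Y}_1,\dots,\hat{Y}_M$ are the individual model predictions, and $\hat{Y}$ is the combined prediction. Fano's inequality lower-bounds the error rate of the combined prediction, and because $\hat{Y}$ is computed from $(\hat{Y}_1,\dots,\hat{Y}_M)$, the data-processing inequality makes the bracketed information gap, i.e. the combination loss, non-negative:
$$
P(\hat{Y} \neq Y) \ \ge\ \frac{H(Y) - I(Y;\hat{Y}) - 1}{\log|\mathcal{Y}|},
\qquad
I(Y;\hat{Y}) \ =\ I(Y;\hat{Y}_1,\dots,\hat{Y}_M)\ -\ \underbrace{\big[\,I(Y;\hat{Y}_1,\dots,\hat{Y}_M) - I(Y;\hat{Y})\,\big]}_{\text{combination loss}\ \ge\ 0}
$$
The paper's exact decomposition of $I(Y;\hat{Y}_1,\dots,\hat{Y}_M)$ into accuracy and diversity terms follows the prior work it generalizes; the display above only fixes the role the new term plays in the bound.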
Related papers
- Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data [35.03888101803088]
This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification.
We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees.
We devise novel and general learning algorithms, IMMAX, which incorporate confidence margins and are applicable to various hypothesis sets.
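The precise IMMAX loss is defined in the paper; purely as a hypothetical illustration of the ingredient it builds on, the sketch below implements a multi-class hinge loss with per-class confidence margins in which rarer classes receive larger margins (an LDAM-style schedule, $\rho_y \propto n_y^{-1/4}$). All names, signatures, and the margin schedule are our assumptions.
```python
import numpy as np

def class_margins(class_counts, scale=1.0, power=0.25):
    # Hypothetical schedule: rarer classes get larger margins,
    # e.g. rho_y proportional to n_y^(-1/4) as in LDAM-style losses.
    counts = np.asarray(class_counts, dtype=float)
    return scale / counts**power

def imbalanced_margin_loss(scores, labels, margins):
    """Multi-class hinge loss with per-class margins rho_y.

    scores:  (n, k) real-valued class scores
    labels:  (n,) integer labels in [0, k)
    margins: (k,) per-class margins
    """
    n = scores.shape[0]
    correct = scores[np.arange(n), labels]
    masked = scores.copy()
    masked[np.arange(n), labels] = -np.inf   # exclude the true class
    runner_up = masked.max(axis=1)
    # Penalize samples whose score gap falls short of the class margin.
    slack = margins[labels] - (correct - runner_up)
    return np.maximum(0.0, slack).mean()

# Toy usage on an imbalanced 3-class problem.
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)
print(imbalanced_margin_loss(scores, labels, class_margins([1000, 100, 10])))
```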
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
- Of Dice and Games: A Theory of Generalized Boosting [61.752303337418475]
We extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses.
We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees.
Our characterization relies on a geometric interpretation of boosting, revealing a surprising equivalence between cost-sensitive and multi-objective losses.
arXiv Detail & Related papers (2024-12-11T01:38:32Z)
- An Effective Theory of Bias Amplification [18.648588509429167]
Machine learning models may capture and amplify biases present in data, leading to disparate test performance across social groups.
We propose a precise analytical theory in the context of ridge regression, a setting that models neural networks in a simplified regime.
Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias.
arXiv Detail & Related papers (2024-10-07T08:43:22Z)
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that systematically controlled metrics are strongly predictive of generalization performance.
This work informs an important direction: enhancing data diversity or balance, as opposed to scaling up the absolute dataset size.
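The paper defines its own skew metrics; as a hypothetical stand-in that conveys the idea, the sketch below scores how unevenly a dataset's relations are distributed using one minus normalized Shannon entropy (0.0 = perfectly balanced, values near 1.0 = mass concentrated on a few categories).
```python
import math
from collections import Counter

def skew(items):
    """Hypothetical skew metric: 1 - normalized Shannon entropy."""
    counts = Counter(items)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if len(probs) < 2:
        return 1.0  # a single observed category is maximally skewed
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(probs))

# E.g. spatial relations mentioned in the captions of a toy dataset.
relations = ["left-of"] * 90 + ["above"] * 8 + ["inside"] * 2
print(f"linguistic skew: {skew(relations):.2f}")  # ~0.66
```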
arXiv Detail & Related papers (2024-03-25T03:18:39Z)
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that for an ensemble of deep-learning-based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
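As a rough illustration (not the paper's own definitions), the sketch below estimates per-sample bias and variance for a classification ensemble using the classic 0-1-loss decomposition of Domingos (2000): the "main" prediction is the per-sample majority vote, bias marks samples where that vote is wrong, and variance measures how often members disagree with it.
```python
import numpy as np

def per_sample_bias_variance(member_preds, y_true):
    """Per-sample bias/variance under 0-1 loss (Domingos-style).

    member_preds: (m, n) int array of m ensemble members' predictions
    y_true:       (n,) int array of true labels
    """
    m, n = member_preds.shape
    # "Main" prediction: per-sample majority vote over members.
    main = np.array([np.bincount(member_preds[:, i]).argmax()
                     for i in range(n)])
    bias = (main != y_true).astype(float)           # majority wrong
    variance = (member_preds != main).mean(axis=0)  # disagreement rate
    return bias, variance

# Toy check: 5 members, 4 samples.
preds = np.array([[0, 1, 2, 1],
                  [0, 1, 2, 2],
                  [0, 2, 2, 1],
                  [0, 1, 2, 1],
                  [0, 1, 0, 1]])
bias, var = per_sample_bias_variance(preds, np.array([0, 1, 1, 1]))
print(bias, var)  # bias [0. 0. 1. 0.], variance [0. 0.2 0.2 0.2]
```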
arXiv Detail & Related papers (2023-10-13T17:06:34Z)
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not explain that collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
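A minimal sketch of the setting, assuming one scalar parameter per worker and a doubly stochastic gossip matrix $W$ that encodes the topology: workers repeatedly average with their neighbours, and properties of $W$ (classically, its spectral gap) govern how fast the consensus error shrinks. The paper argues the spectral gap alone does not tell the whole story.
```python
import numpy as np

def gossip_average(x, W, steps):
    # Each step, every worker replaces its value with a weighted
    # average of its neighbours' values (one row of W per worker).
    for _ in range(steps):
        x = W @ x
    return x

def ring_matrix(n):
    # Sparse topology: each worker averages with its two ring neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, [i, (i - 1) % n, (i + 1) % n]] = 1 / 3
    return W

n = 16
x0 = np.random.default_rng(0).normal(size=n)
complete = np.full((n, n), 1 / n)  # complete graph: exact mean in one step
print(np.std(gossip_average(x0, ring_matrix(n), 10)))  # slow mixing on a ring
print(np.std(gossip_average(x0, complete, 1)))         # ~0 immediately
```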
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z)
- Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z)
- Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms [12.020634332110147]
We prove novel generalization bounds through the lens of rate-distortion theory.
Our results bring a more unified perspective on generalization and open up several future research directions.
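For context, a classic bound this line of work refines is the mutual-information generalization bound of Xu and Raginsky (2017): for an $n$-sample training set $S$, learned hypothesis $W$, and a $\sigma$-sub-Gaussian loss,
$$
\Big|\, \mathbb{E}\big[\mathcal{L}(W) - \hat{\mathcal{L}}_S(W)\big] \,\Big| \ \le\ \sqrt{\frac{2\sigma^2}{n}\, I(S;W)}.
$$
Rate-distortion-style bounds, roughly speaking, replace $I(S;W)$ with the information carried by a compressed (distorted) copy of the hypothesis, which can be much smaller.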
arXiv Detail & Related papers (2022-03-04T18:12:31Z)
- Understanding Square Loss in Training Overparametrized Neural Network Classifiers [31.319145959402462]
We contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks.
We consider two cases, according to whether classes are separable or not. In the general non-separable case, a fast convergence rate is established for both the misclassification rate and calibration error.
The resulting margin is proven to be lower bounded away from zero, providing theoretical guarantees for robustness.
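The paper's analysis concerns overparametrized networks; the minimal runnable sketch below shows only the mechanical recipe being analyzed, applied to a linear model: regress raw class scores onto one-hot labels with squared error and classify by argmax. The toy data generator and hyperparameters are ours.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3
X = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, k))
y = (X @ true_W + 0.5 * rng.normal(size=(n, k))).argmax(axis=1)
Y = np.eye(k)[y]                       # one-hot targets

W = np.zeros((d, k))
for _ in range(500):
    P = X @ W                          # raw scores, no softmax
    W -= 0.1 * (X.T @ (P - Y) / n)     # gradient of 0.5 * mean ||P - Y||^2

print("train accuracy:", ((X @ W).argmax(axis=1) == y).mean())
```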
arXiv Detail & Related papers (2021-12-07T12:12:30Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Blocked and Hierarchical Disentangled Representation From Information Theory Perspective [0.6875312133832078]
We propose a blocked and hierarchical variational autoencoder (BHiVAE) to obtain better-disentangled representations.
BHiVAE builds mainly on information bottleneck theory and information-theoretic principles.
It exhibits excellent disentanglement results in experiments and superior classification accuracy in representation learning.
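For reference, the information bottleneck objective the summary alludes to (Tishby et al.) asks a representation $Z$ of input $X$ to compress $X$ while retaining information about the target $Y$:
$$
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y),
$$
where $\beta$ trades compression against predictiveness. BHiVAE's blocked, hierarchical variant of this objective is specified in the paper itself.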
arXiv Detail & Related papers (2021-01-21T02:33:55Z)
- A Theory of Usable Information Under Computational Constraints [103.5901638681034]
We propose a new framework for reasoning about information in complex systems.
Our foundation is based on a variational extension of Shannon's information theory.
We show that by incorporating computational constraints, $\mathcal{V}$-information can be reliably estimated from data.
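As we recall the definition from this paper, $\mathcal{V}$-information restricts Shannon information to a predictive family $\mathcal{V}$ of allowed models, so that it measures what a computationally bounded observer can actually extract:
$$
H_{\mathcal{V}}(Y \mid X) \;=\; \inf_{f \in \mathcal{V}} \mathbb{E}_{x,y}\big[-\log f[x](y)\big],
\qquad
I_{\mathcal{V}}(X \to Y) \;=\; H_{\mathcal{V}}(Y \mid \varnothing) \;-\; H_{\mathcal{V}}(Y \mid X),
$$
where $f[\varnothing]$ is the model's prediction given no side information; taking $\mathcal{V}$ to be all functions recovers Shannon mutual information.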
arXiv Detail & Related papers (2020-02-25T06:09:30Z)