Rethinking Fano's Inequality in Ensemble Learning
- URL: http://arxiv.org/abs/2205.12683v2
- Date: Thu, 16 Nov 2023 09:43:51 GMT
- Title: Rethinking Fano's Inequality in Ensemble Learning
- Authors: Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga
- Abstract summary: We argue that previous studies did not take into account the information lost when multiple model predictions are combined into a final prediction.
We empirically validate and demonstrate the proposed theory through extensive experiments on actual systems.
- Score: 17.948799609068214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a fundamental theory on ensemble learning that answers the central
question: what factors make an ensemble system good or bad? Previous studies
used a variant of Fano's inequality of information theory and derived a lower
bound of the classification error rate on the basis of the $\textit{accuracy}$
and $\textit{diversity}$ of models. We revisit the original Fano's inequality
and argue that the studies did not take into account the information lost when
multiple model predictions are combined into a final prediction. To address
this issue, we generalize the previous theory to incorporate the information
loss, which we name $\textit{combination loss}$. Further, we empirically
validate and demonstrate the proposed theory through extensive experiments on
actual systems. The theory reveals the strengths and weaknesses of systems on
each metric, which will push the theoretical understanding of ensemble learning
and give us insights into designing systems.
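For concreteness, here is a hedged sketch, in our own notation, of the Fano-type argument the abstract describes: $Y$ is the true label over a label set $\mathcal{Y}$, $\hat{Y}_1,\dots,\hat{Y}_M$ are the individual model predictions, and $\hat{Y}$ is the combined prediction. Fano's inequality lower-bounds the error rate of the combined prediction, and because $\hat{Y}$ is computed from $(\hat{Y}_1,\dots,\hat{Y}_M)$, the data-processing inequality makes the bracketed information gap, i.e. the combination loss, non-negative:
$$
P(\hat{Y} \neq Y) \ \ge\ \frac{H(Y) - I(Y;\hat{Y}) - 1}{\log|\mathcal{Y}|},
\qquad
I(Y;\hat{Y}) \ =\ I(Y;\hat{Y}_1,\dots,\hat{Y}_M)\ -\ \underbrace{\big[\,I(Y;\hat{Y}_1,\dots,\hat{Y}_M) - I(Y;\hat{Y})\,\big]}_{\text{combination loss}\ \ge\ 0}
$$
The paper's exact decomposition of $I(Y;\hat{Y}_1,\dots,\hat{Y}_M)$ into accuracy and diversity terms follows the prior work it generalizes; the display above only fixes the role the new term plays in the bound.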
Related papers
- Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data [35.03888101803088]
This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification.
We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees.
We devise novel and general learning algorithms, IMMAX, which incorporate confidence margins and are applicable to various hypothesis sets.
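The precise IMMAX loss is defined in the paper; purely as a hypothetical illustration of the ingredient it builds on, the sketch below implements a multi-class hinge loss with per-class confidence margins in which rarer classes receive larger margins (an LDAM-style schedule, $\rho_y \propto n_y^{-1/4}$). All names, signatures, and the margin schedule are our assumptions.
```python
import numpy as np

def class_margins(class_counts, scale=1.0, power=0.25):
    # Hypothetical schedule: rarer classes get larger margins,
    # e.g. rho_y proportional to n_y^(-1/4) as in LDAM-style losses.
    counts = np.asarray(class_counts, dtype=float)
    return scale / counts**power

def imbalanced_margin_loss(scores, labels, margins):
    """Multi-class hinge loss with per-class margins rho_y.

    scores:  (n, k) real-valued class scores
    labels:  (n,) integer labels in [0, k)
    margins: (k,) per-class margins
    """
    n = scores.shape[0]
    correct = scores[np.arange(n), labels]
    masked = scores.copy()
    masked[np.arange(n), labels] = -np.inf   # exclude the true class
    runner_up = masked.max(axis=1)
    # Penalize samples whose score gap falls short of the class margin.
    slack = margins[labels] - (correct - runner_up)
    return np.maximum(0.0, slack).mean()

# Toy usage on an imbalanced 3-class problem.
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)
print(imbalanced_margin_loss(scores, labels, class_margins([1000, 100, 10])))
```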
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
- Of Dice and Games: A Theory of Generalized Boosting [61.752303337418475]
We extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses.
We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees.
Our characterization relies on a geometric interpretation of boosting, revealing a surprising equivalence between cost-sensitive and multi-objective losses.
arXiv Detail & Related papers (2024-12-11T01:38:32Z)
- An Effective Theory of Bias Amplification [18.648588509429167]
Machine learning models may capture and amplify biases present in data, leading to disparate test performance across social groups.
We propose a precise analytical theory in the context of ridge regression, a setting that models neural networks in a simplified regime.
Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias.
arXiv Detail & Related papers (2024-10-07T08:43:22Z)
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that systematically controlled metrics are strongly predictive of generalization performance.
This work informs an important direction: enhancing data diversity or balance, as opposed to scaling up the absolute dataset size.
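The paper defines its own skew metrics; as a hypothetical stand-in that conveys the idea, the sketch below scores how unevenly a dataset's relations are distributed using one minus normalized Shannon entropy (0.0 = perfectly balanced, values near 1.0 = mass concentrated on a few categories).
```python
import math
from collections import Counter

def skew(items):
    """Hypothetical skew metric: 1 - normalized Shannon entropy."""
    counts = Counter(items)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if len(probs) < 2:
        return 1.0  # a single observed category is maximally skewed
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(len(probs))

# E.g. spatial relations mentioned in the captions of a toy dataset.
relations = ["left-of"] * 90 + ["above"] * 8 + ["inside"] * 2
print(f"linguistic skew: {skew(relations):.2f}")  # ~0.66
```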
arXiv Detail & Related papers (2024-03-25T03:18:39Z)
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that for an ensemble of deep-learning-based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
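As a rough illustration (not the paper's own definitions), the sketch below estimates per-sample bias and variance for a classification ensemble using the classic 0-1-loss decomposition of Domingos (2000): the "main" prediction is the per-sample majority vote, bias marks samples where that vote is wrong, and variance measures how often members disagree with it.
```python
import numpy as np

def per_sample_bias_variance(member_preds, y_true):
    """Per-sample bias/variance under 0-1 loss (Domingos-style).

    member_preds: (m, n) int array of m ensemble members' predictions
    y_true:       (n,) int array of true labels
    """
    m, n = member_preds.shape
    # "Main" prediction: per-sample majority vote over members.
    main = np.array([np.bincount(member_preds[:, i]).argmax()
                     for i in range(n)])
    bias = (main != y_true).astype(float)           # majority wrong
    variance = (member_preds != main).mean(axis=0)  # disagreement rate
    return bias, variance

# Toy check: 5 members, 4 samples.
preds = np.array([[0, 1, 2, 1],
                  [0, 1, 2, 2],
                  [0, 2, 2, 1],
                  [0, 1, 2, 1],
                  [0, 1, 0, 1]])
bias, var = per_sample_bias_variance(preds, np.array([0, 1, 1, 1]))
print(bias, var)  # bias [0. 0. 1. 0.], variance [0. 0.2 0.2 0.2]
```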
arXiv Detail & Related papers (2023-10-13T17:06:34Z)
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not explain that collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
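A minimal sketch of the setting, assuming one scalar parameter per worker and a doubly stochastic gossip matrix $W$ that encodes the topology: workers repeatedly average with their neighbours, and properties of $W$ (classically, its spectral gap) govern how fast the consensus error shrinks. The paper argues the spectral gap alone does not tell the whole story.
```python
import numpy as np

def gossip_average(x, W, steps):
    # Each step, every worker replaces its value with a weighted
    # average of its neighbours' values (one row of W per worker).
    for _ in range(steps):
        x = W @ x
    return x

def ring_matrix(n):
    # Sparse topology: each worker averages with its two ring neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, [i, (i - 1) % n, (i + 1) % n]] = 1 / 3
    return W

n = 16
x0 = np.random.default_rng(0).normal(size=n)
complete = np.full((n, n), 1 / n)  # complete graph: exact mean in one step
print(np.std(gossip_average(x0, ring_matrix(n), 10)))  # slow mixing on a ring
print(np.std(gossip_average(x0, complete, 1)))         # ~0 immediately
```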
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z)
- Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z)
- Rate-Distortion Theoretic Generalization Bounds for Stochastic Learning Algorithms [12.020634332110147]
We prove novel generalization bounds through the lens of rate-distortion theory.
Our results bring a more unified perspective on generalization and open up several future research directions.
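For context, a classic bound this line of work refines is the mutual-information generalization bound of Xu and Raginsky (2017): for an $n$-sample training set $S$, learned hypothesis $W$, and a $\sigma$-sub-Gaussian loss,
$$
\Big|\, \mathbb{E}\big[\mathcal{L}(W) - \hat{\mathcal{L}}_S(W)\big] \,\Big| \ \le\ \sqrt{\frac{2\sigma^2}{n}\, I(S;W)}.
$$
Rate-distortion-style bounds, roughly speaking, replace $I(S;W)$ with the information carried by a compressed (distorted) copy of the hypothesis, which can be much smaller.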
arXiv Detail & Related papers (2022-03-04T18:12:31Z)
- Understanding Square Loss in Training Overparametrized Neural Network Classifiers [31.319145959402462]
We contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks.
We consider two cases, according to whether classes are separable or not. In the general non-separable case, a fast convergence rate is established for both the misclassification rate and calibration error.
The resulting margin is proven to be lower bounded away from zero, providing theoretical guarantees for robustness.
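The paper's analysis concerns overparametrized networks; the minimal runnable sketch below shows only the mechanical recipe being analyzed, applied to a linear model: regress raw class scores onto one-hot labels with squared error and classify by argmax. The toy data generator and hyperparameters are ours.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3
X = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, k))
y = (X @ true_W + 0.5 * rng.normal(size=(n, k))).argmax(axis=1)
Y = np.eye(k)[y]                       # one-hot targets

W = np.zeros((d, k))
for _ in range(500):
    P = X @ W                          # raw scores, no softmax
    W -= 0.1 * (X.T @ (P - Y) / n)     # gradient of 0.5 * mean ||P - Y||^2

print("train accuracy:", ((X @ W).argmax(axis=1) == y).mean())
```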
arXiv Detail & Related papers (2021-12-07T12:12:30Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Blocked and Hierarchical Disentangled Representation From Information Theory Perspective [0.6875312133832078]
We propose a blocked and hierarchical variational autoencoder (BHiVAE) to obtain better-disentangled representations.
BHiVAE builds mainly on information bottleneck theory and information-theoretic principles.
It exhibits excellent disentanglement results in experiments and superior classification accuracy in representation learning.
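For reference, the information bottleneck objective the summary alludes to (Tishby et al.) asks a representation $Z$ of input $X$ to compress $X$ while retaining information about the target $Y$:
$$
\min_{p(z \mid x)} \; I(X;Z) \;-\; \beta\, I(Z;Y),
$$
where $\beta$ trades compression against predictiveness. BHiVAE's blocked, hierarchical variant of this objective is specified in the paper itself.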
arXiv Detail & Related papers (2021-01-21T02:33:55Z)
- A Theory of Usable Information Under Computational Constraints [103.5901638681034]
We propose a new framework for reasoning about information in complex systems.
Our foundation is based on a variational extension of Shannon's information theory.
We show that by incorporating computational constraints, $\mathcal{V}$-information can be reliably estimated from data.
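As we recall the definition from this paper, $\mathcal{V}$-information restricts Shannon information to a predictive family $\mathcal{V}$ of allowed models, so that it measures what a computationally bounded observer can actually extract:
$$
H_{\mathcal{V}}(Y \mid X) \;=\; \inf_{f \in \mathcal{V}} \mathbb{E}_{x,y}\big[-\log f[x](y)\big],
\qquad
I_{\mathcal{V}}(X \to Y) \;=\; H_{\mathcal{V}}(Y \mid \varnothing) \;-\; H_{\mathcal{V}}(Y \mid X),
$$
where $f[\varnothing]$ is the model's prediction given no side information; taking $\mathcal{V}$ to be all functions recovers Shannon mutual information.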
arXiv Detail & Related papers (2020-02-25T06:09:30Z)