Ensemble Performance Through the Lens of Linear Independence of Classifier Votes in Data Streams
- URL: http://arxiv.org/abs/2511.21465v1
- Date: Wed, 26 Nov 2025 14:57:59 GMT
- Title: Ensemble Performance Through the Lens of Linear Independence of Classifier Votes in Data Streams
- Authors: Enes Bektas, Fazli Can
- Abstract summary: This paper investigates the relationship between ensemble size and performance through the lens of linear independence among classifier votes in data streams. By modeling the probability of achieving linear independence among classifier outputs, we derive a theoretical framework that explains the trade-off between ensemble size and accuracy. Our results confirm that this theoretical estimate effectively identifies the point of performance saturation for robust ensembles like OzaBagging.
- Score: 2.105564340986074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensemble learning improves classification performance by combining multiple base classifiers. While increasing the number of classifiers generally enhances accuracy, excessively large ensembles can lead to computational inefficiency and diminishing returns. This paper investigates the relationship between ensemble size and performance through the lens of linear independence among classifier votes in data streams. We propose that ensembles composed of linearly independent classifiers maximize representational capacity, particularly under a geometric model. We then generalize the importance of linear independence to the weighted majority voting problem. By modeling the probability of achieving linear independence among classifier outputs, we derive a theoretical framework that explains the trade-off between ensemble size and accuracy. Our analysis leads to a theoretical estimate of the ensemble size required to achieve a user-specified probability of linear independence. We validate our theory through experiments on both real-world and synthetic datasets using two ensemble methods, OzaBagging and GOOWE. Our results confirm that this theoretical estimate effectively identifies the point of performance saturation for robust ensembles like OzaBagging. Conversely, for complex weighting schemes like GOOWE, our framework reveals that high theoretical diversity can trigger algorithmic instability. Our implementation is publicly available to support reproducibility and future research.
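The abstract's central object, linear independence among classifier votes, can be illustrated with a small sketch: stack each classifier's predictions over a window of stream instances as a row of a vote matrix and check whether the rows are linearly independent via matrix rank. The vote matrix below is hypothetical, and this is only an illustration of the concept, not the authors' implementation.

```python
import numpy as np

# Hypothetical vote matrix: each row is one classifier's predictions
# (class indices) over a window of five streaming instances.
votes = np.array([
    [1, 0, 1, 1, 0],  # classifier 1
    [0, 1, 1, 0, 1],  # classifier 2
    [1, 1, 0, 1, 1],  # classifier 3
])

# The vote vectors are linearly independent iff the matrix has full
# row rank; the paper ties this to representational capacity.
rank = np.linalg.matrix_rank(votes)
print(rank == votes.shape[0])  # True: these three vote vectors are independent
```

In this toy example no classifier's vote vector can be written as a linear combination of the others, so adding it to the ensemble contributes genuinely new information; once new members' vote vectors fall inside the span of the existing ones, the paper's analysis predicts performance saturation.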
Related papers
- Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models [55.94503936470247]
Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLM judges. Most classical methods assume annotators are conditionally independent given the true label $Y \in \{0, 1\}$, an assumption often violated by LLM judges. We study label aggregation through a hierarchy of dependence-aware models based on Ising graphical models and latent factors.
arXiv Detail & Related papers (2026-01-29T21:26:50Z) - Making Foundation Models Probabilistic via Singular Value Ensembles [56.4174499669573]
Foundation models have become a dominant paradigm in machine learning, achieving remarkable performance across diverse tasks through large-scale pretraining. The standard approach to quantifying uncertainty, training an ensemble of independent models, incurs prohibitive computational costs that scale linearly with ensemble size. We propose Singular Value Ensemble (SVE), a parameter-efficient implicit ensemble method that builds on a simple but powerful core assumption. We show that SVE achieves uncertainty quantification comparable to that of explicit deep ensembles while increasing the parameter count of the base model by less than 1%.
arXiv Detail & Related papers (2026-01-29T18:07:18Z) - Learning Causal Response Representations through Direct Effect Analysis [3.881388090216841]
We propose a novel approach for learning causal response representations. Our method aims to extract directions in which a multidimensional outcome is most directly caused by a treatment variable.
arXiv Detail & Related papers (2025-03-06T12:01:41Z) - Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z) - Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Obtaining Explainable Classification Models using Distributionally Robust Optimization [12.511155426574563]
We study generalized linear models constructed using sets of feature value rules.
An inherent trade-off exists between rule set sparsity and its prediction accuracy.
We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors.
arXiv Detail & Related papers (2023-11-03T15:45:34Z) - Leveraging Linear Independence of Component Classifiers: Optimizing Size and Prediction Accuracy for Online Ensembles [3.97048491084787]
We introduce a novel perspective, rooted in the linear independence of classifiers' votes, to analyze the interplay between ensemble size and prediction accuracy.
We present a method to determine the minimum ensemble size required to ensure a target probability of linearly independent votes.
Surprisingly, the calculated ideal ensemble size deviates from empirical results for certain datasets, emphasizing the influence of other factors.
arXiv Detail & Related papers (2023-08-27T18:38:09Z) - Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z) - BELIEF in Dependence: Leveraging Atomic Linearity in Data Bits for Rethinking Generalized Linear Models [5.726186905478233]
We develop a framework called binary expansion linear effect (BELIEF) for understanding arbitrary relationships with a binary outcome. Models from the BELIEF framework are easily interpretable because they describe the association of binary variables in the language of linear models.
arXiv Detail & Related papers (2022-10-19T19:28:09Z) - A Dataset-Level Geometric Framework for Ensemble Classifiers [0.76146285961466]
Majority voting and weighted majority voting are two commonly used combination schemes in ensemble learning.
We present a group of properties of these two combination schemes formally under a dataset-level geometric framework.
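The two combination schemes named above can be sketched in a few lines: a weighted majority vote sums each classifier's weight onto its predicted class, and plain majority voting is the special case of uniform weights. The function below is an illustrative sketch with made-up weights, not the geometric framework itself.

```python
from collections import defaultdict

def weighted_majority_vote(predictions, weights):
    """Combine per-classifier class predictions with a weighted vote;
    uniform weights recover plain majority voting."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# One stronger classifier says "a"; two weaker ones agree on "b".
print(weighted_majority_vote(["a", "b", "b"], [0.5, 0.3, 0.3]))  # b (0.6 > 0.5)
```

Here the two weaker classifiers outvote the stronger one because their combined weight is larger; shifting enough weight onto the first classifier would flip the outcome to "a".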
arXiv Detail & Related papers (2021-06-16T09:48:12Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z) - New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, RIn-Close_CVC3, retains the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.