A Dataset-Level Geometric Framework for Ensemble Classifiers
- URL: http://arxiv.org/abs/2106.08658v1
- Date: Wed, 16 Jun 2021 09:48:12 GMT
- Title: A Dataset-Level Geometric Framework for Ensemble Classifiers
- Authors: Shengli Wu, Weimin Ding
- Abstract summary: Majority voting and weighted majority voting are two commonly used combination schemes in ensemble learning.
We present a group of properties of these two combination schemes formally under a dataset-level geometric framework.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensemble classifiers have been investigated by many in the artificial
intelligence and machine learning community. Majority voting and weighted
majority voting are two commonly used combination schemes in ensemble learning.
However, understanding of them is incomplete at best, with some properties even
misunderstood. In this paper, we present a group of properties of these two
schemes formally under a dataset-level geometric framework. Two key factors, the
performance of each component base classifier and the dissimilarity between each
pair of component classifiers, are evaluated with the same metric: Euclidean
distance. Consequently, ensembling becomes a deterministic problem, and the
performance of an ensemble can be calculated directly by a formula. We prove
several theorems of interest and explain their implications for ensembles. In
particular, we compare and contrast the effect of the number of component
classifiers on these two types of ensemble schemes. Empirical investigation is
also conducted to verify the theoretical results when other metrics such as
accuracy are used. We believe these results are useful for understanding the
fundamental properties of these two combination schemes and the principles of
ensemble classifiers in general. They are also helpful for investigating related
issues in ensemble classifiers, such as predicting ensemble performance and
selecting a small number of base classifiers to obtain efficient and effective
ensembles.
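To make the geometric view concrete, here is a minimal sketch, assuming binary labels encoded as +/-1 and each classifier represented by its prediction vector over the whole dataset; it illustrates how performance and pairwise dissimilarity both reduce to Euclidean distances, and is not the paper's exact formula.

```python
import numpy as np

# Minimal sketch (assumptions: binary labels encoded as +/-1, each classifier
# represented by its prediction vector over the whole dataset). This
# illustrates the geometric view, not the paper's exact formulation.

def performance(pred, truth):
    """Euclidean distance to the truth vector: smaller means better."""
    return np.linalg.norm(pred - truth)

def dissimilarity(pred_i, pred_j):
    """Pairwise diversity of two component classifiers, same metric."""
    return np.linalg.norm(pred_i - pred_j)

def majority_vote(preds):
    """preds: (k, n) array of +/-1 votes from k classifiers; odd k avoids ties."""
    return np.sign(preds.sum(axis=0))

def weighted_majority_vote(preds, weights):
    """weights: (k,) nonnegative weights, one per classifier."""
    return np.sign(weights @ preds)

# Toy example: 3 classifiers on a 5-instance dataset.
truth = np.array([1, 1, -1, 1, -1])
preds = np.array([
    [1, 1, -1, -1, -1],   # classifier 1
    [1, -1, -1, 1, -1],   # classifier 2
    [-1, 1, -1, 1, 1],    # classifier 3
])
fused = majority_vote(preds)
print(performance(fused, truth))          # ensemble's distance to the truth
print(dissimilarity(preds[0], preds[1]))  # diversity of one classifier pair
```

In this toy example the majority vote recovers the truth vector exactly even though every component classifier makes a mistake, which is the kind of interplay between individual performance and pairwise distance that the paper's theorems formalize.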
Related papers
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- Relation-aware Ensemble Learning for Knowledge Graph Embedding [68.94900786314666]
We propose to learn an ensemble by leveraging existing methods in a relation-aware manner.
However, exploring these semantics with a relation-aware ensemble leads to a much larger search space than in general ensemble methods.
We propose a divide-search-combine algorithm RelEns-DSC that searches the relation-wise ensemble weights independently.
arXiv Detail & Related papers (2023-10-13T07:40:12Z)
- Leveraging Linear Independence of Component Classifiers: Optimizing Size and Prediction Accuracy for Online Ensembles [3.97048491084787]
We introduce a novel perspective, rooted in the linear independence of classifiers' votes, to analyze the interplay between ensemble size and prediction accuracy.
We present a method to determine the minimum ensemble size required to ensure a target probability of linearly independent votes.
Surprisingly, the calculated ideal ensemble size deviates from empirical results for certain datasets, emphasizing the influence of other factors.
arXiv Detail & Related papers (2023-08-27T18:38:09Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach to anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
Uncertainty is then used to detect anomalies.
arXiv Detail & Related papers (2022-12-23T00:50:41Z)
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Specialists Outperform Generalists in Ensemble Classification [15.315432841707736]
In this paper, we address the question of whether we can determine the accuracy of the ensemble.
We explicitly construct the individual classifiers that attain the upper and lower bounds: specialists and generalists.
arXiv Detail & Related papers (2021-07-09T12:16:10Z)
- HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis [2.5329716878122404]
Comprehensive benchmarking of clustering algorithms is difficult.
There is no consensus regarding the best practice for rigorous benchmarking.
We demonstrate the important role evolutionary algorithms play in supporting the flexible generation of such benchmarks.
arXiv Detail & Related papers (2021-02-13T15:01:34Z)
- Linear Classifier Combination via Multiple Potential Functions [0.6091702876917279]
We propose a novel concept of calculating a scoring function based on an object's distance to the decision boundary and its distance to the class centroid (see the sketch after this list).
An important property is that the proposed score function has the same nature for all linear base classifiers.
arXiv Detail & Related papers (2020-10-02T08:11:51Z)
- Ensemble of Binary Classifiers Combined Using Recurrent Correlation Associative Memories [1.3706331473063877]
Majority voting is a common methodology for combining classifiers in an ensemble.
We introduce ensemble methods based on recurrent correlation associative memories for binary classification problems.
arXiv Detail & Related papers (2020-09-18T01:16:53Z)
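Following up on the "Linear Classifier Combination via Multiple Potential Functions" entry above, here is a hedged sketch of the kind of score it describes, combining an object's distance to a linear decision boundary with its distance to the class centroid; the combination rule below is an illustrative assumption, not the paper's actual potential functions.

```python
import numpy as np

# Hypothetical illustration of a score built from the two distances the
# entry mentions; the combination below is an assumption, not the paper's
# actual potential functions.
def score(x, w, b, centroid):
    d_boundary = abs(w @ x + b) / np.linalg.norm(w)  # distance to the hyperplane w.x + b = 0
    d_centroid = np.linalg.norm(x - centroid)        # distance to the class centroid
    # One plausible combination: a larger margin and proximity to the class raise the score.
    return d_boundary / (1.0 + d_centroid)

# Example: a point scored against one linear classifier and one class centroid.
x = np.array([1.0, 2.0])
w, b = np.array([0.5, -1.0]), 0.25
centroid = np.array([1.5, 2.5])
print(score(x, w, b, centroid))
```

Dividing the margin by the centroid distance is just one plausible way to trade the two quantities off; the point is that both are well defined for any linear base classifier, which is why such a score can have the same nature across all members of the ensemble.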
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.