On-the-Fly Ensemble Pruning in Evolving Data Streams
- URL: http://arxiv.org/abs/2109.07611v1
- Date: Wed, 15 Sep 2021 22:54:22 GMT
- Title: On-the-Fly Ensemble Pruning in Evolving Data Streams
- Authors: Sanem Elbasi, Alican Büyükçakır, Hamed Bonab and Fazli Can
- Abstract summary: CCRP is an on-the-fly ensemble pruning method for multi-class data stream classification.
We show that different types of ensembles that integrate CCRP consistently yield on-par or superior performance with 20% to 90% less average memory consumption.
- Score: 4.137914981603379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensemble pruning is the process of selecting a subset of component classifiers
from an ensemble which performs at least as well as the original ensemble while
reducing storage and computational costs. Ensemble pruning in data streams is a
largely unexplored area of research. It requires analysis of ensemble components
as they are running on the stream, and differentiation of useful classifiers
from redundant ones. We present CCRP, an on-the-fly ensemble pruning method for
multi-class data stream classification empowered by an imbalance-aware fusion of
class-wise component rankings. CCRP aims to ensure that the resulting pruned ensemble
contains the best-performing classifier for each target class and hence reduces
the effects of class imbalance. The conducted experiments on real-world and
synthetic data streams demonstrate that different types of ensembles that
integrate CCRP as their pruning scheme consistently yield on-par or superior
performance with 20% to 90% less average memory consumption. Lastly, we validate
the proposed pruning scheme by comparing our approach against pruning schemes
based on ensemble weights and basic rank fusion methods.
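The abstract's core idea — rank components per class, fuse the rankings, and guarantee that each class keeps its best performer — can be sketched as follows. This is a minimal illustration, not the exact CCRP procedure: the per-class scores, the Borda-style fusion, and the `keep` budget are all assumptions made for the example.

```python
def prune_ensemble(class_scores, keep):
    """Select component classifiers from class-wise score tables.

    class_scores[c][i] is the recent score of component i for class c.
    Guarantees the best component per class is retained, then fills the
    remaining budget by a Borda-style fusion of class-wise rankings.
    """
    n = len(next(iter(class_scores.values())))
    # Always keep the top component for every class (imbalance awareness:
    # minority classes get their specialist retained).
    selected = {max(range(n), key=lambda i: scores[i])
                for scores in class_scores.values()}
    # Borda-style rank fusion: each class awards (n - rank) points.
    fused = [0] * n
    for scores in class_scores.values():
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for rank, i in enumerate(order):
            fused[i] += n - rank
    # Fill remaining slots with the highest fused-rank components.
    for i in sorted(range(n), key=lambda i: fused[i], reverse=True):
        if len(selected) >= keep:
            break
        selected.add(i)
    return sorted(selected)
```

For example, with four components where component 0 dominates class "A" and component 3 dominates class "B", a budget of 2 retains exactly those two specialists, regardless of their overall average accuracy.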
Related papers
- A binary PSO based ensemble under-sampling model for rebalancing imbalanced training data [29.53148709356689]
In this paper, a novel ensemble method combining the advantages of both ensemble learning for biasing classifiers and a new under-sampling method is proposed.
The under-sampling method is named Binary PSO instance selection; it gathers with ensemble classifiers to find the most suitable length and combination of the majority class samples.
According to experimental results, our proposed methods outperform single ensemble methods, state-of-the-art under-sampling methods, and also combinations of these methods with the traditional PSO instance selection algorithm.
arXiv Detail & Related papers (2025-01-31T01:45:20Z) - A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP.
arXiv Detail & Related papers (2024-02-06T15:45:27Z) - Balanced Classification: A Unified Framework for Long-Tailed Object
Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories.
We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution.
BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
arXiv Detail & Related papers (2023-08-04T09:11:07Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC)
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - Overlapping oriented imbalanced ensemble learning method based on
projective clustering and stagewise hybrid sampling [22.32930261633615]
This paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS)
The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples.
arXiv Detail & Related papers (2022-11-30T01:49:06Z) - Ensemble Classifier Design Tuned to Dataset Characteristics for Network
Intrusion Detection [0.0]
Two new algorithms are proposed to address the class overlap issue in the dataset.
The proposed design is evaluated for both binary and multi-category classification.
arXiv Detail & Related papers (2022-05-08T21:06:42Z) - Hybrid Ensemble optimized algorithm based on Genetic Programming for
imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two classes of imbalanced data classification.
Experimental results on the specified data sets show that, depending on training-set size, the proposed method achieves 40% to 50% better accuracy in minority-class prediction than the compared methods.
arXiv Detail & Related papers (2021-06-02T14:14:38Z) - SetConv: A New Approach for Learning from Imbalanced Data [29.366843553056594]
We propose a set convolution operation and an episodic training strategy to extract a single representative for each class.
We prove that our proposed algorithm is permutation-invariant despite the order of inputs.
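The permutation-invariance property claimed here can be illustrated with the simplest possible set aggregator. Mean pooling is only a stand-in for the paper's learned set convolution, but it shows why the class representative cannot depend on the order of the inputs.

```python
import numpy as np

def class_representative(samples):
    """Permutation-invariant representative of a class of samples.

    Mean pooling is an illustrative stand-in for a learned set
    operation: any symmetric reduction over the set axis yields the
    same output for every ordering of the inputs.
    """
    return np.mean(np.asarray(samples, dtype=float), axis=0)
```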
arXiv Detail & Related papers (2021-04-03T22:33:30Z) - Binary Classification from Multiple Unlabeled Datasets via Surrogate Set
Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC)
arXiv Detail & Related papers (2021-02-01T07:36:38Z) - Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
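The ranked-range definition above is concrete enough to compute directly: sort the values, then sum the entries from rank $k$ through rank $m$. The sketch below assumes 1-indexed ranks; note that this quantity equals a top-$m$ sum minus a top-$(k-1)$ sum, which is what makes it tractable as a learning objective.

```python
def sum_of_ranked_range(values, k, m):
    """Sum of the k-th through m-th largest values (ranks 1-indexed).

    E.g. k=1 gives the top-m sum; k=m gives the single k-th largest
    value; intermediate ranges trim both the largest outliers and
    the smallest values.
    """
    top = sorted(values, reverse=True)
    return sum(top[k - 1:m])
```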
arXiv Detail & Related papers (2020-10-05T01:58:32Z) - New advances in enumerative biclustering algorithms with online
partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm is called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.