Related papers: A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

URL: http://arxiv.org/abs/2204.03719v2
Date: Tue, 18 Jul 2023 15:28:39 GMT
Title: A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework
Authors: Gabriel Aguiar, Bartosz Krawczyk, Alberto Cano
Abstract summary: Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. This work proposes a standardized, exhaustive, and comprehensive experimental framework to evaluate algorithms.
Score: 12.856833690265985
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures and benchmarks on how to evaluate these algorithms. This work proposes a standardized, exhaustive, and comprehensive experimental framework to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data streams algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, real-world and semi-synthetic datasets in binary and multi-class scenarios. This leads to a large-scale experimental study comparing state-of-the-art classifiers in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and we provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental framework is fully reproducible and easy to extend with new methods. This way, we propose a standardized approach to conducting experiments in imbalanced data streams that can be used by other researchers to create complete, trustworthy, and fair evaluation of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.

Related papers

Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory [0.0]
In supervised learning, it is quite frequent to be confronted with real imbalanced datasets. We propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates. We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations.
arXiv Detail & Related papers (2023-08-05T23:08:08Z)
Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph. A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task. New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution. Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
Class-Imbalanced Complementary-Label Learning via Weighted Loss [8.934943507699131]
Complementary-label learning (CLL) is widely used in weakly supervised classification. It faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples. We propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification.
arXiv Detail & Related papers (2022-09-28T16:02:42Z)
A Novel Hybrid Sampling Framework for Imbalanced Learning [0.0]
"SMOTE-RUS-NC" has been compared with other state-of-the-art sampling techniques. Rigorous experimentation has been conducted on 26 imbalanced datasets.
arXiv Detail & Related papers (2022-08-20T07:04:00Z)
A Hybrid Approach for Binary Classification of Imbalanced Data [0.0]
We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimentionality reduction, and ensemble learning. We evaluate the performance on eight imbalanced public datasets in terms of recall, G-mean, and AUC.
arXiv Detail & Related papers (2022-07-06T15:18:41Z)
CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance. Sample re-weighting methods are popularly used to alleviate this data bias issue. We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks. In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge. We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
Handling Imbalanced Data: A Case Study for Binary Class Problems [0.0]
The major issues in terms of solving for classification problems are the issues of Imbalanced data. This paper focuses on both synthetic oversampling techniques and manually computes synthetic data points to enhance easy comprehension of the algorithms. We analyze the application of these synthetic oversampling techniques on binary classification problems with different Imbalanced ratios and sample sizes.
arXiv Detail & Related papers (2020-10-09T02:04:14Z)
Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers. Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.