Handling Imbalanced Datasets Through Optimum-Path Forest
- URL: http://arxiv.org/abs/2202.08934v1
- Date: Thu, 17 Feb 2022 23:24:49 GMT
- Title: Handling Imbalanced Datasets Through Optimum-Path Forest
- Authors: Leandro Aparecido Passos, Danilo S. Jodas, Luiz C. F. Ribeiro, Marco
Akio, Andre Nunes de Souza, João Paulo Papa
- Abstract summary: The Optimum-Path Forest (OPF), a graph-based approach, has attracted considerable attention due to its outstanding performance across many applications.
We propose three OPF-based strategies to deal with the imbalance problem: the $\text{O}^2$PF for oversampling, the OPF-US for undersampling, and a hybrid strategy combining both.
Results compared against several state-of-the-art techniques over public and private datasets confirm the robustness of the proposed approaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last decade, machine learning-based approaches became capable of
performing a wide range of complex tasks, sometimes better than humans and in a
fraction of the time. Such an advance is partially due to the exponential growth
in the amount of available data, which makes it possible to extract trustworthy
real-world information from it. However, such data is generally imbalanced since
some phenomena are more likely than others. This behavior considerably affects a
machine learning model's performance, since the model becomes biased toward the
more frequent classes it receives. Among the considerable number of machine
learning methods, one graph-based approach, the Optimum-Path Forest (OPF), has
attracted particular attention due to its outstanding performance across many
applications. In this paper, we propose three OPF-based strategies to deal with
the imbalance problem: the $\text{O}^2$PF and the OPF-US, which are novel
approaches for oversampling and undersampling, respectively, as well as a hybrid
strategy combining both. The paper also introduces a set of variants of the
strategies mentioned above. Results compared against several state-of-the-art
techniques over public and private datasets confirm the robustness of the
proposed approaches.
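The paper's OPF-based resamplers are not reproduced here, but the imbalance problem they target, and the two families of remedies the abstract names (oversampling the minority class, undersampling the majority class), can be illustrated with a minimal random-resampling sketch. This is a generic illustration, not the $\text{O}^2$PF or OPF-US algorithm; all function names are invented for the example.

```python
import random
from collections import Counter

def oversample_minority(X, y, seed=0):
    """Naive oversampling: duplicate random samples of each class
    until every class matches the majority class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

def undersample_majority(X, y, seed=0):
    """Naive undersampling: keep a random subset of each class
    so every class matches the minority class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]

# A 9:3 imbalanced toy dataset.
X = [[float(i)] for i in range(12)]
y = [0] * 9 + [1] * 3

Xo, yo = oversample_minority(X, y)
Xu, yu = undersample_majority(X, y)
print(Counter(yo))  # both classes now have 9 samples
print(Counter(yu))  # both classes now have 3 samples
```

The OPF-based methods in the paper replace these random choices with graph-based criteria for which samples to synthesize or discard, which is where the claimed robustness comes from.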
Related papers
- First-Order Manifold Data Augmentation for Regression Learning [4.910937238451485]
We introduce FOMA: a new data-driven domain-independent data augmentation method.
We evaluate FOMA on in-distribution generalization and out-of-distribution benchmarks, and we show that it improves the generalization of several neural architectures.
arXiv Detail & Related papers (2024-06-16T12:35:05Z) - AAA: an Adaptive Mechanism for Locally Differential Private Mean Estimation [42.95927712062214]
Local differential privacy (LDP) is a strong privacy standard that has been adopted by popular software systems.
We propose the advanced adaptive additive (AAA) mechanism, which is a distribution-aware approach that addresses the average utility.
We provide rigorous privacy proofs, utility analyses, and extensive experiments comparing AAA with state-of-the-art mechanisms.
arXiv Detail & Related papers (2024-04-02T04:22:07Z) - Efficient Hybrid Oversampling and Intelligent Undersampling for
Imbalanced Big Data Classification [1.03590082373586]
We present a novel resampling method called SMOTENN that combines intelligent undersampling and oversampling using a MapReduce framework.
Our experimental results show the virtues of this approach, outperforming alternative resampling techniques for small- and medium-sized datasets.
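SMOTENN's combination of intelligent undersampling and MapReduce-scale oversampling is not specified in this summary, but the SMOTE-style oversampling it builds on is well known: a synthetic minority sample is created by interpolating between a minority point and one of its nearest minority neighbors. Below is a generic one-sample sketch of that idea, not the SMOTENN algorithm itself; the function name and parameters are illustrative.

```python
import math
import random

def smote_sample(minority, k=2, seed=0):
    """Generate one synthetic point, SMOTE-style: pick a random
    minority point, then interpolate toward one of its k nearest
    minority neighbors by a random factor in [0, 1]."""
    rng = random.Random(seed)
    x = rng.choice(minority)
    # k nearest neighbors of x among the other minority points.
    others = [p for p in minority if p is not x]
    others.sort(key=lambda p: math.dist(x, p))
    neighbor = rng.choice(others[:k])
    gap = rng.random()
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

minority = [[1.0, 1.0], [1.5, 1.2], [1.2, 1.6], [1.8, 1.9]]
synthetic = smote_sample(minority)
print(synthetic)  # lies on the segment between a point and a neighbor
```

Because each synthetic point lies on a segment between two existing minority points, it stays inside the minority region rather than merely duplicating samples.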
arXiv Detail & Related papers (2023-10-09T15:22:13Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
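FedGMM's federated formulation is beyond a short sketch, but the Gaussian mixture fitting it relies on reduces to iterating EM steps. A minimal single-machine illustration for a 1-D two-component mixture follows; this is a textbook EM step under simple assumptions, not the FedGMM algorithm.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, weights, mus, variances):
    """One EM iteration for a two-component 1-D Gaussian mixture:
    the E-step computes responsibilities, the M-step re-estimates
    mixture weights, means, and variances from them."""
    # E-step: posterior probability of each component for each point.
    resp = []
    for x in data:
        p = [w * normal_pdf(x, m, v)
             for w, m, v in zip(weights, mus, variances)]
        total = sum(p)
        resp.append([pi / total for pi in p])
    # M-step: responsibility-weighted re-estimation.
    n = len(data)
    new_w, new_mu, new_var = [], [], []
    for k in range(2):
        rk = sum(r[k] for r in resp)
        new_w.append(rk / n)
        mu_k = sum(r[k] * x for r, x in zip(resp, data)) / rk
        new_mu.append(mu_k)
        new_var.append(sum(r[k] * (x - mu_k) ** 2
                           for r, x in zip(resp, data)) / rk)
    return new_w, new_mu, new_var

# Two well-separated clusters around 0 and 5.
data = [0.1, -0.2, 0.0, 4.9, 5.1, 5.0]
w, mu, var = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
for _ in range(20):
    w, mu, var = em_step(data, w, mu, var)
print(mu)  # means converge near the two clusters
```

In the federated setting described above, the point is that such mixture components can model heterogeneous client distributions jointly instead of forcing one global model.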
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client
Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Learning Distributionally Robust Models at Scale via Composite
Optimization [45.47760229170775]
We show how different variants of DRO are simply instances of a finite-sum composite optimization for which we provide scalable methods.
We also provide empirical results that demonstrate the effectiveness of our proposed algorithm with respect to the prior art in order to learn robust models from very large datasets.
arXiv Detail & Related papers (2022-03-17T20:47:42Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE)
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and thereof gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.