Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning
- URL: http://arxiv.org/abs/2208.12433v1
- Date: Fri, 26 Aug 2022 04:28:01 GMT
- Title: Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning
- Authors: Daochen Zha, Kwei-Herng Lai, Qiaoyu Tan, Sirui Ding, Na Zou, Xia Hu
- Abstract summary: Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
- Score: 57.163525407022966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imbalanced learning is a fundamental challenge in data mining, where there is
a disproportionate ratio of training samples in each class. Over-sampling is an
effective technique to tackle imbalanced learning through generating synthetic
samples for the minority class. While numerous over-sampling algorithms have
been proposed, they heavily rely on heuristics, which could be sub-optimal
since we may need different sampling strategies for different datasets and base
classifiers, and they cannot directly optimize the performance metric.
Motivated by this, we investigate developing a learning-based over-sampling
algorithm to optimize the classification performance, which is a challenging
task because of the huge and hierarchical decision space. At the high level, we
need to decide how many synthetic samples to generate. At the low level, we
need to determine where the synthetic samples should be located, which depends
on the high-level decision since the optimal locations of the samples may
differ for different numbers of samples. To address the challenges, we propose
AutoSMOTE, an automated over-sampling algorithm that can jointly optimize
different levels of decisions. Motivated by the success of
SMOTE (Chawla et al., 2002) and its extensions, we formulate the generation
process as a Markov decision process (MDP) consisting of three levels of
policies to generate synthetic samples within the SMOTE search space. Then we
leverage deep hierarchical reinforcement learning to optimize the performance
metric on the validation data. Extensive experiments on six real-world datasets
demonstrate that AutoSMOTE significantly outperforms the state-of-the-art
resampling algorithms. The code is at https://github.com/daochenzha/autosmote
Related papers
- AEMLO: AutoEncoder-Guided Multi-Label Oversampling [6.255095509216069]
AEMLO is an AutoEncoder-guided Oversampling technique for imbalanced multi-label data.
We show that AEMLO outperforms the existing state-of-the-art methods with extensive empirical studies.
arXiv Detail & Related papers (2024-08-23T14:01:33Z)
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active learning based adaptive sampling strategy to optimize the sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM) as the smallest probability of disagreement of the predicted label.
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly improves retrieval performance, outperforming the state-of-the-art methods by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [36.05851452151107]
Federated learning (FL) systems need to sample a subset of clients that are involved in each round of training.
Despite its importance, there is limited work on how to sample clients effectively.
We show how our sampling method can improve the convergence speed of optimization algorithms.
arXiv Detail & Related papers (2021-12-28T23:50:52Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.