Local distribution-based adaptive oversampling for imbalanced regression
- URL: http://arxiv.org/abs/2504.14316v1
- Date: Sat, 19 Apr 2025 14:36:41 GMT
- Title: Local distribution-based adaptive oversampling for imbalanced regression
- Authors: Shayan Alahyari, Mike Domaratzki
- Abstract summary: Imbalanced regression occurs when continuous target variables have skewed distributions, creating sparse regions. We propose LDAO (Local Distribution-based Adaptive Oversampling), a novel data-level approach that avoids categorizing individual samples as rare or frequent. LDAO achieves a balanced representation across the entire target range while preserving the inherent statistical structure within each local distribution.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imbalanced regression occurs when continuous target variables have skewed distributions, creating sparse regions that are difficult for machine learning models to predict accurately. This issue particularly affects neural networks, which often struggle with imbalanced data. While class imbalance in classification has been extensively studied, imbalanced regression remains relatively unexplored, with few effective solutions. Existing approaches often rely on arbitrary thresholds to categorize samples as rare or frequent, ignoring the continuous nature of target distributions. These methods can produce synthetic samples that fail to improve model performance and may discard valuable information through undersampling. To address these limitations, we propose LDAO (Local Distribution-based Adaptive Oversampling), a novel data-level approach that avoids categorizing individual samples as rare or frequent. Instead, LDAO learns the global distribution structure by decomposing the dataset into a mixture of local distributions, each preserving its statistical characteristics. LDAO then models and samples from each local distribution independently before merging them into a balanced training set. LDAO achieves a balanced representation across the entire target range while preserving the inherent statistical structure within each local distribution. In extensive evaluations on 45 imbalanced datasets, LDAO outperforms state-of-the-art oversampling methods on both frequent and rare target values, demonstrating its effectiveness for addressing the challenge of imbalanced regression.
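The abstract describes the pipeline only at a high level. Below is a minimal sketch of one way to realize it, assuming k-means clustering over the joint feature-target space and a Gaussian KDE as each local distribution model; both choices and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KernelDensity

def ldao_oversample(X, y, n_clusters=10, bandwidth=0.5, random_state=0):
    """Sketch of LDAO-style oversampling: decompose the joint (X, y)
    space into local clusters, fit a density model per cluster, then
    draw the same number of samples from each cluster so every region
    of the target range is represented."""
    Z = np.column_stack([X, y])                      # joint feature-target space
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(Z)

    per_cluster = np.bincount(labels).max()          # balance up to the largest cluster
    parts = []
    for c in range(n_clusters):
        local = Z[labels == c]
        kde = KernelDensity(bandwidth=bandwidth).fit(local)  # local distribution model
        parts.append(kde.sample(per_cluster, random_state=random_state))

    Z_bal = np.vstack(parts)                         # merge into a balanced training set
    return Z_bal[:, :-1], Z_bal[:, -1]
```

Because every cluster contributes the same number of draws, sparse target regions are filled in without ever thresholding individual samples as rare or frequent, which is the property the abstract emphasizes.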
Related papers
- SMOGAN: Synthetic Minority Oversampling with GAN Refinement for Imbalanced Regression [0.0]
Imbalanced regression refers to prediction tasks where the target variable is skewed.
This skewness hinders machine learning models, especially neural networks, which concentrate on dense regions.
We propose SMOGAN, a two-step oversampling framework for imbalanced regression.
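The summary names the two stages but not their internals. A rough sketch of what the first stage might look like, assuming a SMOTE-style interpolation over user-flagged sparse-region samples (`rare_mask`, `k`, and the interpolation scheme are all assumptions; the GAN refinement of stage two is only indicated in a comment):

```python
import numpy as np

def sparse_region_interpolate(X, y, rare_mask, k=5, n_new=100, seed=0):
    """Stage one of a SMOGAN-like pipeline (sketch): interpolate between
    a sparse-region sample and one of its k nearest sparse-region
    neighbours, interpolating the target alongside the features.
    Stage two (not shown) would adversarially refine these raw
    interpolants toward the true joint data distribution."""
    rng = np.random.default_rng(seed)
    Xr, yr = X[rare_mask], y[rare_mask]
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.integers(len(Xr))
        d = np.linalg.norm(Xr - Xr[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])       # a random near neighbour
        lam = rng.random()
        X_new.append(Xr[i] + lam * (Xr[j] - Xr[i]))
        y_new.append(yr[i] + lam * (yr[j] - yr[i]))
    return np.asarray(X_new), np.asarray(y_new)
```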
arXiv Detail & Related papers (2025-04-29T20:15:25Z)
- Error Distribution Smoothing: Advancing Low-Dimensional Imbalanced Regression [2.435853975142516]
In real-world regression tasks, datasets frequently exhibit imbalanced distributions, characterized by a scarcity of data in high-complexity regions and an abundance in low-complexity areas. We introduce a novel concept of Imbalanced Regression, which takes into account both the complexity of the problem and the density of data points, extending beyond traditional definitions that focus only on data density. We propose Error Distribution Smoothing (EDS) as a solution to tackle imbalanced regression, effectively selecting a representative subset from the dataset to reduce redundancy while maintaining balance and representativeness.
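The summary describes subset selection rather than oversampling. A toy sketch under the assumption that "balance" is driven by the local density of target values (the paper's complexity criterion is not reproduced here, and all names are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_balanced_subset(X, y, keep_floor=0.1, seed=0):
    """EDS-flavoured subset selection (sketch): keep every sample in
    sparse target regions; keep dense-region samples with probability
    inversely proportional to the local target density."""
    rng = np.random.default_rng(seed)
    density = gaussian_kde(y)(y)               # density at each target value
    p_keep = np.clip(density.min() / density, keep_floor, 1.0)
    mask = rng.random(len(y)) < p_keep
    return X[mask], y[mask]
```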
arXiv Detail & Related papers (2025-02-04T12:40:07Z)
- Histogram Approaches for Imbalanced Data Streams Regression [1.8385275253826225]
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression. This study introduces histogram-based sampling strategies to overcome this constraint. Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy.
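HistUS and HistOS are not spelled out in the summary; the sketch below shows one plausible oversampling variant, assuming equal-width bins over the target and resampling with replacement up to the largest bin:

```python
import numpy as np

def histogram_oversample(X, y, n_bins=20, seed=0):
    """Histogram-based balancing (sketch): bin the continuous target,
    then resample each bin with replacement until all bins match the
    size of the most populated one."""
    rng = np.random.default_rng(seed)
    edges = np.histogram_bin_edges(y, bins=n_bins)
    bins = np.digitize(y, edges[:-1])          # bin index for each sample
    target = np.bincount(bins).max()
    idx = np.concatenate([rng.choice(np.flatnonzero(bins == b), target)
                          for b in np.unique(bins)])
    return X[idx], y[idx]
```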
arXiv Detail & Related papers (2025-01-29T11:03:02Z)
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers. We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Aggregation Weighting of Federated Learning via Generalization Bound Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
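The summary only states that sample-proportion weights are replaced; how the generalization bounds are estimated is the paper's contribution and is not reproduced here. A sketch of the aggregation step with pluggable weights:

```python
def aggregate(client_states, weights):
    """Weighted model averaging (sketch): `client_states` is a list of
    parameter dicts (e.g. PyTorch state_dicts), and `weights` would come
    from each client's estimated generalization bound rather than from
    its sample count, as in plain FedAvg."""
    total = sum(weights)
    return {key: sum(w * state[key] for w, state in zip(weights, client_states)) / total
            for key in client_states[0]}
```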
arXiv Detail & Related papers (2023-11-10T08:50:28Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
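The summary gives the mixing idea but not the schedule. A mixup-flavoured sketch, assuming convex combinations biased toward the minority point (the paper's iterative procedure is not reproduced):

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new=100, alpha=0.75, seed=0):
    """Synthetic minority generation (sketch): convexly combine a random
    minority sample with a random majority sample, keeping the mix
    biased toward the minority side."""
    rng = np.random.default_rng(seed)
    i = rng.integers(len(X_min), size=n_new)
    j = rng.integers(len(X_maj), size=n_new)
    lam = rng.beta(alpha, alpha, size=n_new)[:, None]
    lam = np.maximum(lam, 1.0 - lam)           # stay closer to the minority point
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]
```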
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive weights for a reweighted score function.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Integrating Local Real Data with Global Gradient Prototypes for Classifier Re-Balancing in Federated Long-Tailed Learning [60.41501515192088]
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively.
Real-world data samples usually follow a long-tailed distribution, and FL on decentralized, long-tailed data yields a poorly behaved global model.
In this work, we integrate the local real data with the global gradient prototypes to form the local balanced datasets.
arXiv Detail & Related papers (2023-01-25T03:18:10Z)
- Global Distance-distributions Separation for Unsupervised Person Re-identification [93.39253443415392]
Existing unsupervised ReID approaches often fail to correctly identify positive and negative samples through distance-based matching/ranking.
We introduce a global distance-distributions separation constraint over the two distributions to encourage the clear separation of positive and negative samples from a global view.
We show that our method leads to significant improvement over the baselines and achieves the state-of-the-art performance.
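The constraint itself is not written out in the summary. One plausible form, sketched as a margin on the gap between the two distance distributions (the exact formulation in the paper may differ):

```python
import torch

def separation_loss(pos_dist, neg_dist, margin=1.0):
    """Global distance-distributions separation (sketch): push the mean
    negative-pair distance above the mean positive-pair distance by a
    margin, normalized by the spread of the two distributions."""
    gap = neg_dist.mean() - pos_dist.mean()
    spread = pos_dist.std() + neg_dist.std()
    return torch.relu(margin - gap / (spread + 1e-8))
```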
arXiv Detail & Related papers (2020-06-01T07:05:39Z)
- Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore balance in imbalanced image data by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The resulting capsule-GAN recognizes highly overlapping classes with far fewer parameters than a convolutional GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.