Error Distribution Smoothing: Advancing Low-Dimensional Imbalanced Regression
- URL: http://arxiv.org/abs/2502.02277v1
- Date: Tue, 04 Feb 2025 12:40:07 GMT
- Title: Error Distribution Smoothing: Advancing Low-Dimensional Imbalanced Regression
- Authors: Donghe Chen, Jiaxuan Yue, Tengjie Zheng, Lanxuan Wang, Lin Cheng
- Abstract summary: In real-world regression tasks, datasets frequently exhibit imbalanced distributions, characterized by a scarcity of data in high-complexity regions and an abundance in low-complexity areas.
We introduce a novel concept of Imbalanced Regression, which takes into account both the complexity of the problem and the density of data points, extending beyond traditional definitions that focus only on data density.
We propose Error Distribution Smoothing (EDS) as a solution to tackle imbalanced regression, effectively selecting a representative subset from the dataset to reduce redundancy while maintaining balance and representativeness.
- Abstract: In real-world regression tasks, datasets frequently exhibit imbalanced distributions, characterized by a scarcity of data in high-complexity regions and an abundance in low-complexity areas. This imbalance challenges existing methods, which are largely designed for classification problems with clear class boundaries, and highlights the scarcity of approaches tailored to imbalanced regression. To better address these issues, we introduce a novel concept of Imbalanced Regression that takes into account both the complexity of the problem and the density of data points, extending beyond traditional definitions that focus only on data density. Furthermore, we propose Error Distribution Smoothing (EDS) as a solution to imbalanced regression, effectively selecting a representative subset of the dataset to reduce redundancy while maintaining balance and representativeness. Across several experiments, EDS has shown its effectiveness; the related code and dataset can be accessed at https://anonymous.4open.science/r/Error-Distribution-Smoothing-762F.
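For intuition, the selection idea the abstract describes can be sketched in a few lines. The following is a hypothetical toy version for a 1-D input, not the authors' released code: the binning scheme, the local linear fit as a complexity proxy, and all names and defaults are assumptions.

```python
import numpy as np

def eds_like_subset(x, y, n_bins=10, keep_frac_easy=0.2, seed=0):
    """Toy sketch of error-distribution-based subset selection:
    keep every point in bins where a cheap local fit errs badly
    (high complexity), subsample dense, well-fit bins."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bin_id = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

    # Pass 1: per-bin mean residual of a local linear fit (complexity proxy).
    bin_err = np.full(n_bins, np.nan)
    for b in range(n_bins):
        idx = np.where(bin_id == b)[0]
        if len(idx) >= 3:
            coef = np.polyfit(x[idx], y[idx], 1)
            bin_err[b] = np.abs(y[idx] - np.polyval(coef, x[idx])).mean()
    thresh = np.nanmedian(bin_err)

    # Pass 2: keep hard bins in full, thin out easy ones.
    keep = []
    for b in range(n_bins):
        idx = np.where(bin_id == b)[0]
        if len(idx) == 0:
            continue
        if np.isnan(bin_err[b]) or bin_err[b] >= thresh:
            keep.extend(idx)
        else:
            k = max(1, int(keep_frac_easy * len(idx)))
            keep.extend(rng.choice(idx, size=k, replace=False))
    return np.sort(np.asarray(keep))
```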
Related papers
- Histogram approaches for imbalanced data streams regression [1.8385275253826225]
We introduce novel data-level sampling strategies, HistUS and HistOS, that utilize histogram-based approaches to balance data streams.
We demonstrate that HistUS and HistOS outperform traditional methods through extensive experiments on synthetic and real-world datasets.
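The undersampling side of this idea is simple to state for static data; the papers' streaming variants additionally maintain the histogram online, which this minimal sketch omits. The function name, bin count, and per-bin cap rule are assumptions for illustration.

```python
import numpy as np

def hist_undersample(y, n_bins=10, seed=0):
    """Histogram-based undersampling sketch for regression targets:
    cap every target bin at the size of the smallest non-empty bin."""
    rng = np.random.default_rng(seed)
    edges = np.histogram_bin_edges(y, bins=n_bins)
    bin_id = np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_id, minlength=n_bins)
    cap = counts[counts > 0].min()
    keep = []
    for b in range(n_bins):
        idx = np.where(bin_id == b)[0]
        if len(idx):
            keep.extend(rng.choice(idx, size=min(len(idx), cap), replace=False))
    return np.sort(np.asarray(keep))
```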
arXiv Detail & Related papers (2025-01-29T11:03:02Z)
- Data Augmentation with Variational Autoencoder for Imbalanced Dataset [1.2289361708127877]
Learning from an imbalanced distribution presents a major challenge in predictive modeling.
We develop a novel approach for generating data, combining a variational autoencoder (VAE) with a smoothed bootstrap, specifically designed to address the challenges of imbalanced regression.
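The smoothed-bootstrap half of this recipe fits in one function. The paper applies it inside a VAE's latent space before decoding; this standalone version (an illustrative assumption) operates on any matrix of vectors, e.g., latent codes from a fitted encoder.

```python
import numpy as np

def smoothed_bootstrap(Z, n_samples, bandwidth=0.1, seed=0):
    """Resample rows with replacement, then add Gaussian kernel noise.
    Z: (n, d) array, e.g., VAE latent codes of minority samples."""
    rng = np.random.default_rng(seed)
    picks = Z[rng.integers(0, len(Z), size=n_samples)]
    return picks + rng.normal(scale=bandwidth, size=picks.shape)
```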
arXiv Detail & Related papers (2024-12-09T22:59:03Z)
- Stability and Generalizability in SDE Diffusion Models with Measure-Preserving Dynamics [11.919291977879801]
Inverse problems describe the process of estimating the causal factors from a set of measurements or data.
Diffusion models have shown promise as potent generative tools for solving inverse problems.
arXiv Detail & Related papers (2024-06-19T15:55:12Z)
- P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced Clustering [16.723646401890495]
We propose a novel pseudo-labeling-based learning framework for deep clustering.
Our framework generates imbalance-aware pseudo-labels and learns from high-confidence samples.
Experiments on various datasets, including a human-curated long-tailed CIFAR100, demonstrate the superiority of our method.
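For intuition, the sketch below shows plain Sinkhorn-style pseudo-labeling with class marginals; the paper's progressive partial optimal transport is more sophisticated, and the marginals, regularizer eps, and names here are assumptions.

```python
import numpy as np

def sinkhorn_pseudo_labels(logits, class_marginals, n_iter=50, eps=0.05):
    """Assign pseudo-labels by balancing an assignment matrix so rows
    sum to 1/N (one unit per sample) and columns match the desired
    class marginals, making the labels imbalance-aware."""
    P = np.exp((logits - logits.max()) / eps)   # shift for stability
    r = np.full(len(logits), 1.0 / len(logits))
    c = np.asarray(class_marginals, dtype=float)
    c = c / c.sum()
    for _ in range(n_iter):
        P *= (r / P.sum(axis=1))[:, None]   # match row marginals
        P *= (c / P.sum(axis=0))[None, :]   # match column marginals
    return P.argmax(axis=1)
```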
arXiv Detail & Related papers (2024-01-17T15:15:46Z)
- Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
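The abstract does not spell out the procedure, so the following is a purely hypothetical illustration of one spectral last-layer adaptation: project the pre-trained weights onto the top singular directions of the new-domain feature matrix, on the assumption that the predictive signal lives in the subspace the shifted inputs actually span.

```python
import numpy as np

def spectral_last_layer_adapt(W, feats_new, k=10):
    """Project last-layer weights W (shape (d,)) onto the top-k right
    singular directions of the (n, d) new-domain feature matrix."""
    _, _, Vt = np.linalg.svd(feats_new, full_matrices=False)
    V_k = Vt[:k].T                # (d, k) dominant feature directions
    return V_k @ (V_k.T @ W)      # W restricted to that subspace
```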
arXiv Detail & Related papers (2023-12-29T04:15:58Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
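A simplified, non-iterative sketch of the mixing step follows; the paper's iterative scheme and exact mixing rule are abstracted away, and alpha and the minority-dominant weighting are assumptions.

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.75, seed=0):
    """Convexly combine minority and majority samples, biased toward
    the minority side, and treat the results as synthetic minorities."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, len(X_min), n_new)
    b = rng.integers(0, len(X_maj), n_new)
    lam = rng.beta(alpha, alpha, size=n_new)
    lam = np.maximum(lam, 1.0 - lam)[:, None]   # minority weight >= 0.5
    return lam * X_min[a] + (1.0 - lam) * X_maj[b]
```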
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
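A rough sketch of such a baseline appears below; the paper's exact selection rule may differ, and the confidence threshold and deficit-based quotas are assumptions.

```python
import numpy as np

def supplement_with_pseudo_labels(probs_unlab, labels, n_classes, thresh=0.95):
    """Pick confidently pseudo-labeled unlabeled samples for the classes
    the labeled set under-represents, up to each class's deficit."""
    counts = np.bincount(labels, minlength=n_classes)
    deficit = counts.max() - counts
    conf = probs_unlab.max(axis=1)
    pseudo = probs_unlab.argmax(axis=1)
    picked = []
    for c in range(n_classes):
        idx = np.where((pseudo == c) & (conf >= thresh))[0]
        idx = idx[np.argsort(-conf[idx])][: deficit[c]]   # most confident first
        picked.extend(idx)
    return np.asarray(picked, dtype=int), pseudo
```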
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with the problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network, one that covers fully decentralized computation settings.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
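The underlying extra-gradient step is compact; below is a minimal single-machine sketch for a monotone operator F (the paper's actual contribution, the decentralized local-update variant over heterogeneous data, is not reproduced here).

```python
import numpy as np

def extragradient(F, z0, step=0.1, n_iter=1000):
    """Solve the VI  find z* with F(z*) = 0  for monotone F via
    extrapolate-then-update steps."""
    z = np.asarray(z0, dtype=float)
    for _ in range(n_iter):
        z_half = z - step * F(z)       # extrapolation (look-ahead) step
        z = z - step * F(z_half)       # update with the look-ahead operator
    return z

# Bilinear saddle point f(x, y) = x * y  =>  F(x, y) = (y, -x), z* = (0, 0).
print(extragradient(lambda z: np.array([z[1], -z[0]]), [1.0, 1.0]))
```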
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer [72.5190560787569]
In computer vision, learning from long-tailed datasets is a recurring theme, especially for natural image datasets.
Our proposal posits a meta-distributional scenario, where the data generating mechanism is invariant across the label-conditional feature distributions.
This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes.
arXiv Detail & Related papers (2020-11-25T00:13:11Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization [55.278153228758434]
Real-world datasets are heteroskedastic and imbalanced.
Addressing heteroskedasticity and imbalance simultaneously is under-explored.
We propose a data-dependent regularization technique for heteroskedastic datasets.
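As a stand-in for intuition only (not the paper's adaptive regularizer), a heteroskedasticity-aware objective might down-weight samples whose estimated label noise is high; the exponential weighting and tau below are assumptions.

```python
import numpy as np

def noise_weighted_mse(pred, target, noise_est, tau=1.0):
    """MSE with per-sample weights that shrink as the estimated label
    noise grows, so noisy points influence the fit less."""
    w = np.exp(-np.asarray(noise_est) / tau)   # high noise -> small weight
    w = w / w.mean()                           # keep the average weight at 1
    return np.mean(w * (np.asarray(pred) - np.asarray(target)) ** 2)
```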
arXiv Detail & Related papers (2020-06-29T01:09:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.