More Data or Better Algorithms: Latent Diffusion Augmentation for Deep Imbalanced Regression
- URL: http://arxiv.org/abs/2509.23240v1
- Date: Sat, 27 Sep 2025 10:44:28 GMT
- Title: More Data or Better Algorithms: Latent Diffusion Augmentation for Deep Imbalanced Regression
- Authors: Shayan Alahyari,
- Abstract summary: In many real-world regression tasks, the data distribution is heavily skewed, and models learn predominantly from abundant majority samples.<n>We propose LatentDiff, a novel framework that uses conditional diffusion models with priority-based generation to synthesize high-quality features in the latent representation space.<n>Experiments on three DIR benchmarks demonstrate substantial improvements in minority regions while maintaining overall accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many real-world regression tasks, the data distribution is heavily skewed, and models learn predominantly from abundant majority samples while failing to predict minority labels accurately. While imbalanced classification has been extensively studied, imbalanced regression remains relatively unexplored. Deep imbalanced regression (DIR) represents cases where the input data are high-dimensional and unstructured. Although several data-level approaches for tabular imbalanced regression exist, deep imbalanced regression currently lacks dedicated data-level solutions suitable for high-dimensional data and relies primarily on algorithmic modifications. To fill this gap, we propose LatentDiff, a novel framework that uses conditional diffusion models with priority-based generation to synthesize high-quality features in the latent representation space. LatentDiff is computationally efficient and applicable across diverse data modalities, including images, text, and other high-dimensional inputs. Experiments on three DIR benchmarks demonstrate substantial improvements in minority regions while maintaining overall accuracy.
Related papers
- Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression [1.2289361708127877]
Imbalanced distribution learning is a common and significant challenge in predictive modeling, often reducing the performance of standard algorithms.<n>We propose using Variational Autoencoders (VAEs) to model and define a latent representation of data distributions.<n>To address this, we develop an innovative data generation method that combines a disentangled VAE with a Smoothed Bootstrap applied in the latent space.
arXiv Detail & Related papers (2025-08-19T13:40:04Z) - Error Distribution Smoothing:Advancing Low-Dimensional Imbalanced Regression [2.435853975142516]
In real-world regression tasks, datasets frequently exhibit imbalanced distributions, characterized by a scarcity of data in high-complexity regions and an abundance in low-complexity areas.<n>We introduce a novel concept of Imbalanced Regression, which takes into account both the complexity of the problem and the density of data points, extending beyond traditional definitions that focus only on data density.<n>We propose Error Distribution Smoothing (EDS) as a solution to tackle imbalanced regression, effectively selecting a representative subset from the dataset to reduce redundancy while maintaining balance and representativeness.
arXiv Detail & Related papers (2025-02-04T12:40:07Z) - Histogram Approaches for Imbalanced Data Streams Regression [1.8385275253826225]
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression.<n>This study introduces histogram-based sampling strategies to overcome this constraint.<n> Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy.
arXiv Detail & Related papers (2025-01-29T11:03:02Z) - Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Understanding Pathologies of Deep Heteroskedastic Regression [25.509884677111344]
Heteroskedastic models predict both mean and residual noise for each data point.
At one extreme, these models fit all training data perfectly, eliminating residual noise entirely.
At the other, they overfit the residual noise while predicting a constant, uninformative mean.
We observe a lack of middle ground, suggesting a phase transition dependent on model regularization strength.
arXiv Detail & Related papers (2023-06-29T06:31:27Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Federated Latent Class Regression for Hierarchical Data [5.110894308882439]
Federated Learning (FL) allows a number of agents to participate in training a global machine learning model without disclosing locally stored data.
We propose a novel probabilistic model, Hierarchical Latent Class Regression (HLCR), and its extension to Federated Learning, FEDHLCR.
Our inference algorithm, being derived from Bayesian theory, provides strong convergence guarantees and good robustness to overfitting. Experimental results show that FEDHLCR offers fast convergence even in non-IID datasets.
arXiv Detail & Related papers (2022-06-22T00:33:04Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty
Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.