Delving into Deep Imbalanced Regression
- URL: http://arxiv.org/abs/2102.09554v1
- Date: Thu, 18 Feb 2021 18:56:03 GMT
- Title: Delving into Deep Imbalanced Regression
- Authors: Yuzhe Yang, Kaiwen Zha, Ying-Cong Chen, Hao Wang, Dina Katabi
- Abstract summary: We define Deep Imbalanced Regression (DIR) as learning from imbalanced data with continuous targets.
Motivated by the intrinsic difference between categorical and continuous label space, we propose distribution smoothing for both labels and features.
Our work fills the gap in benchmarks and techniques for practical imbalanced regression problems.
- Score: 41.90687097747504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world data often exhibit imbalanced distributions, where certain target
values have significantly fewer observations. Existing techniques for dealing
with imbalanced data focus on targets with categorical indices, i.e., different
classes. However, many tasks involve continuous targets, where hard boundaries
between classes do not exist. We define Deep Imbalanced Regression (DIR) as
learning from such imbalanced data with continuous targets, dealing with
potential missing data for certain target values, and generalizing to the
entire target range. Motivated by the intrinsic difference between categorical
and continuous label space, we propose distribution smoothing for both labels
and features, which explicitly acknowledges the effects of nearby targets, and
calibrates both label and learned feature distributions. We curate and
benchmark large-scale DIR datasets from common real-world tasks in computer
vision, natural language processing, and healthcare domains. Extensive
experiments verify the superior performance of our strategies. Our work fills
the gap in benchmarks and techniques for practical imbalanced regression
problems. Code and data are available at
https://github.com/YyzHarry/imbalanced-regression.
Related papers
- Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated into classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
arXiv Detail & Related papers (2023-10-26T04:54:39Z) - Uncertainty Voting Ensemble for Imbalanced Deep Regression [20.176217123752465]
In this paper, we introduce UVOTE, a method for learning from imbalanced data.
We replace traditional regression losses with negative log-likelihood, which also predicts sample-wise aleatoric uncertainty.
We show that UVOTE consistently outperforms the prior art, while at the same time producing better-calibrated uncertainty estimates.
arXiv Detail & Related papers (2023-05-24T14:12:21Z) - Model Optimization in Imbalanced Regression [2.580765958706854]
Imbalanced domain learning aims to produce accurate models in predicting instances that, though underrepresented, are of utmost importance for the domain.
One of the main reasons for this is the lack of loss functions capable of focusing on minimizing the errors of extreme (rare) values.
Recently, an evaluation metric was introduced: Squared Error Relevance Area (SERA)
This metric posits a bigger emphasis on the errors committed at extreme values while also accounting for the performance in the overall target variable domain.
arXiv Detail & Related papers (2022-06-20T20:23:56Z) - Extending the WILDS Benchmark for Unsupervised Adaptation [186.90399201508953]
We present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data.
These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities.
We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods.
arXiv Detail & Related papers (2021-12-09T18:32:38Z) - Improving Contrastive Learning on Imbalanced Seed Data via Open-World
Sampling [96.8742582581744]
We present an open-world unlabeled data sampling framework called Model-Aware K-center (MAK)
MAK follows three simple principles: tailness, proximity, and diversity.
We demonstrate that MAK can consistently improve both the overall representation quality and the class balancedness of the learned features.
arXiv Detail & Related papers (2021-11-01T15:09:41Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Variation-Incentive Loss Re-weighting for Regression Analysis on Biased
Data [8.115323786541078]
We aim to improve the accuracy of the regression analysis by addressing the data skewness/bias during model training.
We propose a Variation-Incentive Loss re-weighting method (VILoss) to optimize the gradient descent-based model training for regression analysis.
arXiv Detail & Related papers (2021-09-14T10:22:21Z) - PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes this ratio during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z) - A First Step Towards Distribution Invariant Regression Metrics [1.370633147306388]
In classification, it has been stated repeatedly that performance metrics like the F-Measure and Accuracy are highly dependent on the class distribution.
We show that the same problem exists in regression. The distribution of odometry parameters in robotic applications can for example largely vary between different recording sessions.
Here, we need regression algorithms that either perform equally well for all function values, or that focus on certain boundary regions like high speed.
arXiv Detail & Related papers (2020-09-10T23:40:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.