Imbalance in Regression Datasets
- URL: http://arxiv.org/abs/2402.11963v1
- Date: Mon, 19 Feb 2024 09:06:26 GMT
- Title: Imbalance in Regression Datasets
- Authors: Daniel Kowatsch, Nicolas M. M\"uller, Kilian Tscharke, Philip Sperl,
Konstantin B\"otinger
- Abstract summary: We argue that imbalance in regression is an equally important problem which has so far been overlooked.
Due to under- and over-representations in a data set's target distribution, regressors are prone to degenerate to naive models.
- Score: 0.9374652839580183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For classification, the problem of class imbalance is well known and has been
extensively studied. In this paper, we argue that imbalance in regression is an
equally important problem which has so far been overlooked: Due to under- and
over-representations in a data set's target distribution, regressors are prone
to degenerate to naive models, systematically neglecting uncommon training data
and over-representing targets seen often during training. We analyse this
problem theoretically and use resulting insights to develop a first definition
of imbalance in regression, which we show to be a generalisation of the
commonly employed imbalance measure in classification. With this, we hope to
turn the spotlight on the overlooked problem of imbalance in regression and to
provide common ground for future research.
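The formal definition is developed in the paper itself; the sketch below is only a minimal illustration of the core idea, assuming a simple histogram binning of the target (the function name and binning choice are ours, not the paper's). With one bin per discrete class, the score reduces to the familiar max/min class-count ratio from classification.

```python
import numpy as np

def imbalance_ratio(y, bins=10):
    """Illustrative imbalance score for a regression target: ratio of the
    most- to least-populated target bin. With one bin per discrete class,
    this is exactly the classification imbalance ratio max_c n_c / min_c n_c."""
    counts, _ = np.histogram(y, bins=bins)
    counts = counts[counts > 0]              # ignore empty bins
    return counts.max() / counts.min()

# A heavily skewed target distribution yields a large score.
rng = np.random.default_rng(0)
print(imbalance_ratio(rng.exponential(size=10_000)))
```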
Related papers
- IM-Context: In-Context Learning for Imbalanced Regression Tasks [9.318067144029403]
This paper proposes a paradigm shift towards in-context learning as an effective alternative to conventional in-weight learning methods.
In-context learning refers to the ability of a model to condition itself, given a prompt sequence composed of in-context samples.
We study the impact of the prompt sequence on the model performance from both theoretical and empirical perspectives.
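The paper's exact retrieval and prompt format are model-specific; the hypothetical helper below only illustrates the generic recipe of conditioning a model on nearest-neighbour in-context samples.

```python
import numpy as np

def build_prompt(X_train, y_train, x_query, k=8):
    """Serialize the k nearest training points as (x, y) in-context
    examples, followed by the unanswered query."""
    idx = np.argsort(np.linalg.norm(X_train - x_query, axis=1))[:k]
    lines = [f"x={X_train[i].round(3).tolist()} -> y={y_train[i]:.3f}" for i in idx]
    lines.append(f"x={x_query.round(3).tolist()} -> y=?")
    return "\n".join(lines)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 2)), rng.normal(size=100)
print(build_prompt(X, y, X[0]))
```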
arXiv Detail & Related papers (2024-05-28T14:10:51Z) - Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated into classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
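A minimal sketch of the label hierarchy such methods build on (the quantile binning is our assumption; HCA's actual contribution, adjusting fine-level predictions with the coarse classifier, sits on top of a hierarchy like this):

```python
import numpy as np

def two_level_labels(y, coarse=4, fine_per_coarse=4):
    """Quantize continuous targets into coarse ranges, then subdivide each
    range into fine classes; returns (coarse_id, fine_id) per sample."""
    edges = np.quantile(y, np.linspace(0, 1, coarse + 1))
    c = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, coarse - 1)
    f = np.zeros_like(c)
    for k in range(coarse):
        m = c == k
        if m.any():
            fe = np.quantile(y[m], np.linspace(0, 1, fine_per_coarse + 1))
            f[m] = np.clip(np.searchsorted(fe, y[m], side="right") - 1,
                           0, fine_per_coarse - 1)
    return c, c * fine_per_coarse + f
```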
arXiv Detail & Related papers (2023-10-26T04:54:39Z) - A step towards understanding why classification helps regression [16.741816961905947]
We show that the effect of adding a classification loss is the most pronounced for regression with imbalanced data.
The practical takeaway: if a regression task's data sampling is imbalanced, add a classification loss, as sketched below.
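A hedged sketch of that recipe in PyTorch (the loss weight `alpha` and the binning are illustrative choices, not the paper's):

```python
import torch
import torch.nn.functional as F

def reg_plus_cls_loss(pred_y, cls_logits, y, bin_edges, alpha=0.1):
    """Standard regression loss plus a cross-entropy term over a binned
    version of the target. cls_logits must have len(bin_edges) + 1 classes."""
    target_bin = torch.bucketize(y, bin_edges)   # discretize the target
    return F.mse_loss(pred_y, y) + alpha * F.cross_entropy(cls_logits, target_bin)
```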
arXiv Detail & Related papers (2023-08-21T10:00:46Z) - Uncertainty Voting Ensemble for Imbalanced Deep Regression [20.176217123752465]
In this paper, we introduce UVOTE, a method for learning from imbalanced data.
We replace traditional regression losses with negative log-likelihood, which also predicts sample-wise aleatoric uncertainty.
We show that UVOTE consistently outperforms the prior art, while at the same time producing better-calibrated uncertainty estimates.
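The uncertainty-aware loss here is the standard Gaussian negative log-likelihood; a minimal sketch (UVOTE's ensembling and uncertainty-based voting are not shown):

```python
import torch

def gaussian_nll(mean, log_var, y):
    """NLL of y under N(mean, exp(log_var)), up to an additive constant:
    the network predicts a per-sample variance (aleatoric uncertainty)
    instead of being trained with a plain MSE."""
    return 0.5 * (log_var + (y - mean) ** 2 / log_var.exp()).mean()
```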
arXiv Detail & Related papers (2023-05-24T14:12:21Z) - Balanced MSE for Imbalanced Visual Regression [36.97675494319161]
Data imbalance exists ubiquitously in real-world visual regressions.
Unlike imbalanced classification, imbalanced regression deals with continuous labels, which can be boundless and high-dimensional.
We propose a novel loss function, Balanced MSE, to accommodate the imbalanced training label distribution.
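A sketch of the batch-based Monte Carlo form of Balanced MSE for one-dimensional labels (the paper also derives other estimators; `noise_var` is a hyperparameter):

```python
import torch
import torch.nn.functional as F

def balanced_mse_bmc(pred, target, noise_var=1.0):
    """Each prediction is scored against every target in the batch and
    trained to pick out its own via cross-entropy, which implicitly
    re-weights rare labels. pred, target: shape (B, 1)."""
    logits = -(pred - target.T).pow(2) / (2 * noise_var)   # (B, B)
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels) * (2 * noise_var)
```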
arXiv Detail & Related papers (2022-03-30T16:21:42Z) - Variation-Incentive Loss Re-weighting for Regression Analysis on Biased
Data [8.115323786541078]
We aim to improve the accuracy of the regression analysis by addressing the data skewness/bias during model training.
We propose a Variation-Incentive Loss re-weighting method (VILoss) to optimize the gradient descent-based model training for regression analysis.
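VILoss's exact weighting derives from target variation; the sketch below substitutes a generic inverse-frequency weighting to illustrate the re-weighting mechanism being optimized, not the paper's formula:

```python
import numpy as np

def inverse_frequency_weights(y, bins=20):
    """Weight each sample inversely to the population of its target bin, so
    rare target regions are not ignored by gradient descent; weights are
    normalized to mean 1."""
    counts, edges = np.histogram(y, bins=bins)
    idx = np.clip(np.searchsorted(edges, y, side="right") - 1, 0, bins - 1)
    w = 1.0 / counts[idx]          # every y falls in a populated bin
    return w * len(y) / w.sum()
```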
arXiv Detail & Related papers (2021-09-14T10:22:21Z) - Self-balanced Learning For Domain Generalization [64.99791119112503]
Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.
Most existing approaches have been developed under the assumption that the source data is well-balanced in terms of both domain and class.
We propose a self-balanced domain generalization framework that adaptively learns the weights of losses to alleviate the bias caused by different distributions of the multi-domain source data.
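The paper learns its weights adaptively via an auxiliary network; the fixed inverse-frequency weighting below is only a simplified stand-in for the quantity being balanced:

```python
from collections import Counter
import numpy as np

def domain_class_weights(domains, classes):
    """Weight each sample inversely to how often its (domain, class) pair
    occurs, so no single domain or class dominates the training loss."""
    pairs = list(zip(domains, classes))
    counts = Counter(pairs)
    w = np.array([1.0 / counts[p] for p in pairs])
    return w * len(w) / w.sum()   # normalize to mean weight 1
```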
arXiv Detail & Related papers (2021-08-31T03:17:54Z) - Don't Just Blame Over-parametrization for Over-confidence: Theoretical
Analysis of Calibration in Binary Classification [58.03725169462616]
We show theoretically that over-parametrization is not the only reason for over-confidence.
We prove that logistic regression is inherently over-confident, in the realizable, under-parametrized setting.
Perhaps surprisingly, we also show that over-confidence is not always the case.
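An illustrative (not rigorous) way to observe the phenomenon: fit logistic regression on well-specified, realizable data and compare its mean confidence to its accuracy; note the gap at this small scale may be modest.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
p = 1 / (1 + np.exp(-(2 * X[:, 0] - X[:, 1])))   # realizable logistic model
y = rng.binomial(1, p)

clf = LogisticRegression().fit(X, y)
conf = clf.predict_proba(X).max(axis=1).mean()    # mean predicted confidence
acc = (clf.predict(X) == y).mean()                # training accuracy
print(f"mean confidence {conf:.3f} vs accuracy {acc:.3f}")
```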
arXiv Detail & Related papers (2021-02-15T21:38:09Z) - Supercharging Imbalanced Data Learning With Energy-based Contrastive
Representation Transfer [72.5190560787569]
In computer vision, learning from long-tailed datasets is a recurring theme, especially for natural image datasets.
Our proposal posits a meta-distributional scenario, where the data generating mechanism is invariant across the label-conditional feature distributions.
This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes.
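The causal inflation procedure itself is the paper's contribution; the toy below only shows the general shape of minority inflation in feature space (interpolating random same-class pairs is our simplification):

```python
import numpy as np

def inflate_minority(X, y, target_count, seed=0):
    """Enlarge every under-represented class by interpolating random
    same-class feature pairs until each class has target_count samples."""
    rng = np.random.default_rng(seed)
    X_out, y_out = [X], [y]
    for c in np.unique(y):
        Xc = X[y == c]
        need = target_count - len(Xc)
        if need > 0:
            i = rng.integers(0, len(Xc), size=need)
            j = rng.integers(0, len(Xc), size=need)
            lam = rng.random((need, 1))
            X_out.append(lam * Xc[i] + (1 - lam) * Xc[j])
            y_out.append(np.full(need, c))
    return np.concatenate(X_out), np.concatenate(y_out)
```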
arXiv Detail & Related papers (2020-11-25T00:13:11Z) - Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
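The classic balancing weights of this kind are inverse-propensity weights; a minimal sketch (the paper combines such weighting with learned representations rather than raw features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inverse_propensity_weights(X, t):
    """Weight each sample by the inverse of its estimated treatment
    propensity so treated and control feature distributions match.
    t is a 0/1 treatment indicator."""
    e = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, 1e-3, 1 - 1e-3)     # clip to avoid extreme weights
    return np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))
```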
arXiv Detail & Related papers (2020-10-23T19:06:03Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
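A toy sketch of the ensemble structure, assuming integer labels 0..n_classes-1 (the experts' architecture and fusion rule in the paper are more elaborate than this naive maximum):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_experts(X, y, groups):
    """One expert per group of classes, each trained class-balanced on the
    subset of data belonging to its group."""
    return [LogisticRegression(class_weight="balanced", max_iter=1000)
            .fit(X[np.isin(y, g)], y[np.isin(y, g)]) for g in groups]

def ensemble_predict(experts, X, n_classes):
    """Naive fusion: stitch each expert's probabilities into the full label
    space and take the highest-scoring class."""
    P = np.zeros((len(X), n_classes))
    for clf in experts:
        P[:, clf.classes_] = np.maximum(P[:, clf.classes_], clf.predict_proba(X))
    return P.argmax(axis=1)
```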
arXiv Detail & Related papers (2020-04-07T20:57:44Z)