Delving into Semantic Scale Imbalance
- URL: http://arxiv.org/abs/2212.14613v8
- Date: Sat, 8 Apr 2023 11:35:37 GMT
- Title: Delving into Semantic Scale Imbalance
- Authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Yuxin Li, Shuyuan Yang, Xu Liu
- Abstract summary: We define and quantify the semantic scale of classes, which is used to measure the feature diversity of classes.
We propose semantic-scale-balanced learning, including a general loss improvement scheme and a dynamic re-weighting training framework.
Comprehensive experiments show that dynamic semantic-scale-balanced learning consistently improves model performance on large-scale long-tailed and non-long-tailed natural and medical datasets.
- Score: 45.30062061215943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model bias triggered by long-tailed data has been widely studied. However,
a measure based on the number of samples cannot explain three phenomena
simultaneously: (1) Given enough data, the classification performance gain from
additional samples is marginal. (2) Classification performance decays
precipitously as the number of training samples decreases when data are
insufficient. (3) A model trained on sample-balanced datasets still exhibits
different biases toward different classes. In this work, we define and quantify
the semantic scale of classes, which is used to measure the feature diversity
of classes. Experiments reveal a marginal effect of semantic scale, which
accounts for the first two phenomena.
Further, the quantitative measurement of semantic scale imbalance is proposed,
which can accurately reflect model bias on multiple datasets, even on
sample-balanced data, revealing a novel perspective for the study of class
imbalance. Due to the prevalence of semantic scale imbalance, we propose
semantic-scale-balanced learning, including a general loss improvement scheme
and a dynamic re-weighting training framework that overcomes the challenge of
calculating semantic scales in real-time during iterations. Comprehensive
experiments show that dynamic semantic-scale-balanced learning consistently
improves model performance on large-scale long-tailed and non-long-tailed
natural and medical datasets, offering a good starting point for mitigating
this prevalent but previously unnoticed model bias.
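The abstract does not spell out the computation, but the idea of measuring per-class feature diversity and re-weighting the loss accordingly can be illustrated with a minimal sketch. The log-determinant volume estimate, the inverse-scale weighting, and all function names below are illustrative assumptions, not the authors' exact formulation.
```python
import torch
import torch.nn.functional as F

def semantic_scale(features: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Illustrative proxy for the 'semantic scale' (feature diversity) of one class.

    `features` is an (n, d) matrix of embeddings belonging to a single class.
    Diversity is estimated as the log-det volume of the regularized covariance
    of the centered features; this is an assumption of the sketch, not
    necessarily the paper's exact definition.
    """
    n, d = features.shape
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.t() @ centered / max(n - 1, 1)               # (d, d) class covariance
    _, logdet = torch.linalg.slogdet(torch.eye(d) + cov / eps)  # det >= 1, so logdet >= 0
    return 0.5 * logdet

def semantic_scale_weights(per_class_features: list) -> torch.Tensor:
    """Turn per-class semantic scales into loss re-weighting factors.

    Classes with a smaller semantic scale (less feature diversity) receive
    larger weights; the inverse-scale form and mean-1 normalization are
    assumed here for illustration.
    """
    scales = torch.stack([semantic_scale(f) for f in per_class_features])
    weights = scales.max() / scales.clamp(min=1e-8)
    return weights * len(weights) / weights.sum()

def semantic_scale_balanced_loss(logits, targets, class_weights):
    """Cross-entropy re-weighted by the semantic-scale-derived class weights."""
    return F.cross_entropy(logits, targets, weight=class_weights)
```
In the dynamic re-weighting framework described above, such scales would have to be re-estimated from the evolving feature extractor during training, which is the real-time computation challenge the abstract points to.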
Related papers
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - Predicting and Enhancing the Fairness of DNNs with the Curvature of Perceptual Manifolds [44.79535333220044]
Recent studies have shown that tail classes are not always hard to learn, and model bias has been observed on sample-balanced datasets.
In this work, we first establish a geometric perspective for analyzing model fairness and then systematically propose a series of geometric measurements.
arXiv Detail & Related papers (2023-03-22T04:49:23Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data [11.66734752179563]
Classification on long-tailed distributed data is a challenging problem.
Learning on tail classes is especially challenging when fine-tuning a pretrained model for a downstream task.
We propose a two-stage fine-tuning strategy: we first fine-tune the final layer of the pretrained model with a class-balanced re-weighting loss, and then perform standard fine-tuning.
arXiv Detail & Related papers (2022-07-22T03:39:51Z) - Bias-inducing geometries: an exactly solvable data model with fairness implications [13.690313475721094]
We introduce an exactly solvable high-dimensional model of data imbalance.
We analytically unpack the typical properties of learning models trained in this synthetic framework.
We obtain exact predictions for the observables that are commonly employed for fairness assessment.
arXiv Detail & Related papers (2022-05-31T16:27:57Z) - Mitigating Dataset Bias by Using Per-sample Gradient [9.290757451344673]
We propose PGD (Per-sample Gradient-based Debiasing), which comprises three steps: training a model with uniform batch sampling, setting the importance of each sample in proportion to the norm of its gradient, and training the model using importance batch sampling (a rough sketch of this recipe appears after this list).
Compared with existing baselines on various synthetic and real-world datasets, the proposed method achieves state-of-the-art accuracy on the classification task.
arXiv Detail & Related papers (2022-05-31T11:41:02Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To combine the strengths of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - An Empirical Study on the Joint Impact of Feature Selection and Data Resampling on Imbalance Classification [4.506770920842088]
This study focuses on the synergy between feature selection and data resampling for imbalance classification.
We conduct extensive experiments on 52 publicly available datasets, using 9 feature selection methods, 6 resampling approaches for class imbalance learning, and 3 well-known classification algorithms.
arXiv Detail & Related papers (2021-09-01T06:01:51Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
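The PGD entry above summarizes a three-step recipe that is concrete enough for the small sketch referenced there: train with uniform sampling, score each sample by its gradient norm, then resample by those scores. Using the gradient of the loss with respect to the logits as a cheap per-sample proxy, and the function names below, are assumptions of this sketch rather than the paper's implementation.
```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler

@torch.no_grad()
def per_sample_gradient_norms(model, dataset, batch_size=256):
    """Score each sample by the norm of dLoss/dlogits (softmax of logits minus one-hot label).

    This logit-gradient norm is a cheap proxy for the full per-sample
    parameter-gradient norm and is an assumption of this sketch.
    """
    model.eval()
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    norms = []
    for inputs, labels in loader:
        probs = F.softmax(model(inputs), dim=1)
        grad_logits = probs - F.one_hot(labels, probs.size(1)).float()
        norms.append(grad_logits.norm(dim=1))
    return torch.cat(norms)

def importance_batch_sampler(model, dataset):
    """Steps 2-3 of the summarized recipe: weight samples by gradient norm and
    draw the next round of batches with importance sampling."""
    weights = per_sample_gradient_norms(model, dataset)
    return WeightedRandomSampler(weights.double(), num_samples=len(dataset), replacement=True)
```
A DataLoader built with this sampler would then drive the subsequent training pass, with the scores re-estimated periodically if the debiasing is meant to track the model as it changes.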