Rethinking Soft Label in Label Distribution Learning Perspective
- URL: http://arxiv.org/abs/2301.13444v1
- Date: Tue, 31 Jan 2023 06:47:19 GMT
- Title: Rethinking Soft Label in Label Distribution Learning Perspective
- Authors: Seungbum Hong, Jihun Yoon, Bogyu Park, Min-Kook Choi
- Abstract summary: The primary goal of training early convolutional neural networks (CNNs) was higher generalization performance.
We investigated whether performing label distribution learning (LDL) would enhance model calibration in CNN training.
We performed several visualizations and analyses and observed several interesting behaviors in CNN training with LDL.
- Score: 0.27719338074999533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary goal of training early convolutional neural networks (CNNs) was
higher generalization performance. However, since the expected
calibration error (ECE), which quantifies the explanatory power of model
inference, was introduced, research on training models that can be
explained has been in progress. We hypothesized that a gap in supervision criteria
between training and inference leads to overconfidence, and investigated whether
performing label distribution learning (LDL) would enhance model
calibration in CNN training. To verify this hypothesis, we used a simple LDL
setting with recent data augmentation techniques. Based on a series of
experiments, we obtained the following results: 1) State-of-the-art knowledge
distillation (KD) methods significantly impede model calibration. 2) Training
using LDL with recent data augmentation can have excellent effects on model
calibration and even on generalization performance. 3) Online LDL brings
additional improvements in model calibration and accuracy with long training,
especially in large models. Using the proposed approach, we simultaneously
achieved a lower ECE and higher generalization performance on the image
classification datasets CIFAR10, CIFAR100, STL10, and ImageNet. We performed
several visualizations and analyses and observed several interesting behaviors
in CNN training with LDL.
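The abstract reports results in terms of ECE without defining it. As a rough illustration, the commonly used binned estimator can be sketched as follows. This is a minimal sketch and not the authors' evaluation code: the function name, the choice of 10 equal-width bins, and the half-open binning convention are assumptions made here for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: top-class probabilities in [0, 1], one per prediction.
    correct:     1/0 (or True/False) flags, whether each prediction was right.
    n_bins:      number of equal-width bins (assumed value, not from the paper).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for conf, ok in zip(confidences, correct):
        # Half-open bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        accuracy = hit_sum[k] / count[k]
        mean_conf = conf_sum[k] / count[k]
        ece += (count[k] / n) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # An overconfident toy model: always fully confident, right half the time.
    print(expected_calibration_error([1.0, 1.0, 1.0, 1.0], [1, 1, 0, 0]))
```

A model whose confidence matches its accuracy in every bin has an ECE near zero, while an overconfident model, the failure mode the abstract attributes to the supervision gap, has a large one.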
Related papers
- Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks [3.5284544394841117]
We show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures significantly improves model calibration.
We illustrate that these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.
arXiv Detail & Related papers (2024-05-02T11:36:17Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration [59.48235003469116]
We show that data augmentation consistently enhances OOD performance.
We also show that CF-augmented models that are easier to calibrate also exhibit much lower entropy when assigning importance.
arXiv Detail & Related papers (2023-09-14T16:16:40Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior [32.74388989649232]
We study how pre-training could be used for scientific machine learning (SciML) applications.
We find that fine-tuning these models yields more performance gains as model size increases.
arXiv Detail & Related papers (2023-06-01T00:32:59Z)
- On the Importance of Calibration in Semi-supervised Learning [13.859032326378188]
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data.
We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across standard vision benchmarks.
arXiv Detail & Related papers (2022-10-10T15:41:44Z)
- How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z)
- Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers [11.408339220607251]
The empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning.
Our main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers.
arXiv Detail & Related papers (2021-10-13T19:07:01Z)
- Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
arXiv Detail & Related papers (2021-04-07T17:59:06Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.