Rethinking Soft Label in Label Distribution Learning Perspective
- URL: http://arxiv.org/abs/2301.13444v1
- Date: Tue, 31 Jan 2023 06:47:19 GMT
- Title: Rethinking Soft Label in Label Distribution Learning Perspective
- Authors: Seungbum Hong, Jihun Yoon, Bogyu Park, Min-Kook Choi
- Abstract summary: The primary goal of training early convolutional neural networks (CNNs) is higher generalization performance of the model.
We investigated whether performing label distribution learning (LDL) would enhance model calibration in CNN training.
We performed several visualizations and analyses and observed several interesting behaviors in CNN training with LDL.
- Score: 0.27719338074999533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary goal of training early convolutional neural networks (CNNs) is higher generalization performance of the model. However, since the expected calibration error (ECE), which quantifies the explanatory power of model inference, was recently introduced, research on training models whose predictions can be explained is in progress. We hypothesized that a gap in supervision criteria between training and inference leads to overconfidence, and investigated whether performing label distribution learning (LDL) would enhance model calibration in CNN training. To verify this assumption, we used a simple LDL setting with recent data augmentation techniques. Based on a series of experiments, we obtained the following results: 1) state-of-the-art knowledge distillation (KD) methods significantly impede model calibration; 2) training using LDL with recent data augmentation can have excellent effects on model calibration and even on generalization performance; 3) online LDL brings additional improvements in model calibration and accuracy with long training, especially for large models. Using the proposed approach, we simultaneously achieved a lower ECE and higher generalization performance on the image classification datasets CIFAR-10, CIFAR-100, STL-10, and ImageNet. We performed several visualizations and analyses and observed several interesting behaviors in CNN training with LDL.
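To make the quantities above concrete, the following is a minimal sketch, not the authors' implementation, of (a) a standard binned expected calibration error (ECE) and (b) cross-entropy against mixup-generated label distributions as one simple LDL-style supervision signal. The bin count, the Beta parameter, and all function names here are illustrative assumptions.

```python
# Minimal sketch (assumptions: 15 ECE bins, Beta(1,1) mixup, PyTorch tensors).
import torch
import torch.nn.functional as F

def expected_calibration_error(logits, labels, n_bins=15):
    """Binned ECE: bin-weighted |accuracy - mean confidence| over confidence bins."""
    probs = logits.softmax(dim=1)
    conf, pred = probs.max(dim=1)
    correct = pred.eq(labels).float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = (correct[in_bin].mean() - conf[in_bin].mean()).abs()
            ece = ece + in_bin.float().mean() * gap
    return ece.item()

def mixup_soft_targets(x, y, num_classes, alpha=1.0):
    """Mix a batch of inputs and return the matching soft label distributions."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_onehot = F.one_hot(y, num_classes).float()
    target_dist = lam * y_onehot + (1.0 - lam) * y_onehot[perm]  # a label distribution
    return x_mix, target_dist

def soft_label_loss(logits, target_dist):
    """Cross-entropy against a full label distribution (LDL-style supervision)."""
    return -(target_dist * logits.log_softmax(dim=1)).sum(dim=1).mean()

# Illustrative training step (shapes: x [B,C,H,W], y [B], model output [B,K]):
#   x_mix, t = mixup_soft_targets(x, y, num_classes=K)
#   loss = soft_label_loss(model(x_mix), t)
```

Supervising with a distribution such as `target_dist`, rather than a one-hot vector, is one simple way to narrow the gap between the training-time target and the probabilistic output used at inference time, which the abstract identifies as a source of overconfidence.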
Related papers
- Model Balancing Helps Low-data Training and Fine-tuning [19.63134953504884]
Recent advances in foundation models have emphasized the need to align pre-trained models with specialized domains.
These topics have also gained increasing attention in the emerging field of scientific machine learning (SciML).
To address the limitations of low-data training and fine-tuning, we draw inspiration from Heavy-Tailed Self-Regularization (HT-SR) theory.
We adapt a recently proposed layer-wise learning rate scheduler, TempBalance, which effectively balances training quality across layers.
arXiv Detail & Related papers (2024-10-16T02:48:39Z)
- Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks [3.5284544394841117]
We show that decoupling the training of feature extraction layers and classification layers in over-parametrized DNN architectures significantly improves model calibration.
We illustrate these methods improve calibration across ViT and WRN architectures for several image classification benchmark datasets.
arXiv Detail & Related papers (2024-05-02T11:36:17Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models (a minimal sketch of this ensembling idea appears after this list).
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- On the Importance of Calibration in Semi-supervised Learning [13.859032326378188]
State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data.
We introduce a family of new SSL models that optimize for calibration and demonstrate their effectiveness across standard vision benchmarks.
arXiv Detail & Related papers (2022-10-10T15:41:44Z)
- How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder (AE) based models.
We develop a novel evaluation scheme in which the linear head is trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z)
- Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers [11.408339220607251]
The empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning.
Our main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers.
arXiv Detail & Related papers (2021-10-13T19:07:01Z)
- Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
arXiv Detail & Related papers (2021-04-07T17:59:06Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
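Following up on the emulated fine-tuning (EFT) entry above, the sketch below illustrates the LM up-scaling idea as commonly described: the fine-tuning "delta" measured on a small model pair is added to a large base model's next-token log-probabilities at test time. This is a hedged illustration that assumes all three models share a vocabulary; the function name and the renormalization step are assumptions of this sketch, not the EFT authors' implementation.

```python
# Sketch of LM up-scaling in log-probability space (not the EFT authors' code):
#   log p_upscaled(y|x) ∝ log p_base_large(y|x)
#                         + [log p_ft_small(y|x) - log p_base_small(y|x)]
import torch

def upscaled_log_probs(logits_base_large, logits_ft_small, logits_base_small):
    """Combine next-token logits from three models over a shared vocabulary."""
    lp_large = logits_base_large.log_softmax(dim=-1)
    lp_ft = logits_ft_small.log_softmax(dim=-1)
    lp_small = logits_base_small.log_softmax(dim=-1)
    combined = lp_large + (lp_ft - lp_small)  # add the small pair's fine-tuning delta
    return combined.log_softmax(dim=-1)       # renormalize to a valid distribution
```

Decoding would then sample or take an argmax from the combined distribution at each step, which requires running all three models on the same prefix.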