Towards Understanding the Effect of Pretraining Label Granularity
- URL: http://arxiv.org/abs/2303.16887v2
- Date: Thu, 5 Oct 2023 17:32:26 GMT
- Title: Towards Understanding the Effect of Pretraining Label Granularity
- Authors: Guan Zhe Hong, Yin Cui, Ariel Fuxman, Stanley H. Chan, Enming Luo
- Abstract summary: We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem.
We show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels.
- Score: 23.61736162174686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study how the granularity of pretraining labels affects the
generalization of deep neural networks in image classification tasks. We focus
on the "fine-to-coarse" transfer learning setting, where the pretraining label
space is more fine-grained than that of the target problem. Empirically, we
show that pretraining on the leaf labels of ImageNet21k produces better
transfer results on ImageNet1k than pretraining on other coarser granularity
levels, which supports the common practice used in the community.
Theoretically, we explain the benefit of fine-grained pretraining by proving
that, for a data distribution satisfying certain hierarchy conditions, 1)
coarse-grained pretraining only allows a neural network to learn the "common"
or "easy-to-learn" features well, while 2) fine-grained pretraining helps the
network learn the "rarer" or "fine-grained" features in addition to the common
ones, thus improving its accuracy on hard downstream test samples in which
common features are missing or weak in strength. Furthermore, we perform
comprehensive experiments using the label hierarchies of iNaturalist 2021 and
observe that the following conditions, in addition to proper choice of label
granularity, enable the transfer to work well in practice: 1) the pretraining
dataset needs to have a meaningful label hierarchy, and 2) the pretraining and
target label functions need to align well.
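To make the fine-to-coarse setting concrete, here is a minimal, hypothetical PyTorch sketch (the backbone, the toy `fine_to_coarse` mapping, and all hyperparameters are placeholders, not the authors' code): a classifier is pretrained on fine-grained leaf labels, then its head is replaced and fine-tuned on a coarser label space derived through the hierarchy.

```python
import torch
import torch.nn as nn
import torchvision

# Toy hierarchy: each fine-grained (leaf) label maps to a coarser ancestor,
# e.g. ImageNet21k leaves collapsing onto ImageNet1k-style classes.
fine_to_coarse = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}  # hypothetical mapping
num_fine = len(fine_to_coarse)
num_coarse = len(set(fine_to_coarse.values()))

# Stage 1: pretrain a backbone with a fine-grained classification head.
model = torchvision.models.resnet18(num_classes=num_fine)
criterion = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def pretrain_step(images, fine_labels):
    """One step of fine-grained pretraining on leaf labels."""
    loss = criterion(model(images), fine_labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Stage 2 (fine-to-coarse transfer): keep the pretrained backbone, replace
# the head with one sized for the coarser target label space, and fine-tune.
model.fc = nn.Linear(model.fc.in_features, num_coarse)
ft_opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def finetune_step(images, fine_labels):
    """One step of fine-tuning on coarse labels derived via the hierarchy."""
    coarse_labels = torch.tensor([fine_to_coarse[int(y)] for y in fine_labels])
    loss = criterion(model(images), coarse_labels)
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
    return loss.item()
```

Deriving the coarse labels from the fine ones through a single hierarchy is one way the pretraining and target label functions can be made to align, as the abstract's second condition requires.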
Related papers
- Why Fine-grained Labels in Pretraining Benefit Generalization? [12.171634061370616]
Recent studies show that pretraining a deep neural network with fine-grained labeled data, followed by fine-tuning on coarse-labeled data, often yields better generalization than pretraining with coarse-labeled data.
This paper addresses the gap in theoretical understanding of this effect by introducing a "hierarchical multi-view" structure to confine the input data distribution.
Under this framework, we prove that: 1) coarse-grained pretraining only allows a neural network to learn the common features well, while 2) fine-grained pretraining helps the network learn the rare features in addition to the common ones, leading to improved accuracy on hard downstream test samples.
arXiv Detail & Related papers (2024-10-30T15:41:30Z)
- Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation [18.598405597933752]
Self-supervision provides remote sensing with a tool to reduce the amount of exact, human-crafted geospatial annotation needed.
In this work, we propose to exploit noisy semantic segmentation maps for model pretraining.
The results from two datasets indicate the effectiveness of task-specific supervised pretraining with noisy labels.
arXiv Detail & Related papers (2024-02-25T18:01:42Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
On multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
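As a rough illustration of the negative label suppression idea under one-bit feedback (a hedged sketch; the masking scheme and tensor names are assumptions, not the paper's implementation), a class the annotator has already rejected for a sample can simply be excluded when the next guess is made:

```python
import torch

def suppress_negative_labels(logits: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Mask out classes that one-bit feedback has already rejected.

    logits:   (batch, num_classes) raw model outputs
    rejected: (batch, num_classes) boolean mask, True where the annotator
              answered "no" to a previous guess of that class
    """
    return logits.masked_fill(rejected, float("-inf"))

# Example: for sample 0 the annotator rejected class 2, so the next guess
# is taken over the remaining classes only.
logits = torch.tensor([[0.2, 0.1, 0.9], [0.4, 0.3, 0.1]])
rejected = torch.tensor([[False, False, True], [False, False, False]])
next_guess = suppress_negative_labels(logits, rejected).argmax(dim=1)  # tensor([0, 0])
```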
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
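A minimal sketch of what secondary pre-training on labeled sentences could look like as data preparation, assuming a simple text-to-text template (the prompt format and function name are hypothetical, not the LSAP recipe):

```python
# Hypothetical formatting of (sentence, label) pairs into T5-style
# text-to-text examples, so the model sees label semantics as target text.
def to_text_to_text(sentence: str, label: str) -> dict:
    return {
        "input_text": f"classify: {sentence}",
        "target_text": label,  # natural-language label name, e.g. "book a flight"
    }

examples = [
    to_text_to_text("I need a ticket to Boston tomorrow", "book a flight"),
    to_text_to_text("What's the weather like in Paris?", "get weather"),
]
# These pairs would then be used for a secondary pre-training pass of a
# generative model such as T5 before few-shot fine-tuning on the target task.
```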
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performance on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo-labeling readily available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
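A minimal sketch of the two-head decoupling, assuming a shared encoder whose pseudo-label-generating head is kept separate from the head trained on those pseudo labels (architecture and names are illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

class TwoHeadModel(nn.Module):
    """Shared encoder with two independent classification heads:
    `head_pl` generates pseudo labels, `head_cls` is trained on them."""
    def __init__(self, feat_dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(32, feat_dim), nn.ReLU())
        self.head_pl = nn.Linear(feat_dim, num_classes)   # pseudo-label generation
        self.head_cls = nn.Linear(feat_dim, num_classes)  # pseudo-label utilization

    def forward(self, x):
        z = self.encoder(x)
        return self.head_pl(z), self.head_cls(z)

model = TwoHeadModel()
criterion = nn.CrossEntropyLoss()

def unlabeled_step(x_unlabeled):
    logits_pl, logits_cls = model(x_unlabeled)
    # Pseudo labels come from the generation head; they are detached
    # (argmax is non-differentiable anyway), so no gradient from the
    # utilization loss flows back into the generation head.
    pseudo = logits_pl.detach().argmax(dim=1)
    return criterion(logits_cls, pseudo)

loss = unlabeled_step(torch.randn(4, 32))
```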
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images [0.2538209532048867]
We propose a semi-weakly supervised contrastive learning framework for representation learning using semi-weakly annotated images.
We empirically validate the transfer learning performance of SWCL on seven public retinal fundus datasets.
arXiv Detail & Related papers (2021-08-04T15:50:09Z)
- Self-Supervised Learning from Semantically Imprecise Data [7.24935792316121]
Learning from imprecise labels such as "animal" or "bird" is an important capability when expertly labeled training data is scarce.
CHILLAX is a recently proposed method to tackle this task.
We extend CHILLAX with a self-supervised scheme using constrained extrapolation to generate pseudo-labels.
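A rough sketch of hierarchy-constrained pseudo-labeling, assuming a toy class tree (the hierarchy, names, and the simple argmax selection rule are assumptions; the paper's constrained extrapolation is more involved):

```python
# Toy hierarchy: an imprecise label such as "animal" constrains the
# admissible fine-grained pseudo-labels to its descendants.
children = {
    "animal": ["bird", "dog"],
    "bird": ["sparrow", "eagle"],
    "dog": [], "sparrow": [], "eagle": [],
}

def descendants(label: str) -> set:
    """All labels reachable below (and including) the given node."""
    out, stack = {label}, [label]
    while stack:
        for c in children.get(stack.pop(), []):
            out.add(c)
            stack.append(c)
    return out

def constrained_pseudo_label(scores: dict, imprecise_label: str) -> str:
    """Pick the highest-scoring class among the imprecise label's descendants."""
    allowed = descendants(imprecise_label)
    return max((c for c in scores if c in allowed), key=scores.get)

scores = {"sparrow": 0.4, "eagle": 0.1, "dog": 0.3, "car": 0.9}
print(constrained_pseudo_label(scores, "animal"))  # -> "sparrow" ("car" is excluded)
```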
arXiv Detail & Related papers (2021-04-22T07:26:14Z)
- A Theoretical Analysis of Learning with Noisily Labeled Data [62.946840431501855]
We first show that, during the first epoch of training, the examples with clean labels are learned first.
We then show that, after this stage of learning from clean data, continued training can further reduce the test error.
arXiv Detail & Related papers (2021-04-08T23:40:02Z)
- Big Self-Supervised Models are Strong Semi-Supervised Learners [116.00752519907725]
We show that this paradigm of unsupervised pretraining followed by supervised fine-tuning is surprisingly effective for semi-supervised learning on ImageNet.
A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning.
We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network.
arXiv Detail & Related papers (2020-06-17T17:48:22Z)
- Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
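A minimal sketch of the classifier described here, assuming precomputed sparse feature vectors are used as fixed inputs to a small feed-forward network (dimensions and names are placeholders):

```python
import torch
import torch.nn as nn

# Precomputed sparse representations (e.g., derived from unlabeled parsed
# corpora) are treated as fixed input features; only the small feed-forward
# classifier on top is trained on the few labeled examples.
sparse_dim, hidden, num_classes = 5000, 128, 4

classifier = nn.Sequential(
    nn.Linear(sparse_dim, hidden),
    nn.ReLU(),
    nn.Linear(hidden, num_classes),
)

x = torch.rand(8, sparse_dim) * (torch.rand(8, sparse_dim) < 0.01).float()  # mostly zeros
logits = classifier(x)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, num_classes, (8,)))
```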
arXiv Detail & Related papers (2020-05-18T06:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.