Scaling Laws for the Few-Shot Adaptation of Pre-trained Image
Classifiers
- URL: http://arxiv.org/abs/2110.06990v1
- Date: Wed, 13 Oct 2021 19:07:01 GMT
- Title: Scaling Laws for the Few-Shot Adaptation of Pre-trained Image
Classifiers
- Authors: Gabriele Prato, Simon Guiroy, Ethan Caballero, Irina Rish, Sarath
Chandar
- Abstract summary: Empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning.
Our main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers.
- Score: 11.408339220607251
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Empirical science of neural scaling laws is a rapidly growing area of
significant importance to the future of machine learning, particularly in the
light of recent breakthroughs achieved by large-scale pre-trained models such
as GPT-3, CLIP and DALL-E. Accurately predicting neural network performance
with increasing resources such as data, compute and model size provides a more
comprehensive evaluation of different approaches across multiple scales, as
opposed to traditional point-wise comparisons of fixed-size models on
fixed-size benchmarks, and, most importantly, allows one to focus on the
approaches that scale best and are thus most promising for the future. In this
work, we consider the challenging problem of few-shot learning in image
classification, especially when the target data distribution in the few-shot
phase differs from the source (training) data distribution, in the sense that it
includes new image classes not encountered during training. Our current main goal is to
investigate how the amount of pre-training data affects the few-shot
generalization performance of standard image classifiers. Our key observations
are that (1) such performance improvements are well-approximated by power laws
(linear log-log plots) as the training set size increases, (2) this holds
whether the target data comes from the same domain as the training data or from
a different one (i.e., new classes), and (3) few-shot performance
on new classes converges at a faster rate than the standard classification
performance on previously seen classes. Our findings shed new light on the
relationship between scale and generalization.
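The "linear log-log plots" observation in the abstract can be illustrated with a minimal sketch: if error follows a power law err ≈ a * n^(-b) in the training set size n, then log(err) is linear in log(n), and a straight-line fit recovers a and b. The function name `fit_power_law` and the data points below are illustrative assumptions, not the authors' code or results.

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err ~= a * n**(-b) by ordinary least squares on log-log axes.

    Returns (a, b). This is a generic illustration, not the paper's procedure.
    """
    # On log-log axes the power law becomes: log(err) = log(a) - b * log(n)
    slope, intercept = np.polyfit(np.log(n), np.log(err), deg=1)
    return np.exp(intercept), -slope

# Synthetic, made-up data generated from a known power law (not paper results).
n = np.array([1e3, 1e4, 1e5, 1e6])   # hypothetical pre-training set sizes
err = 2.0 * n ** -0.3                # hypothetical few-shot error values

a, b = fit_power_law(n, err)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A larger fitted exponent b corresponds to a steeper line on the log-log plot, which is one way to read the abstract's claim that few-shot performance on new classes converges at a faster rate than performance on previously seen classes.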
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Unified Neural Network Scaling Laws and Scale-time Equivalence [10.918504301310753]
We present a novel theoretical characterization of how three factors -- model size, training time, and data volume -- interact to determine the performance of deep neural networks.
We first establish a theoretical and empirical equivalence between scaling the size of a neural network and increasing its training time proportionally.
We then combine scale-time equivalence with a linear model analysis of double descent to obtain a unified theoretical scaling law.
arXiv Detail & Related papers (2024-09-09T16:45:26Z)
- Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-trained Vision Transformers [12.590571371294729]
Few-shot class-incremental learning (FSCIL) aims to adapt the model to new classes from very few data (5 samples) without forgetting the previously learned classes.
Recent works in many-shot CIL (MSCIL) exploited pre-trained models to reduce forgetting and achieve better plasticity.
We use ViT models pre-trained on large-scale datasets for few-shot settings, which face the critical issue of low plasticity.
arXiv Detail & Related papers (2024-04-09T21:12:31Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs [9.010643838773477]
We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
arXiv Detail & Related papers (2023-10-17T09:22:22Z)
- RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification [10.806893809269074]
We propose a novel framework called Retrieval Augmented Hybrid Network (RAHNet) to jointly learn a robust feature extractor and an unbiased classifier.
In the feature extractor training stage, we develop a graph retrieval module to search for relevant graphs that directly enrich the intra-class diversity for the tail classes.
We also innovatively optimize a category-centered supervised contrastive loss to obtain discriminative representations.
arXiv Detail & Related papers (2023-08-04T14:06:44Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Leveraging Angular Information Between Feature and Classifier for Long-tailed Learning: A Prediction Reformulation Approach [90.77858044524544]
We reformulate the recognition probabilities through included angles without re-balancing the classifier weights.
Inspired by the performance improvement of the predictive form reformulation, we explore the different properties of this angular prediction.
Our method is able to obtain the best performance among peer methods without pretraining on CIFAR10/100-LT and ImageNet-LT.
arXiv Detail & Related papers (2022-12-03T07:52:48Z)
- Revisiting the Updates of a Pre-trained Model for Few-shot Learning [11.871523410051527]
We compare the two popular updating methods, fine-tuning and linear probing.
We find that fine-tuning is better than linear probing as the number of samples increases.
arXiv Detail & Related papers (2022-05-13T08:47:06Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- Closing the Generalization Gap in One-Shot Object Detection [92.82028853413516]
We show that the key to strong few-shot detection models may not lie in sophisticated metric learning approaches, but instead in scaling the number of categories.
Future data annotation efforts should therefore focus on wider datasets and annotate a larger number of categories.
arXiv Detail & Related papers (2020-11-09T09:31:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.