Improving the Generalization of Supervised Models
- URL: http://arxiv.org/abs/2206.15369v1
- Date: Thu, 30 Jun 2022 15:43:51 GMT
- Title: Improving the Generalization of Supervised Models
- Authors: Mert Bulent Sariyildiz, Yannis Kalantidis, Karteek Alahari, Diane
Larlus
- Abstract summary: In this paper, we propose a supervised learning setup that leverages the best of both worlds.
We show that these three improvements lead to a more favorable trade-off between the IN1K training task and 13 transfer tasks.
- Score: 30.264601433216246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of training a deep neural network on a given
classification task, e.g., ImageNet-1K (IN1K), so that it excels at that task
as well as at other (future) transfer tasks. These two seemingly contradictory
properties impose a trade-off between improving the model's generalization
while maintaining its performance on the original task. Models trained with
self-supervised learning (SSL) tend to generalize better than their supervised
counterparts for transfer learning; yet, they still lag behind supervised
models on IN1K. In this paper, we propose a supervised learning setup that
leverages the best of both worlds. We enrich the common supervised training
framework using two key components of recent SSL models: multi-scale crops for
data augmentation and the use of an expendable projector head. We replace the
last layer of class weights with class prototypes computed on the fly using a
memory bank. We show that these three improvements lead to a more favorable
trade-off between the IN1K training task and 13 transfer tasks. Over all the
explored configurations, we single out two models: t-ReX that achieves a new
state of the art for transfer learning and outperforms top methods such as DINO
and PAWS on IN1K, and t-ReX* that matches the highly optimized RSB-A1 model on
IN1K while performing better on transfer tasks. Project page and pretrained
models: https://europe.naverlabs.com/t-rex
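As a rough illustration of how the three ingredients described in the abstract (multi-crop augmentation, an expendable projector head, and class prototypes computed on the fly from a memory bank) could interact in a single training step, here is a minimal PyTorch-style sketch. It is an assumption-laden toy, not the authors' implementation: the placeholder backbone, projector sizes, temperature, and memory-bank handling are all made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the real components: a backbone (e.g. a ResNet-50 in practice),
# an expendable MLP projector discarded after training, and a memory bank of
# past embeddings used to build class prototypes instead of a fixed classifier.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2048))   # toy backbone on 32x32 crops
projector = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(), nn.Linear(2048, 256))

num_classes, bank_size, dim = 1000, 8192, 256
memory_feats = F.normalize(torch.randn(bank_size, dim), dim=1)   # queued embeddings (placeholder)
memory_labels = torch.randint(0, num_classes, (bank_size,))      # their labels (placeholder)
temperature = 0.1

def prototype_logits(z):
    """Cosine similarity of embeddings to per-class prototypes averaged from the memory bank."""
    protos = torch.zeros(num_classes, dim).index_add_(0, memory_labels, memory_feats)
    protos = F.normalize(protos, dim=1)
    return z @ protos.t() / temperature

def training_step(multi_crops, labels):
    """`multi_crops` is a list of augmented views of the same batch (e.g. global + local crops)."""
    loss = 0.0
    for crop in multi_crops:
        z = F.normalize(projector(backbone(crop)), dim=1)
        loss = loss + F.cross_entropy(prototype_logits(z), labels)
    return loss / len(multi_crops)

# Example usage with random data standing in for two crops of a batch of 8 images.
crops = [torch.randn(8, 3, 32, 32) for _ in range(2)]
labels = torch.randint(0, num_classes, (8,))
print(training_step(crops, labels))
```

In this sketch the memory bank is static; in an actual setup it would be refreshed with each batch's embeddings and labels so that the on-the-fly prototypes track the evolving representation.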
Related papers
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention [2.66269503676104]
We introduce a novel fine-tuning method, called stochastic cross-attention (StochCA), specific to Transformer architectures.
This method modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning.
Our experimental results show the superiority of StochCA over state-of-the-art approaches in both areas.
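One plausible reading of that mechanism, shown as a hedged toy sketch (the module below, its probability parameter, and the way the frozen activations are fed in are assumptions, not the paper's code): during fine-tuning, each attention block stochastically attends to the keys and values of a frozen pretrained copy of the model instead of its own.

```python
import torch
import torch.nn as nn

class StochasticCrossAttention(nn.Module):
    """Hypothetical sketch: with probability p_cross, attend to a frozen pretrained
    model's activations (cross-attention); otherwise perform standard self-attention."""

    def __init__(self, dim, num_heads=8, p_cross=0.5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.p_cross = p_cross

    def forward(self, x, pretrained_x=None):
        use_cross = self.training and pretrained_x is not None and torch.rand(1).item() < self.p_cross
        kv = pretrained_x if use_cross else x   # keys/values from the frozen model or from x itself
        out, _ = self.attn(query=x, key=kv, value=kv)
        return out

# Example usage: `frozen` stands in for the pretrained model's activations at the
# corresponding layer (detached so no gradients flow into the frozen model).
block = StochasticCrossAttention(dim=64)
x = torch.randn(2, 16, 64)
frozen = torch.randn(2, 16, 64).detach()
print(block(x, frozen).shape)   # torch.Size([2, 16, 64])
```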
arXiv Detail & Related papers (2024-02-25T13:53:49Z)
- TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
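A toy sketch of that recipe (the stand-in encoders, dimensions, and module names are assumptions, not the eP-ALM code): everything is frozen except a single linear projection from visual features into the language model's embedding space and one prepended trainable soft token.

```python
import torch
import torch.nn as nn

class ToyPerceptuallyAugmentedLM(nn.Module):
    """Hypothetical sketch of the recipe above: all stand-in encoder and LM weights
    are frozen; only one linear projection and one soft token are trainable."""

    def __init__(self, vis_dim=512, lm_dim=768, vocab=32000):
        super().__init__()
        # Stand-ins for a frozen visual encoder and a frozen language model.
        self.visual_encoder = nn.Linear(3 * 32 * 32, vis_dim)
        self.lm_embed = nn.Embedding(vocab, lm_dim)
        self.lm_body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(lm_dim, vocab)
        for p in self.parameters():
            p.requires_grad = False          # freeze everything registered so far (>99% of parameters)

        # The only trainable pieces: one linear projection and one prepended soft token.
        self.projection = nn.Linear(vis_dim, lm_dim)
        self.soft_token = nn.Parameter(torch.zeros(1, 1, lm_dim))

    def forward(self, image, token_ids):
        vis = self.projection(self.visual_encoder(image.flatten(1)))   # (B, lm_dim)
        tok = self.lm_embed(token_ids)                                 # (B, T, lm_dim)
        prefix = self.soft_token.expand(tok.size(0), -1, -1)
        seq = torch.cat([prefix, vis.unsqueeze(1), tok], dim=1)
        return self.lm_head(self.lm_body(seq))

model = ToyPerceptuallyAugmentedLM()
logits = model(torch.randn(2, 3, 32, 32), torch.randint(0, 32000, (2, 10)))
print(logits.shape)   # torch.Size([2, 12, 32000])
```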
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Can Wikipedia Help Offline Reinforcement Learning? [12.12541097531412]
Fine-tuning reinforcement learning models has been challenging because of a lack of large scale off-the-shelf datasets.
Recent work has looked at tackling offline RL, with improved results following the introduction of the Transformer architecture.
We investigate the transferability of pre-trained sequence models on other domains (vision, language) when finetuned on offline RL tasks.
arXiv Detail & Related papers (2022-01-28T13:55:35Z)
- Ensembling Off-the-shelf Models for GAN Training [55.34705213104182]
We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators.
We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings.
Our method can improve GAN training in both limited data and large-scale settings.
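A small sketch of how such a linear-separability probe might be scored (hypothetical helper names and noise stand-ins for the embeddings, not the authors' code): embed real and generated images with each candidate pretrained model, fit a linear probe, and rank the candidates by how well the probe separates the two sets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_separability(real_feats, fake_feats):
    """Validation accuracy of a linear probe separating real from fake embeddings."""
    X = np.concatenate([real_feats, fake_feats])
    y = np.concatenate([np.ones(len(real_feats)), np.zeros(len(fake_feats))])
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, random_state=0, stratify=y)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_va, y_va)

# Hypothetical usage: `embed_fns` maps a pretrained model's name to a function that
# returns its embeddings for a batch of images; here random noise stands in for both.
rng = np.random.default_rng(0)
embed_fns = {
    "model_a": lambda imgs: rng.normal(size=(len(imgs), 128)),
    "model_b": lambda imgs: rng.normal(size=(len(imgs), 128)),
}
real_images, fake_images = list(range(200)), list(range(200))   # stand-ins for image batches
scores = {name: linear_separability(f(real_images), f(fake_images)) for name, f in embed_fns.items()}
# Models whose embeddings separate real from fake best are the most informative
# candidates to add as frozen discriminators in the ensemble.
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```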
arXiv Detail & Related papers (2021-12-16T18:59:50Z)
- Adversarial Training of Variational Auto-encoders for Continual Zero-shot Learning [1.90365714903665]
We present a hybrid network that consists of a shared VAE module to hold information of all tasks and task-specific private VAE modules for each task.
The model's size grows with each task to prevent catastrophic forgetting of task-specific skills.
We show that our method is superior for sequential class learning with ZSL (Zero-Shot Learning) and GZSL (Generalized Zero-Shot Learning).
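A skeletal sketch of that shared-plus-private layout (hypothetical dimensions and module names, not the paper's network): one shared VAE serves all tasks, while a list of private VAEs grows as new tasks arrive.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE used as a building block for the sketch."""
    def __init__(self, in_dim=64, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, in_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return self.dec(z), mu, logvar

class SharedPrivateVAE(nn.Module):
    """Sketch of the hybrid structure: the shared VAE is reused for every task,
    and a new private VAE is appended when a task arrives, so the model grows
    with the task sequence."""
    def __init__(self, in_dim=64):
        super().__init__()
        self.shared = TinyVAE(in_dim)
        self.private = nn.ModuleList()
        self.in_dim = in_dim

    def add_task(self):
        self.private.append(TinyVAE(self.in_dim))

    def forward(self, x, task_id):
        return self.shared(x), self.private[task_id](x)

model = SharedPrivateVAE()
for _ in range(3):   # three tasks arrive sequentially
    model.add_task()
x = torch.randn(4, 64)
(recon_s, _, _), (recon_p, _, _) = model(x, task_id=2)
print(recon_s.shape, recon_p.shape)   # torch.Size([4, 64]) twice
```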
arXiv Detail & Related papers (2021-02-07T11:21:24Z)
- Do Adversarially Robust ImageNet Models Transfer Better? [102.09335596483695]
Adversarially robust models often perform better than their standard-trained counterparts when used for transfer learning.
Our results are consistent with (and in fact, add to) recent hypotheses stating that robustness leads to improved feature representations.
arXiv Detail & Related papers (2020-07-16T17:42:40Z)