Related papers: Deep Ensembles for Low-Data Transfer Learning

Deep Ensembles for Low-Data Transfer Learning

URL: http://arxiv.org/abs/2010.06866v2
Date: Mon, 19 Oct 2020 10:59:20 GMT
Title: Deep Ensembles for Low-Data Transfer Learning
Authors: Basil Mustafa and Carlos Riquelme and Joan Puigcerver and Andr\'e Susano Pinto and Daniel Keysers and Neil Houlsby
Abstract summary: We study different ways of creating ensembles from pre-trained models. We show that the nature of pre-training itself is a performant source of diversity. We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
Score: 21.578470914935938
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the low-data regime, it is difficult to train good supervised models from scratch. Instead practitioners turn to pre-trained models, leveraging transfer learning. Ensembling is an empirically and theoretically appealing way to construct powerful predictive models, but the predominant approach of training multiple deep networks with different random initialisations collides with the need for transfer via pre-trained weights. In this work, we study different ways of creating ensembles from pre-trained models. We show that the nature of pre-training itself is a performant source of diversity, and propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset. The approach is simple: Use nearest-neighbour accuracy to rank pre-trained models, fine-tune the best ones with a small hyperparameter sweep, and greedily construct an ensemble to minimise validation cross-entropy. When evaluated together with strong baselines on 19 different downstream tasks (the Visual Task Adaptation Benchmark), this achieves state-of-the-art performance at a much lower inference budget, even when selecting from over 2,000 pre-trained models. We also assess our ensembles on ImageNet variants and show improved robustness to distribution shift.

Related papers

Probabilistic Federated Prompt-Tuning with Non-IID and Imbalanced Data [35.47385526394076]
Fine-tuning pre-trained models is a popular approach in machine learning for solving complex tasks with moderate data. Fine-tuning the entire pre-trained model is ineffective in federated data scenarios where local data distributions are diversely skewed. Our approach transforms federated learning into a distributed set modeling task, aggregating diverse sets of prompts to globally fine-tune the pre-trained model.
arXiv Detail & Related papers (2025-02-27T04:31:34Z)
Test-Time Alignment via Hypothesis Reweighting [56.71167047381817]
Large pretrained models often struggle with underspecified tasks. We propose a novel framework to address the challenge of aligning models to test-time user intent.
arXiv Detail & Related papers (2024-12-11T23:02:26Z)
Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network) After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
Pre-Trained Vision-Language Models as Partial Annotators [40.89255396643592]
Pre-trained vision-language models learn massive data to model unified representations of images and natural languages. In this paper, we investigate a novel "pre-trained annotating - weakly-supervised learning" paradigm for pre-trained model application and experiment on image classification tasks.
arXiv Detail & Related papers (2024-05-23T17:17:27Z)
Distributionally Robust Post-hoc Classifiers under Prior Shifts [31.237674771958165]
We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. We present an extremely lightweight post-hoc approach that performs scaling adjustments to predictions from a pre-trained model.
arXiv Detail & Related papers (2023-09-16T00:54:57Z)
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks. We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework. TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data [78.21197488065177]
Recent success in fine-tuning large models, that are pretrained on broad data at scale, on downstream tasks has led to a significant paradigm shift in deep learning. This paper proposes a new task-agnostic framework, textitSynBench, to measure the quality of pretrained representations using synthetic data.
arXiv Detail & Related papers (2022-10-06T15:25:00Z)
Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage. We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
Revisiting the Updates of a Pre-trained Model for Few-shot Learning [11.871523410051527]
We compare the two popular updating methods, fine-tuning and linear probing. We find that fine-tuning is better than linear probing as the number of samples increases.
arXiv Detail & Related papers (2022-05-13T08:47:06Z)
Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks. We exploit semantic-preserving transformation to enrich downstream data diversity. We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
arXiv Detail & Related papers (2021-12-04T07:21:28Z)
The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation. Recent studies suggest that pre-training benefits from gigantic model capacity. In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH)
arXiv Detail & Related papers (2020-12-12T21:53:55Z)
Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task. We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.