APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal
Representations
- URL: http://arxiv.org/abs/2210.03927v1
- Date: Sat, 8 Oct 2022 05:38:46 GMT
- Title: APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal
Representations
- Authors: Elan Rosenfeld, Preetum Nakkiran, Hadi Pouransari, Oncel Tuzel,
Fartash Faghri
- Abstract summary: We take advantage of existing pretrained unimodal encoders and careful curation of alignment data relevant to the downstream task of interest.
We study a natural approach to aligning existing encoders via small auxiliary functions, and we find that this method is competitive with (or outperforms) state of the art in many settings.
- Score: 21.651198216809608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in learning aligned multimodal representations have been
primarily driven by training large neural networks on massive, noisy
paired-modality datasets. In this work, we ask whether it is possible to
achieve similar results with substantially less training time and data. We
achieve this by taking advantage of existing pretrained unimodal encoders and
careful curation of alignment data relevant to the downstream task of interest.
We study a natural approach to aligning existing encoders via small auxiliary
functions, and we find that this method is competitive with (or outperforms)
state of the art in many settings while being less prone to overfitting, less
costly to train, and more robust to distribution shift. With a properly chosen
alignment distribution, our method surpasses prior state of the art for
ImageNet zero-shot classification on public data while using two orders of
magnitude less time and data and training 77% fewer parameters.
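To make the recipe concrete, here is a minimal sketch of the general approach the abstract describes: two pretrained encoders stay frozen and only small auxiliary projection heads are trained with a CLIP-style contrastive loss. All names, dimensions, and the head architecture are illustrative assumptions, not the paper's released code.

```python
# Sketch: freeze pretrained unimodal encoders, train only small auxiliary
# heads with a CLIP-style contrastive loss. Names/dims are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    """Small auxiliary function mapping a frozen encoder's output
    into a shared embedding space."""
    def __init__(self, in_dim, out_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim), nn.GELU(), nn.Linear(out_dim, out_dim))

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# The pretrained encoders stay frozen; only the heads are optimized:
# for p in image_encoder.parameters(): p.requires_grad_(False)
# for p in text_encoder.parameters():  p.requires_grad_(False)
```

Training only the two heads is what keeps the parameter count and training cost far below full contrastive pretraining.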
Related papers
- DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning [6.635683993472882]
We propose a novel fine-tuning method to achieve multi-operator learning.
Our approach combines distributed learning to integrate data from various operators in pre-training, while physics-informed methods enable zero-shot fine-tuning.
arXiv Detail & Related papers (2024-11-11T18:58:46Z)
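A rough sketch of the physics-informed ingredient mentioned above: a PDE-residual penalty added to an ordinary data-fit loss during fine-tuning. The 1-D heat equation and the model(x, t) signature are stand-in assumptions, not the paper's setup.

```python
# Sketch of a physics-informed loss: data fit plus a PDE residual term.
# The heat equation u_t = u_xx is used here purely as an example PDE.
import torch

def physics_informed_loss(model, x, t, u_data=None):
    # x, t: (batch, 1) collocation coordinates
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = model(x, t)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    residual = (u_t - u_xx).pow(2).mean()   # penalize PDE violation
    data_fit = (u - u_data).pow(2).mean() if u_data is not None else 0.0
    return data_fit + residual
```

With u_data set to None, only the physics term constrains the model, which is the sense in which physics can supervise a model without labeled data.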
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
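A minimal sketch of the partial-freezing protocol studied above, assuming a torchvision ResNet-18 purely as a convenient example backbone; which blocks count as "the first k" is our assumption.

```python
# Sketch of the "defrosting" idea: freeze only the first k blocks of a
# pretrained backbone instead of the whole feature extractor.
import torch.nn as nn
from torchvision.models import resnet18

def freeze_first_k(model, k):
    blocks = [model.conv1, model.bn1, model.layer1,
              model.layer2, model.layer3, model.layer4]
    for block in blocks[:k]:
        for p in block.parameters():
            p.requires_grad_(False)
    return model

model = resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 10)  # new task head
model = freeze_first_k(model, k=3)  # keep only a smaller portion frozen
```

Sweeping k from 0 (full fine-tuning) to all blocks (classic frozen-feature transfer) reproduces the spectrum of protocols the paper compares.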
- Top-Tuning: a study on transfer learning for an efficient alternative to fine tuning for image classification with fast kernel methods [12.325059377851485]
In this paper, we consider a simple transfer learning approach exploiting pre-trained convolutional features as input for a fast-to-train kernel method.
We show that the top-tuning approach provides accuracy comparable to fine-tuning, with training time one to two orders of magnitude smaller.
arXiv Detail & Related papers (2022-09-16T13:46:59Z)
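A short sketch of the top-tuning idea: extract frozen pretrained features once, then fit a kernel classifier on top. scikit-learn's SVC stands in here for the fast kernel solver used in the paper, and the data loaders are assumed to exist.

```python
# Sketch of top-tuning: frozen pretrained features + a kernel classifier.
import torch
import torch.nn as nn
from torchvision.models import resnet50
from sklearn.svm import SVC

backbone = resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Identity()  # expose the 2048-d pooled features
backbone.eval()

@torch.no_grad()
def extract(loader):
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x))
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# Assumed loaders; features are computed once, so only the kernel
# method is actually trained:
# X_train, y_train = extract(train_loader)
# clf = SVC(kernel="rbf").fit(X_train, y_train)
# acc = clf.score(*extract(test_loader))
```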
- Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging: a large number of weights must be fine-tuned, and as a result the encoders forget information about previous tasks.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
arXiv Detail & Related papers (2022-07-22T17:52:30Z)
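A rough, single-codebook sketch of a discrete key-value bottleneck: each encoder output is snapped to its nearest learnable key, and the paired value is passed forward. The dimensions and the simplifications (one codebook, no straight-through gradient handling) are ours, not the paper's.

```python
# Sketch of a discrete key-value bottleneck with learnable key-value pairs.
import torch
import torch.nn as nn

class KeyValueBottleneck(nn.Module):
    def __init__(self, num_codes=512, key_dim=128, value_dim=128):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_codes, key_dim))
        self.values = nn.Parameter(torch.randn(num_codes, value_dim))

    def forward(self, z):                  # z: (batch, key_dim)
        dists = torch.cdist(z, self.keys)  # (batch, num_codes)
        idx = dists.argmin(dim=1)          # nearest key per input
        # NOTE: the hard argmin is non-differentiable; real training
        # would need e.g. a straight-through estimator (omitted here).
        return self.values[idx]            # fetch the paired values
```

Because a new task only touches the values that its inputs actually select, most of the codebook (and the frozen encoder) is untouched, which is what limits forgetting.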
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
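A minimal sketch of co-finetuning as described above: one model, one optimizer, and batches interleaved from several task loaders rather than sequential pre-train-then-fine-tune. The loaders, shared backbone, and per-task heads are assumed to be given.

```python
# Sketch of co-finetuning: alternate batches from multiple task loaders
# while updating a single shared model.
import itertools
import torch.nn.functional as F

def co_finetune(model, heads, loaders, optimizer, num_steps):
    iters = {name: itertools.cycle(dl) for name, dl in loaders.items()}
    tasks = itertools.cycle(loaders.keys())
    for _ in range(num_steps):
        task = next(tasks)            # round-robin over tasks each step
        x, y = next(iters[task])
        loss = F.cross_entropy(heads[task](model(x)), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```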
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
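A sketch of one standard adversarial-training step (a PGD inner maximization followed by an update on the perturbed batch). The distributed, large-batch machinery that is the paper's actual contribution is omitted here.

```python
# Sketch of a single PGD adversarial-training step.
import torch
import torch.nn.functional as F

def pgd_at_step(model, x, y, optimizer, eps=8/255, alpha=2/255, steps=7):
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):                       # inner maximization
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    optimizer.zero_grad()                        # outer minimization
    loss = F.cross_entropy(model((x + delta).detach()), y)
    loss.backward()
    optimizer.step()
    return loss.item()                           # (pixel-range clipping omitted)
```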
- Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
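For orientation only, here is a generic adversarial feature-alignment sketch using a gradient-reversal critic; it illustrates the family of methods the entry above belongs to, not the paper's specific bi-level algorithm.

```python
# Generic adversarial domain alignment via gradient reversal: the critic
# learns to separate source/target features while reversed gradients push
# the feature extractor to make them indistinguishable.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, g):
        return -g  # flip gradients so features learn to fool the critic

class DomainCritic(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feats):
        return self.net(GradReverse.apply(feats))  # source-vs-target logits
```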
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in long-tailed recognition scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
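A simplified sketch of what a "shifted" batch normalization could look like: per-sampler running statistics with shared affine parameters, so features seen under instance-balanced and class-balanced sampling are normalized separately. This is our simplification for illustration, not the paper's exact formulation.

```python
# Sketch: one BN statistic set per sampler, shared learnable affine params.
import torch
import torch.nn as nn

class DualStatsBN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.stats = nn.ModuleDict({
            "instance": nn.BatchNorm1d(num_features, affine=False),
            "balanced": nn.BatchNorm1d(num_features, affine=False),
        })
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x, sampler="instance"):
        # Normalize with the statistics matching the active sampler.
        return self.stats[sampler](x) * self.weight + self.bias
```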
- Multimodal Prototypical Networks for Few-shot Learning [20.100480009813953]
A cross-modal feature generation framework is used to enrich the sparsely populated embedding space in few-shot scenarios.
We show that in such cases nearest neighbor classification is a viable approach and outperforms state-of-the-art single-modal and multimodal few-shot learning methods.
arXiv Detail & Related papers (2020-11-17T19:32:59Z)
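A compact sketch of prototype-based nearest-neighbor classification: class prototypes are means of support embeddings (real plus cross-modally generated ones), and queries take the label of the nearest prototype. The cross-modal generation step itself is omitted.

```python
# Sketch of prototypical nearest-neighbor classification in embedding space.
import torch

def prototypes(support_emb, support_y, num_classes):
    # Mean embedding per class; generated features would be appended
    # to support_emb before this step.
    return torch.stack([support_emb[support_y == c].mean(0)
                        for c in range(num_classes)])

def classify(query_emb, protos):
    # Label of the nearest prototype for each query.
    return torch.cdist(query_emb, protos).argmin(dim=1)
```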
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
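A minimal sketch of the ensembling step: average the softmax outputs of several fine-tuned models drawn from different pre-training sources. The paper's algorithm for selecting which pre-trained models to include is not reproduced here.

```python
# Sketch: ensemble by averaging predicted class probabilities.
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    probs = [m(x).softmax(dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```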
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.