Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
- URL: http://arxiv.org/abs/2302.14794v1
- Date: Tue, 28 Feb 2023 17:46:18 GMT
- Title: Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning
- Authors: Ivona Najdenkoska, Xiantong Zhen, Marcel Worring
- Abstract summary: We introduce a multimodal meta-learning approach to bridge the gap between vision and language models.
We define a meta-mapper network, acting as a meta-learner, to efficiently bridge frozen large-scale vision and language models.
We evaluate our approach on recently proposed multimodal few-shot benchmarks, measuring how rapidly the model can bind novel visual concepts to words.
- Score: 38.37682598345653
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal few-shot learning is challenging due to the large domain gap
between vision and language modalities. Existing methods try to communicate
visual concepts as prompts to frozen language models, but they rely on
hand-engineered task induction to reduce the hypothesis space. To make the
whole process learnable, we introduce a multimodal meta-learning approach.
Specifically, our approach decomposes the training of the model into a set of
related multimodal few-shot tasks. We define a meta-mapper network, acting as a
meta-learner, to efficiently bridge frozen large-scale vision and language
models and leverage their already learned capacity. By updating only the
learnable parameters of the meta-mapper, it learns to accrue shared meta-knowledge
among these tasks. Thus, it can rapidly adapt to newly presented samples with
only a few gradient updates. Importantly, it induces the task in a completely
data-driven manner, with no need for hand-engineered task induction. We
evaluate our approach on recently proposed multimodal few-shot benchmarks,
measuring how rapidly the model can bind novel visual concepts to words and
answer visual questions by observing only a limited set of labeled examples.
The experimental results show that our meta-learning approach outperforms the
baseline across multiple datasets and various training settings while being
computationally more efficient.
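To make the mechanics concrete, the following is a minimal, self-contained sketch of meta-training a mapper between frozen models with first-order MAML-style updates. The frozen vision encoder and language-model head are stubbed with fixed random linear layers; all dimensions, the prefix length, the toy loss, and the optimizer settings are illustrative assumptions rather than the paper's exact design.

```python
# Hedged sketch: only the mapper trains; vision and LM components stay frozen.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VIS_DIM, LM_DIM, PREFIX, VOCAB = 512, 768, 4, 100   # assumed sizes

class MetaMapper(nn.Module):
    """Trainable bridge: maps frozen visual features into 'visual prefix'
    vectors living in the frozen language model's embedding space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(VIS_DIM, PREFIX * LM_DIM)

    def forward(self, v):                          # v: (B, VIS_DIM)
        return self.proj(v).view(-1, PREFIX, LM_DIM)

# Frozen stand-ins for the vision encoder and the LM head (no gradients).
vision = nn.Linear(3 * 32 * 32, VIS_DIM).requires_grad_(False)
lm_head = nn.Linear(LM_DIM, VOCAB).requires_grad_(False)

def task_loss(mapper, images, labels):
    """Toy surrogate loss: score the mean visual-prefix vector against a
    'word' label through the frozen head."""
    feats = vision(images.flatten(1))              # frozen visual features
    prefix = mapper(feats).mean(dim=1)             # (B, LM_DIM)
    return F.cross_entropy(lm_head(prefix), labels)

mapper = MetaMapper()
meta_opt = torch.optim.Adam(mapper.parameters(), lr=1e-4)
inner_lr, inner_steps = 1e-2, 3

for step in range(10):                             # stream of few-shot tasks
    sup_x, sup_y = torch.randn(5, 3, 32, 32), torch.randint(VOCAB, (5,))
    qry_x, qry_y = torch.randn(5, 3, 32, 32), torch.randint(VOCAB, (5,))

    fast = copy.deepcopy(mapper)                   # task-specific copy of the mapper
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                   # rapid adaptation on the support set
        opt.zero_grad()
        task_loss(fast, sup_x, sup_y).backward()
        opt.step()

    # First-order meta-update: query-set gradients of the adapted copy are
    # applied to the shared initialization (only the mapper ever trains).
    meta_opt.zero_grad()
    task_loss(fast, qry_x, qry_y).backward()
    for p, fp in zip(mapper.parameters(), fast.parameters()):
        p.grad = fp.grad.clone()
    meta_opt.step()
```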
Related papers
- Meta-training with Demonstration Retrieval for Efficient Few-shot Learning [11.723856248352007]
Large language models show impressive results on few-shot NLP tasks, but they are memory- and computation-intensive.
We propose meta-training with demonstration retrieval.
arXiv Detail & Related papers (2023-06-30T20:16:22Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We propose to direct effort toward efficient adaptation of existing models by augmenting language models with perception.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
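A rough sketch of that recipe, with the single trained projection and single soft token prepended to the text embeddings of a frozen language model; the dimensions and the single-feature interface are assumptions, not eP-ALM's exact design:

```python
# Hedged sketch: everything frozen except one linear projection and one soft token.
import torch
import torch.nn as nn

class PerceptualPrefix(nn.Module):
    def __init__(self, vis_dim=768, lm_dim=1024):
        super().__init__()
        self.proj = nn.Linear(vis_dim, lm_dim)                     # the only trained layer
        self.soft_token = nn.Parameter(torch.zeros(1, 1, lm_dim))  # one trainable token

    def forward(self, vis_feats, text_embeds):
        # vis_feats: (B, vis_dim), e.g. a frozen ViT [CLS] feature
        # text_embeds: (B, T, lm_dim) from the frozen LM's embedding table
        vis_tok = self.proj(vis_feats).unsqueeze(1)                # (B, 1, lm_dim)
        soft = self.soft_token.expand(text_embeds.size(0), -1, -1)
        return torch.cat([soft, vis_tok, text_embeds], dim=1)      # input to the frozen LM
```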
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Meta-Learning via Classifier(-free) Guidance [5.812784742024491]
State-of-the-art meta-learning techniques do not optimize for zero-shot adaptation to unseen tasks.
We propose meta-learning techniques that use natural language guidance to achieve higher zero-shot performance.
arXiv Detail & Related papers (2022-10-17T11:09:35Z) - Multi-Modal Few-Shot Object Detection with Meta-Learning-Based
Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD), using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z) - MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning in which a pretrained language model is tuned to do in-context learning on a large set of training tasks.
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task's training data, and outperforms much larger models with nearly 8x as many parameters.
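A hypothetical sketch of how one such meta-training example could be assembled: k demonstrations from a single training task are concatenated with a query input, and the language-modeling loss is applied only to the query output. The separator and the sentiment task below are illustrative assumptions:

```python
# Illustrative sketch of assembling one MetaICL-style training example.
def build_meta_icl_example(demos, query_input):
    """demos: list of (input, output) pairs sampled from a single task."""
    context = " ".join(f"{x} {y}" for x, y in demos)   # k in-context demonstrations
    return f"{context} {query_input}"                  # LM is trained to emit the output

prompt = build_meta_icl_example(
    demos=[("Review: great movie.", "positive"),
           ("Review: dull plot.", "negative")],
    query_input="Review: wonderful acting.",
)
# Target for the language-modeling loss: "positive"
```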
arXiv Detail & Related papers (2021-10-29T17:42:08Z) - Few-Shot Learning with a Strong Teacher [36.35502703114652]
Few-shot learning aims to train a strong classifier using limited labeled examples.
Many existing works take the meta-learning approach, sampling few-shot tasks in turn and optimizing the few-shot learner's performance on classifying the query examples.
We propose a novel objective to directly train the few-shot learner to perform like a strong classifier.
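The summary does not spell out the objective, but one natural, assumed rendering is a distillation-style loss that pulls the few-shot learner's query predictions toward those of a strong classifier trained with ample labels:

```python
# Hypothetical distillation-style objective matching the description above;
# the temperature-scaled KL form is an assumption, not the paper's exact loss.
import torch.nn.functional as F

def strong_teacher_loss(student_logits, teacher_logits, T=2.0):
    """Match the few-shot learner's query predictions to a strong classifier's."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```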
arXiv Detail & Related papers (2021-07-01T03:20:46Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z) - Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning [79.25478727351604]
We explore a simple process: meta-learning over a whole-classification pre-trained model on its evaluation metric.
We observe that this simple method achieves performance competitive with state-of-the-art methods on standard benchmarks.
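A minimal sketch of the episodic step this describes: class centroids computed from support-set features of the pre-trained encoder, and cosine-similarity logits for queries. The temperature and the random stand-in features are assumptions:

```python
# Minimal sketch of nearest-centroid evaluation over pre-trained features.
import torch
import torch.nn.functional as F

def nearest_centroid_logits(support, support_labels, query, n_way, tau=10.0):
    # support: (N*K, D) features from the whole-classification pre-trained encoder
    centroids = torch.stack([support[support_labels == c].mean(dim=0)
                             for c in range(n_way)])              # (n_way, D)
    sims = F.cosine_similarity(query.unsqueeze(1),                # (Q, 1, D)
                               centroids.unsqueeze(0), dim=-1)    # vs (1, n_way, D)
    return tau * sims                                             # scaled logits, (Q, n_way)

# Example 5-way 1-shot episode, random features standing in for the encoder.
support, labels = torch.randn(5, 64), torch.arange(5)
query = torch.randn(10, 64)
logits = nearest_centroid_logits(support, labels, query, n_way=5)
```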
arXiv Detail & Related papers (2020-03-09T20:06:36Z) - Revisiting Meta-Learning as Supervised Learning [69.2067288158133]
We aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning.
By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning.
This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning.
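A toy rendering of this supervised view, assuming each task is summarized by a fixed-size embedding of its training set and labeled with its target model's parameters; all data here is synthetic:

```python
# Toy sketch: (dataset summary -> target model parameters) as supervised learning.
import torch
import torch.nn as nn

meta_x = torch.randn(100, 32)   # 100 tasks: pooled embedding of each task's data (assumed)
meta_y = torch.randn(100, 10)   # each task's target model parameters (assumed)

regressor = nn.Linear(32, 10)   # the "meta-learner" as a plain supervised model
opt = torch.optim.SGD(regressor.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(regressor(meta_x), meta_y)
    loss.backward()
    opt.step()
# Predicting a new task's model from its data summary is the meta-learned adaptation.
```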
arXiv Detail & Related papers (2020-02-03T06:13:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.