Few-shot Multimodal Multitask Multilingual Learning
- URL: http://arxiv.org/abs/2303.12489v1
- Date: Sun, 19 Feb 2023 03:48:46 GMT
- Title: Few-shot Multimodal Multitask Multilingual Learning
- Authors: Aman Chadha, Vinija Jain
- Abstract summary: We propose few-shot learning for a multimodal multitask multilingual (FM3) setting by adapting pre-trained vision and language models.
FM3 learns the most prominent tasks in the vision and language domains along with their intersections.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While few-shot learning as a transfer learning paradigm has gained
significant traction for scenarios with limited data, it has primarily been
explored in the context of building unimodal and unilingual models.
Furthermore, a significant part of the existing literature in the domain of
few-shot multitask learning performs in-context learning, which requires
manually generated prompts as input and yields varying outcomes depending on
the level of manual prompt engineering. In addition, in-context learning incurs
substantial computational, memory, and storage costs, which in turn lead to
high inference latency because all of the prompt's examples must be run through
the model every time a prediction is made. In contrast,
methods based on transfer learning via fine-tuning avoid the aforementioned
issues at a one-time cost of fine-tuning weights on a per-task
basis. However, such methods lack exposure to few-shot multimodal multitask
learning. In this paper, we propose few-shot learning for a multimodal
multitask multilingual (FM3) setting by adapting pre-trained vision and
language models using task-specific hypernetworks and contrastively fine-tuning
them to enable few-shot learning. FM3's architecture combines the best of both
worlds of in-context and fine-tuning based learning and consists of three major
components: (i) multimodal contrastive fine-tuning to enable few-shot learning,
(ii) hypernetwork task adaptation to perform multitask learning, and (iii)
task-specific output heads to cater to a plethora of diverse tasks. FM3 learns
the most prominent tasks in the vision and language domains along with their
intersections, namely visual entailment (VE), visual question answering (VQA),
and natural language understanding (NLU) tasks such as named entity
recognition (NER) and the GLUE benchmark tasks, including QNLI, MNLI, QQP, and SST-2.
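The three components listed above can be made concrete with a short sketch. The following PyTorch code is a minimal illustration written under our own assumptions: the class names, hidden sizes, label counts, additive fusion of the two modalities, and the CLIP-style symmetric contrastive loss are illustrative choices, not details taken from the paper or its released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative only: names, dimensions, and fusion choices are assumptions,
# not taken from the FM3 paper or its code.

class HyperNetworkAdapter(nn.Module):
    """Hypernetwork task adaptation: generate the weights of a small bottleneck
    adapter from a learned task embedding, so one shared hypernetwork serves
    every task."""
    def __init__(self, hidden_dim: int, bottleneck: int, task_emb_dim: int):
        super().__init__()
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        n_params = 2 * hidden_dim * bottleneck  # down- and up-projection weights
        self.weight_generator = nn.Linear(task_emb_dim, n_params)

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        w = self.weight_generator(task_emb)
        split = self.hidden_dim * self.bottleneck
        w_down = w[:split].view(self.hidden_dim, self.bottleneck)
        w_up = w[split:].view(self.bottleneck, self.hidden_dim)
        return x + F.gelu(x @ w_down) @ w_up  # residual bottleneck adapter


class FM3Sketch(nn.Module):
    """Sketch of the three components named in the abstract: (i) contrastive
    fine-tuning, (ii) hypernetwork task adaptation, (iii) task-specific heads."""
    def __init__(self, vision_encoder, text_encoder, hidden_dim=768,
                 task_emb_dim=64, task_num_labels=(3, 3129, 2)):
        super().__init__()
        # Pre-trained encoders, assumed to return pooled (batch, hidden_dim) features.
        self.vision_encoder, self.text_encoder = vision_encoder, text_encoder
        self.task_embeddings = nn.Embedding(len(task_num_labels), task_emb_dim)
        self.adapter = HyperNetworkAdapter(hidden_dim, 64, task_emb_dim)
        # (iii) One output head per task, e.g. VE (3-way), VQA, SST-2 here;
        # token-level tasks such as NER would apply their head per token.
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, n) for n in task_num_labels)
        self.log_temp = nn.Parameter(torch.tensor(0.07).log())

    def encode(self, images, texts, task_id: int):
        t = self.task_embeddings(torch.tensor(task_id))
        img = self.adapter(self.vision_encoder(images), t)
        txt = self.adapter(self.text_encoder(texts), t)
        return img, txt

    def contrastive_loss(self, img, txt):
        """(i) Multimodal contrastive fine-tuning: pull matched image-text pairs
        together within the few-shot batch (symmetric InfoNCE)."""
        img, txt = F.normalize(img, dim=-1), F.normalize(txt, dim=-1)
        logits = img @ txt.t() / self.log_temp.exp()
        labels = torch.arange(img.size(0))
        return 0.5 * (F.cross_entropy(logits, labels) +
                      F.cross_entropy(logits.t(), labels))

    def task_loss(self, img, txt, task_id: int, labels):
        """(iii) Task-specific head on a fused representation (plain addition here)."""
        return F.cross_entropy(self.heads[task_id](img + txt), labels)
```

One plausible few-shot training recipe under these assumptions: keep the encoders frozen, update only the task embeddings, hypernetwork, and heads, and minimize the task loss plus a weighted contrastive loss on the support examples of each task.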
Related papers
- Exploring the Transferability of Visual Prompting for Multimodal Large Language Models [47.162575147632396]
Transferable Visual Prompting (TVP) is a simple and effective approach to generating visual prompts that can transfer to different models and improve their performance on downstream tasks after being trained on only one model.
We introduce two strategies to address the issue of cross-model feature corruption of existing visual prompting methods and enhance the transferability of the learned prompts.
arXiv Detail & Related papers (2024-04-17T09:39:07Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even showing emergent capabilities to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z)
- Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
arXiv Detail & Related papers (2023-03-25T10:16:53Z)
- In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models [55.78264509270503]
We introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models.
We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT).
Our experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm.
arXiv Detail & Related papers (2022-12-20T22:11:35Z)
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations annotated with 10 and 7 tasks, respectively, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
arXiv Detail & Related papers (2022-05-12T17:59:00Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model (a minimal sketch of this routing appears after this list).
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- Multi-Task Learning for Visual Scene Understanding [7.191593674138455]
This thesis is concerned with multi-task learning in the context of computer vision.
We propose several methods that tackle important aspects of multi-task learning.
The results show several advances in the state-of-the-art of multi-task learning.
arXiv Detail & Related papers (2022-03-28T16:57:58Z)
- Multi-Task Learning with Deep Neural Networks: A Survey [0.0]
Multi-task learning (MTL) is a subfield of machine learning in which multiple tasks are simultaneously learned by a shared model.
We give an overview of multi-task learning methods for deep neural networks, with the aim of summarizing both the well-established and most recent directions within the field.
arXiv Detail & Related papers (2020-09-10T19:31:04Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
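The task-aware expert routing described in the Mixture-of-Experts entry above can likewise be sketched briefly. The code below is a generic top-1 routing illustration under our own assumptions (layer sizes, a learned task embedding concatenated to the token representation, and a plain loop over experts); it is not the referenced paper's implementation.

```python
import torch
import torch.nn as nn

class TaskAwareMoELayer(nn.Module):
    """Illustrative task-aware Mixture-of-Experts layer: a gating function conditioned
    on the input and a task embedding routes each example to a single expert, so only
    one expert's parameters are active per example (dense-like per-example compute)."""
    def __init__(self, hidden_dim=768, ffn_dim=3072, num_experts=8,
                 num_tasks=4, task_emb_dim=32):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, ffn_dim), nn.GELU(),
                          nn.Linear(ffn_dim, hidden_dim))
            for _ in range(num_experts)
        )
        self.task_embeddings = nn.Embedding(num_tasks, task_emb_dim)
        # The gate sees both the token representation and the task identity.
        self.gate = nn.Linear(hidden_dim + task_emb_dim, num_experts)

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # x: (batch, hidden_dim)
        t = self.task_embeddings(torch.full((x.size(0),), task_id, dtype=torch.long))
        gate_logits = self.gate(torch.cat([x, t], dim=-1))    # (batch, num_experts)
        expert_idx = gate_logits.argmax(dim=-1)                # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):              # loop for clarity, not speed
            mask = expert_idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

Because each example activates exactly one expert, adding experts increases the parameter count without raising per-example compute, which is the property highlighted in that entry.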