MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning
- URL: http://arxiv.org/abs/2405.11446v1
- Date: Sun, 19 May 2024 04:49:42 GMT
- Title: MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning
- Authors: Sanchit Sinha, Yuguang Yue, Victor Soto, Mayank Kulkarni, Jianhua Lu, Aidong Zhang
- Abstract summary: We propose MAML-en-LLM, a novel method for meta-training large language models (LLMs).
MAML-en-LLM can learn truly generalizable parameters that not only perform well on disjointed tasks but also adapt to unseen tasks.
We demonstrate that MAML-en-LLM outperforms baselines in settings with a limited amount of training data on both seen and unseen domains.
- Score: 43.512739869120125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed, such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches essentially perform in-context multi-task fine-tuning and evaluate on a disjointed test set of tasks. Even though they achieve impressive performance, their goal is never to compute a truly general set of parameters. In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs, which can learn truly generalizable parameters that not only perform well on disjointed tasks but also adapt to unseen tasks. We see an average performance increase of 2% on unseen domains and a 4% improvement in adaptation performance. Furthermore, we demonstrate that MAML-en-LLM outperforms baselines in settings with a limited amount of training data on both seen and unseen domains by an average of 2%. Finally, we discuss the effects of task type, optimizers, and task complexity, an avenue barely explored in the meta-training literature. Exhaustive experiments across 7 task settings along with two data settings demonstrate that models trained with MAML-en-LLM outperform SOTA meta-training approaches.
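The abstract refers to MAML-style meta-training without spelling out the update. As a rough illustration only, the following is a minimal sketch of the generic MAML inner/outer loop on a toy scalar regression problem. It is not the paper's LLM training procedure; the task generator, function names, and learning rates are all hypothetical choices made for this example.

```python
import random

def loss_and_grad(w, xs, ys):
    """Mean squared error of y_hat = w * x, and its derivative w.r.t. w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad

def make_task(rng, slope, n=8):
    """Hypothetical toy task: noiseless y = slope * x, split into support/query."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    ys = [slope * x for x in xs]
    return (xs[:n], ys[:n]), (xs[n:], ys[n:])

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One meta-update: adapt per task on support data, then update the
    shared parameter using the query loss of the adapted parameter."""
    meta_grad = 0.0
    for (sx, sy), (qx, qy) in tasks:
        _, g_support = loss_and_grad(w, sx, sy)
        w_adapted = w - inner_lr * g_support  # inner loop (task adaptation)
        _, g_query = loss_and_grad(w_adapted, qx, qy)
        # Chain rule through the inner step; for this quadratic loss the
        # second derivative of the support loss is mean(2 * x^2).
        hess_support = sum(2 * x * x for x in sx) / len(sx)
        meta_grad += g_query * (1.0 - inner_lr * hess_support)
    return w - outer_lr * meta_grad / len(tasks)  # outer loop (meta-update)

def meta_train(steps=200, seed=0):
    """Meta-train a single shared parameter over randomly sampled toy tasks."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        tasks = [make_task(rng, slope=rng.uniform(1.0, 3.0)) for _ in range(4)]
        w = maml_step(w, tasks)
    return w
```

In this toy setting, the meta-trained parameter settles near the middle of the sampled slope range, so a single inner-loop step on a new task's support set moves it toward that task's slope. The contrast with plain multi-task fine-tuning is that the outer update is computed from the loss after adaptation rather than from the loss at the shared parameter.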
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic [6.46176287368784]
We propose Model Exclusive Task Arithmetic for merging GPT-scale models.
Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs.
arXiv Detail & Related papers (2024-06-17T10:12:45Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- MetaICL: Learning to Learn In Context [87.23056864536613]
We introduce MetaICL, a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.
We show that MetaICL approaches (and sometimes beats) the performance of models fully finetuned on the target task training data, and outperforms much bigger models with nearly 8x as many parameters.
arXiv Detail & Related papers (2021-10-29T17:42:08Z)
- Meta-Learning with Fewer Tasks through Task Interpolation [67.03769747726666]
Current meta-learning algorithms require a large number of meta-training tasks, which may not be accessible in real-world scenarios.
By meta-learning with task interpolation (MLTI), our approach effectively generates additional tasks by randomly sampling a pair of tasks and interpolating the corresponding features and labels.
Empirically, in our experiments on eight datasets from diverse domains, we find that the proposed general MLTI framework is compatible with representative meta-learning algorithms and consistently outperforms other state-of-the-art strategies.
arXiv Detail & Related papers (2021-06-04T20:15:34Z)
- A Nested Bi-level Optimization Framework for Robust Few Shot Learning [10.147225934340877]
NestedMAML learns to assign weights to training tasks or instances.
Experiments on synthetic and real-world datasets demonstrate that NestedMAML efficiently mitigates the effects of "unwanted" tasks or instances.
arXiv Detail & Related papers (2020-11-13T06:41:22Z)
- BI-MAML: Balanced Incremental Approach for Meta Learning [9.245355087256314]
We present a novel Balanced Incremental Model Agnostic Meta Learning system (BI-MAML) for learning multiple tasks.
Our method implements a meta-update rule to incrementally adapt its model to new tasks without forgetting old tasks.
Our system performs the meta-updates successfully with only a few shots.
arXiv Detail & Related papers (2020-06-12T18:28:48Z)
- Structured Prediction for Conditional Meta-Learning [44.30857707980074]
We propose a new perspective on conditional meta-learning via structured prediction.
We derive task-adaptive structured meta-learning (TASML), a principled framework that yields task-specific objective functions.
Empirically, we show that TASML improves the performance of existing meta-learning models, and outperforms the state-of-the-art on benchmark datasets.
arXiv Detail & Related papers (2020-02-20T15:24:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.