LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale
Instructions
- URL: http://arxiv.org/abs/2304.14402v3
- Date: Mon, 29 Jan 2024 02:58:23 GMT
- Title: LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale
Instructions
- Authors: Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham
Fikri Aji
- Abstract summary: Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities.
We develop a large set of 2.58M instructions based on both existing and newly-generated instructions.
We fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models from both the encoder-decoder and decoder-only families.
- Score: 28.937552799649808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) with instruction fine-tuning demonstrate
superior generative capabilities. However, these models are resource-intensive.
To alleviate this issue, we explore distilling knowledge from instruction-tuned
LLMs into much smaller ones. To this end, we carefully develop a large set of
2.58M instructions based on both existing and newly-generated instructions. In
addition to being sizable, we design our instructions to cover a broad set of
topics to ensure diversity. Extensive analysis of our instruction dataset
confirms its diversity, and we generate responses for these instructions using
gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of
models, collectively referred to as LaMini-LM, which includes models from both
the encoder-decoder and decoder-only families, with varying sizes. We evaluate
the performance of our models using automatic metrics on 15 different natural
language processing (NLP) benchmarks, as well as through human assessment. The
results demonstrate that our proposed LaMini-LM models are comparable to
competitive baselines, while being much smaller in size.
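To make the distillation recipe concrete, below is a minimal sketch of how such a student could be fine-tuned on distilled instruction-response pairs (responses generated by gpt-3.5-turbo) using the HuggingFace transformers library. The dataset identifier, field names, student checkpoint, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of the distillation setup: fine-tune a small
# encoder-decoder student on instruction-response pairs whose responses
# were generated by a gpt-3.5-turbo teacher. The dataset id and field
# names ("instruction", "response") are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

dataset = load_dataset("MBZUAI/LaMini-instruction", split="train")  # assumed HF id
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

def preprocess(batch):
    # The instruction is the encoder input; the teacher's distilled
    # response is the decoder target.
    inputs = tokenizer(batch["instruction"], truncation=True, max_length=512)
    targets = tokenizer(text_target=batch["response"], truncation=True, max_length=512)
    inputs["labels"] = targets["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="lamini-t5-small",
        per_device_train_batch_size=8,
        num_train_epochs=3,
        learning_rate=5e-4,  # illustrative hyperparameters
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The same loop applies to decoder-only students by swapping in a causal LM head and formatting the pair as a single prompt-plus-response sequence.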
Related papers
- Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation [56.75665429851673]
This paper introduces a novel instruction curation algorithm derived from two complementary perspectives: human preference alignment and LLM preference alignment.
Experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%.
arXiv Detail & Related papers (2024-09-27T08:20:59Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
Evaluated across different benchmarks, a 2.7B small-scale model trained with the proper strategy can perform on par with larger models of 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
- MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity [80.02202386597138]
We construct a high-quality, diverse visual instruction tuning dataset MMInstruct, which consists of 973K instructions from 24 domains.
Our instruction generation engine enables semi-automatic, multi-domain instruction generation at a fraction of the cost of manual construction.
arXiv Detail & Related papers (2024-07-22T17:55:22Z)
- MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs [47.94710556156627]
MIA-Bench is a benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions.
Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions.
arXiv Detail & Related papers (2024-07-01T17:53:35Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
- Small Language Models Improve Giants by Rewriting Their Outputs [18.025736098795296]
We tackle the problem of leveraging training data to improve the performance of large language models (LLMs) without fine-tuning.
We create a pool of candidates from the LLM through few-shot prompting, then employ a compact model, the LM-corrector (LMCor), trained specifically to merge these candidates into an enhanced output (a rough sketch of this pipeline follows this list).
Experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning.
arXiv Detail & Related papers (2023-05-22T22:07:50Z)
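As referenced in the LMCor entry above, here is a rough sketch of that candidate-generation-plus-correction pipeline: a large LM proposes several candidates via few-shot prompting, and a small trained corrector merges them into one improved output. The model identifiers (gpt2 and t5-small stand in for the 62B LLM and the 250M corrector), the sampling settings, and the corrector's packed input format are assumptions for illustration; the corrector's task-specific training is not shown.

```python
# Sketch of the LMCor pipeline: sample k candidates from a large LM,
# then let a small corrector model merge them into one output.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in for the 62B LLM

def sample_candidates(few_shot_prompt: str, k: int = 5) -> list[str]:
    # Draw k diverse candidates from the large model via sampling.
    outs = generator(few_shot_prompt, do_sample=True, top_p=0.9,
                     num_return_sequences=k, max_new_tokens=64,
                     return_full_text=False)
    return [o["generated_text"].strip() for o in outs]

# Stand-in for the 250M LMCor: a small seq2seq model assumed to be
# fine-tuned to map "source + candidates" -> corrected output.
corr_tok = AutoTokenizer.from_pretrained("t5-small")
corrector = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def correct(source: str, candidates: list[str]) -> str:
    # Pack the source and all candidates into one input sequence;
    # the "<cand>" separator is an assumed formatting convention.
    packed = source + " " + " ".join(f"<cand> {c}" for c in candidates)
    ids = corr_tok(packed, return_tensors="pt", truncation=True).input_ids
    out = corrector.generate(ids, max_new_tokens=64)
    return corr_tok.decode(out[0], skip_special_tokens=True)
```

Because the corrector only rewrites candidates rather than the whole task, it can stay small (250M in the paper) while still lifting the few-shot performance of a much larger frozen LLM.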
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.