LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale
  Instructions
- URL: http://arxiv.org/abs/2304.14402v3
- Date: Mon, 29 Jan 2024 02:58:23 GMT
- Title: LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale
  Instructions
- Authors: Minghao Wu, Abdul Waheed, Chiyu Zhang, Muhammad Abdul-Mageed, Alham
  Fikri Aji
- Abstract summary: Large language models (LLMs) with instruction fine-tuning demonstrate superior generative capabilities.
We develop a large set of 2.58M instructions based on both existing and newly-generated instructions.
We fine-tune a diverse herd of models, collectively referred to as LaMini-LM, which includes models from both the encoder-decoder and decoder-only families.
- Score: 28.937552799649808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) with instruction fine-tuning demonstrate
superior generative capabilities. However, these models are resource-intensive.
To alleviate this issue, we explore distilling knowledge from instruction-tuned
LLMs into much smaller ones. To this end, we carefully develop a large set of
2.58M instructions based on both existing and newly-generated instructions. In
addition to being sizable, we design our instructions to cover a broad set of
topics to ensure diversity. Extensive analysis of our instruction dataset
confirms its diversity, and we generate responses for these instructions using
gpt-3.5-turbo. Leveraging these instructions, we fine-tune a diverse herd of
models, collectively referred to as LaMini-LM, which includes models from both
the encoder-decoder and decoder-only families, with varying sizes. We evaluate
the performance of our models using automatic metrics on 15 different natural
language processing (NLP) benchmarks, as well as through human assessment. The
results demonstrate that our proposed LaMini-LM models are comparable to
competitive baselines, while being much smaller in size.
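
The pipeline the abstract describes (collect instructions, have a teacher model answer them, then fine-tune smaller models on the resulting pairs) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `stub_teacher` placeholder (standing in for a call to a teacher such as gpt-3.5-turbo), and the `### Response:` separator are all hypothetical.

```python
from typing import Callable, Dict, List


def build_distillation_pairs(
    instructions: List[str],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each instruction with the teacher's response."""
    pairs = []
    for instruction in instructions:
        response = teacher(instruction).strip()
        if response:  # drop instructions the teacher left unanswered
            pairs.append({"instruction": instruction, "response": response})
    return pairs


def to_encoder_decoder_example(pair: Dict[str, str]) -> Dict[str, str]:
    """Encoder-decoder students read the instruction and emit the response."""
    return {"source": pair["instruction"], "target": pair["response"]}


def to_decoder_only_example(
    pair: Dict[str, str],
    separator: str = "\n\n### Response:\n",  # hypothetical template, not from the paper
) -> str:
    """Decoder-only students see instruction and response as one sequence."""
    return pair["instruction"] + separator + pair["response"]


def stub_teacher(instruction: str) -> str:
    """Placeholder teacher; a real setup would query an instruction-tuned LLM."""
    return f"(teacher answer to: {instruction})"


if __name__ == "__main__":
    pairs = build_distillation_pairs(["Name three primary colors."], stub_teacher)
    print(to_encoder_decoder_example(pairs[0]))
    print(to_decoder_only_example(pairs[0]))
```

The two formatting helpers reflect the split the abstract mentions between encoder-decoder and decoder-only student families: the same instruction-response pair is presented either as a source/target pair or as a single concatenated sequence.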
 
      
Related papers
- Capability Instruction Tuning: A New Paradigm for Dynamic LLM Routing [64.38277118982698]
 Large Language Models (LLMs) have demonstrated human-like instruction-following abilities.
In this work, we explore how to route the best-performing LLM for each instruction to achieve better overall performance.
We develop a new paradigm, constructing capability instructions with model capability representation, user instruction, and performance inquiry prompts to assess the performance.
 arXiv (2025-02-24T16:10:53Z)
- Smaller Language Models Are Better Instruction Evolvers [10.587052565101844]
 Small language models (SLMs) can synthesize more effective instructions than large language models (LLMs).
We propose Instruction Complex-Aware IFD (IC-IFD) to evaluate the effectiveness of instruction data more accurately.
 arXiv (2024-12-15T16:07:48Z)
- Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation [56.75665429851673]
 This paper introduces a novel instruction curation algorithm, derived from two unique perspectives, human and LLM preference alignment.
Experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%.
 arXiv (2024-09-27T08:20:59Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
 In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
With a proper strategy, evaluation across different benchmarks shows that even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
 arXiv (2024-07-28T06:10:47Z)
- MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity [80.02202386597138]
 We construct a high-quality, diverse visual instruction tuning dataset MMInstruct, which consists of 973K instructions from 24 domains.
Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at a fraction of the cost of manual construction.
 arXiv (2024-07-22T17:55:22Z)
- MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs [47.94710556156627]
 MIA-Bench is a benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions.
Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models' compliance with layered instructions.
 arXiv (2024-07-01T17:53:35Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
 CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
 arXiv (2024-01-04T18:53:01Z)
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models [125.91897197446379]
 We find that MoE models benefit more from instruction tuning than dense models.
Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
 arXiv (2023-05-24T04:22:26Z)
- Small Language Models Improve Giants by Rewriting Their Outputs [18.025736098795296]
 We tackle the problem of leveraging training data to improve the performance of large language models (LLMs) without fine-tuning.
We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCor), specifically trained to merge these candidates to produce an enhanced output.
Experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning.
 arXiv (2023-05-22T22:07:50Z)