Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning
- URL: http://arxiv.org/abs/2502.09854v1
- Date: Fri, 14 Feb 2025 01:39:45 GMT
- Title: Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning
- Authors: Yu-Chen Lin, Sanat Sharma, Hari Manikandan, Jayant Kumar, Tracy Holloway King, Jing Zheng,
- Abstract summary: Small language models (SLMs) can achieve competitive performance in multitask prompt generation tasks.
We train an SLM that achieves relevance scores within 5% of state-of-the-art models, including Llama-3, Qwen2, and Mistral, despite being up to 80 times smaller.
- Score: 8.995427413172148
- License:
- Abstract: In this work, we demonstrate that small language models (SLMs), specifically a 100M parameter GPT-2 model, can achieve competitive performance in multitask prompt generation tasks while requiring only a fraction of the computational resources needed by large language models (LLMs). Through a novel combination of upside-down reinforcement learning and synthetic data distillation from a powerful LLM, Llama-3, we train an SLM that achieves relevance scores within 5% of state-of-the-art models, including Llama-3, Qwen2, and Mistral, despite being up to 80 times smaller, making it highly suitable for resource-constrained and real-time applications. This study highlights the potential of SLMs as efficient multitask learners in multimodal settings, providing a promising alternative to LLMs for scalable, low-latency deployments.
Related papers
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z) - LLaVA-KD: A Framework of Distilling Multimodal Large Language Models [70.19607283302712]
We propose a novel framework to transfer knowledge from l-MLLM to s-MLLM.
Specifically, we introduce Multimodal Distillation (MDist) to minimize the divergence between the visual-textual output distributions of l-MLLM and s-MLLM.
We also propose a three-stage training scheme to fully exploit the potential of s-MLLM.
arXiv Detail & Related papers (2024-10-21T17:41:28Z) - NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks.
We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities.
We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z) - LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation [41.05687297326706]
LLaVA-MoD is a framework designed to enable the efficient training of small-scale Multimodal Language Models.
We optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts architecture into the language model.
We also propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration.
arXiv Detail & Related papers (2024-08-28T15:52:23Z) - LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating different benchmarks and proper strategy, even a 2.7B small-scale model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - Mixed Distillation Helps Smaller Language Model Better Reasoning [27.934081882868902]
We introduce Mixed Distillation (MD) framework, which capitalizes on the strengths of Program of Thought (PoT) and Chain of Thought (CoT) capabilities within large language models (LLMs)
Our experimental results show that MD significantly enhances the single-path and multi-path reasoning ability of smaller models in various tasks.
arXiv Detail & Related papers (2023-12-17T14:28:28Z) - Making Small Language Models Better Multi-task Learners with
Mixture-of-Task-Adapters [13.6682552098234]
Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks.
We present ALTER, a system that effectively builds the multi-tAsk learners with mixTure-of-task-adaptERs upon small language models.
A two-stage training method is proposed to optimize the collaboration between adapters at a small computational cost.
arXiv Detail & Related papers (2023-09-20T03:39:56Z) - Small Language Models Improve Giants by Rewriting Their Outputs [18.025736098795296]
We tackle the problem of leveraging training data to improve the performance of large language models (LLMs) without fine-tuning.
We create a pool of candidates from the LLM through few-shot prompting and we employ a compact model, the LM-corrector (LMCor), specifically trained to merge these candidates to produce an enhanced output.
Experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning.
arXiv Detail & Related papers (2023-05-22T22:07:50Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.