Distilling LLMs' Decomposition Abilities into Compact Language Models
- URL: http://arxiv.org/abs/2402.01812v1
- Date: Fri, 2 Feb 2024 13:23:15 GMT
- Title: Distilling LLMs' Decomposition Abilities into Compact Language Models
- Authors: Denis Tarasov, Kumar Shridhar
- Abstract summary: Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities.
compact models offer customized training but often fall short in solving complex reasoning tasks.
- Score: 12.083499752124649
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have demonstrated proficiency in their reasoning
abilities, yet their large size presents scalability challenges and limits any
further customization. In contrast, compact models offer customized training
but often fall short in solving complex reasoning tasks. This study focuses on
distilling the LLMs' decomposition skills into compact models using offline
reinforcement learning. We leverage the advancements in the LLM`s capabilities
to provide feedback and generate a specialized task-specific dataset for
training compact models. The development of an AI-generated dataset and the
establishment of baselines constitute the primary contributions of our work,
underscoring the potential of compact models in replicating complex
problem-solving skills.
Related papers
- Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability [12.349247962800813]
Large language models (LLMs) have emerged as powerful tools for many AI problems.
They exhibit remarkable in-context learning (ICL) capabilities.
How they approach composite tasks is an open question and largely ununderstood.
arXiv Detail & Related papers (2024-07-22T15:22:34Z) - Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We investigate the interplay between generalization and memorization in large language models at scale.
With various sizes of open-source LLMs and their pretraining corpora, we observe that as the model size increases, the task-relevant $n$-gram pair data becomes increasingly important.
Our results support the hypothesis that LLMs' capabilities emerge from a delicate balance of memorization and generalization with sufficient task-related pretraining data.
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models [23.91608718129775]
We propose RKLD, a novel textbfReverse textbfKL-Divergence-based Knowledge textbfDistillation unlearning algorithm for large language models (LLMs)
We achieve significant forget quality and effectively maintain the model utility in our experiments.
arXiv Detail & Related papers (2024-06-04T05:51:43Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level.
We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
arXiv Detail & Related papers (2023-10-09T06:57:45Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - Concept-aware Training Improves In-context Learning Ability of Language
Models [0.0]
Many recent language models (LMs) of Transformers family exhibit so-called in-context learning (ICL) ability.
We propose a method to create LMs able to better utilize the in-context information.
We measure that data sampling of Concept-aware Training consistently improves models' reasoning ability.
arXiv Detail & Related papers (2023-05-23T07:44:52Z) - Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.