Related papers: Distilling LLMs' Decomposition Abilities into Compact Language Models

Distilling LLMs' Decomposition Abilities into Compact Language Models

URL: http://arxiv.org/abs/2402.01812v1
Date: Fri, 2 Feb 2024 13:23:15 GMT
Title: Distilling LLMs' Decomposition Abilities into Compact Language Models
Authors: Denis Tarasov, Kumar Shridhar
Abstract summary: Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities. compact models offer customized training but often fall short in solving complex reasoning tasks.
Score: 12.083499752124649
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities, yet their large size presents scalability challenges and limits any further customization. In contrast, compact models offer customized training but often fall short in solving complex reasoning tasks. This study focuses on distilling the LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the advancements in the LLM`s capabilities to provide feedback and generate a specialized task-specific dataset for training compact models. The development of an AI-generated dataset and the establishment of baselines constitute the primary contributions of our work, underscoring the potential of compact models in replicating complex problem-solving skills.

Related papers

The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples. We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts. Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z)
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models [60.596005921295806]
Agglomerative models have emerged as a powerful approach to training vision foundation models. We identify critical challenges including resolution mode shifts, teacher imbalance, idiosyncratic teacher artifacts, and an excessive number of output tokens. We propose several novel solutions: multi-resolution training, mosaic augmentation, and improved balancing of teacher loss functions.
arXiv Detail & Related papers (2024-12-10T17:06:41Z)
Unconstrained Model Merging for Enhanced LLM Reasoning [42.079040543428036]
We explore the potential of merging multiple expert models into a single large language model. We propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures. Across 7 benchmarks and 9 reasoning-optimized LLMs, we reveal key findings that reasoning emerges from merging.
arXiv Detail & Related papers (2024-10-17T16:04:07Z)
On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly. In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
Configurable Foundation Models: Building LLMs from a Modular Perspective [115.63847606634268]
A growing tendency to decompose LLMs into numerous functional modules allows for inference with part of modules and dynamic assembly of modules to tackle complex tasks. We coin the term brick to represent each functional module, designating the modularized structure as customizable foundation models. We present four brick-oriented operations: retrieval and routing, merging, updating, and growing. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions.
arXiv Detail & Related papers (2024-09-04T17:01:02Z)
MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability [12.349247962800813]
Large language models (LLMs) have emerged as powerful tools for many AI problems. They exhibit remarkable in-context learning (ICL) capabilities. How they approach composite tasks remains an open and largely underexplored question.
arXiv Detail & Related papers (2024-07-22T15:22:34Z)
RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models [23.91608718129775]
We propose RKLD, a novel textbfReverse textbfKL-Divergence-based Knowledge textbfDistillation unlearning algorithm for large language models (LLMs) We achieve significant forget quality and effectively maintain the model utility in our experiments.
arXiv Detail & Related papers (2024-06-04T05:51:43Z)
LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English. When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure [66.33623392497599]
We show that a structure called template-content structure (T-C structure) can reduce the possible space from exponential level to linear level. We demonstrate that models can achieve task composition, further reducing the space needed to learn from linear to logarithmic.
arXiv Detail & Related papers (2023-10-09T06:57:45Z)
Concept-aware Training Improves In-context Learning Ability of Language Models [0.0]
Many recent language models (LMs) of Transformers family exhibit so-called in-context learning (ICL) ability. We propose a method to create LMs able to better utilize the in-context information. We measure that data sampling of Concept-aware Training consistently improves models' reasoning ability.
arXiv Detail & Related papers (2023-05-23T07:44:52Z)
Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.