MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
- URL: http://arxiv.org/abs/2402.01620v2
- Date: Fri, 7 Jun 2024 18:13:16 GMT
- Title: MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
- Authors: Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal,
- Abstract summary: We introduce MAGDi, a new method for structured distillation of the reasoning interactions between multiple Large Language Model (LLM) agents into smaller LMs.
Experiments on seven widely used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models.
We conduct extensive analyses to show that MAGDi enhances the generalizability to out-of-domain tasks, scales positively with the size and strength of the base student model, and obtains larger improvements when applying self-consistency.
- Score: 61.479419734006825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent interactions between Large Language Model (LLM) agents have shown major improvements on diverse reasoning tasks. However, these involve long generations from multiple models across several rounds, making them expensive. Moreover, these multi-agent approaches fail to provide a final, single model for efficient inference. To address this, we introduce MAGDi, a new method for structured distillation of the reasoning interactions between multiple LLMs into smaller LMs. MAGDi teaches smaller models by representing multi-agent interactions as graphs, augmenting a base student model with a graph encoder, and distilling knowledge using three objective functions: next-token prediction, a contrastive loss between correct and incorrect reasoning, and a graph-based objective to model the interaction structure. Experiments on seven widely used commonsense and math reasoning benchmarks show that MAGDi improves the reasoning capabilities of smaller models, outperforming several methods that distill from a single teacher and multiple teachers. Moreover, MAGDi also demonstrates an order of magnitude higher efficiency over its teachers. We conduct extensive analyses to show that MAGDi (1) enhances the generalizability to out-of-domain tasks, (2) scales positively with the size and strength of the base student model, and (3) obtains larger improvements (via our multi-teacher training) when applying self-consistency -- an inference technique that relies on model diversity.
Related papers
- Brainstorming Brings Power to Large Language Models of Knowledge Reasoning [17.14501985068287]
Large Language Models (LLMs) have demonstrated amazing capabilities in language generation, text comprehension, and knowledge reasoning.
Recent studies have further improved the model's reasoning ability on a wide range of tasks by introducing multi-model collaboration.
We propose the multi-model brainstorming based on prompt. It incorporates different models into a group for brainstorming, and after multiple rounds of reasoning elaboration and re-inference, a consensus answer is reached.
arXiv Detail & Related papers (2024-06-02T14:47:14Z) - Unlock the Power: Competitive Distillation for Multi-Modal Large
Language Models [17.25135606956287]
Competitive Multi-modal Distillation framework (CoMD) captures bidirectional feedback between teacher and student models.
Our experimental analysis of diverse datasets shows that our knowledge transfer method consistently improves the capabilities of the student model.
arXiv Detail & Related papers (2023-11-14T14:49:46Z) - Teaching Language Models to Self-Improve through Interactive Demonstrations [83.9421355808174]
Self-improving ability of large language models has been shown to be absent and difficult to learn for smaller models.
We introduce TriPosT, a training algorithm that endows smaller models with such self-improvement ability.
We show that our approach can improve a LLaMA-7b's performance on math and reasoning tasks by up to 7.13%.
arXiv Detail & Related papers (2023-10-20T14:11:04Z) - Improving Discriminative Multi-Modal Learning with Large-Scale
Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA)
arXiv Detail & Related papers (2023-10-08T15:01:54Z) - Module-wise Adaptive Distillation for Multimodality Foundation Models [125.42414892566843]
multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes.
One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer.
Motivated by our observation that certain architecture components, referred to as modules, contribute more significantly to the student's performance than others, we propose to track the contributions of individual modules by recording the loss decrement after distillation each module and choose the module with a greater contribution to distill more frequently.
arXiv Detail & Related papers (2023-10-06T19:24:00Z) - MinT: Boosting Generalization in Mathematical Reasoning via Multi-View
Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs)
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z) - MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion model (DM) recently achieved huge success in various scenarios including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework to tackle this problem.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z) - MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning [40.60328470622483]
We propose a novel model entitled MMKGR (Multi-hop Multi-modal Knowledge Graph Reasoning)
The model contains the following two components: (1) a unified gate-attention network which is designed to generate effective multi-modal complementary features through sufficient attention interaction and noise reduction; and (2) a complementary feature-aware reinforcement learning method which is proposed to predict missing elements by performing the multi-hop reasoning process.
The experimental results demonstrate that MMKGR outperforms the state-of-the-art approaches in the MKG reasoning task.
arXiv Detail & Related papers (2022-09-03T13:07:02Z) - Online Knowledge Distillation via Multi-branch Diversity Enhancement [15.523646047674717]
We propose a new distillation method to enhance the diversity among multiple student models.
We use Feature Fusion Module (FFM), which improves the performance of the attention mechanism in the network.
We also use Diversification(CD) loss function to strengthen the differences between the student models.
arXiv Detail & Related papers (2020-10-02T05:52:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.