Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach
- URL: http://arxiv.org/abs/2408.07238v1
- Date: Tue, 13 Aug 2024 23:59:36 GMT
- Title: Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach
- Authors: Tong Wang, K. Sudhir, Dat Hong
- Abstract summary: Advanced large language models (LLMs) provide superior performance in complex human-like interactions.
But these LLMs are costly, too large for edge devices such as smartphones, and difficult to self-host, which raises security and privacy concerns.
This paper introduces a novel interpretable knowledge distillation approach to enhance the performance of smaller, more economical LLMs.
- Score: 6.154304269581415
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Advanced large language models (LLMs) like GPT-4 or Llama 3 provide superior performance in complex human-like interactions. But they are costly, too large for edge devices such as smartphones, and difficult to self-host, raising security and privacy concerns. This paper introduces a novel interpretable knowledge distillation approach to enhance the performance of smaller, more economical LLMs that firms can self-host. We study this problem in the context of building a customer service agent aimed at achieving high customer satisfaction through goal-oriented dialogues. Unlike traditional knowledge distillation, where the "student" model learns directly from the "teacher" model's responses via fine-tuning, our interpretable "strategy" teaching approach has the teacher provide strategies that improve the student's performance in various scenarios. The method alternates between a "scenario generation" step and a "strategies for improvement" step, creating a customized library of scenarios and optimized strategies for automated prompting. It requires only black-box access to both the student and teacher models, so it can be applied without manipulating model parameters. In our customer service application, the method improves performance, and the learned strategies are transferable to other LLMs and to scenarios beyond the training set. The method's interpretability also helps safeguard against potential harms through human audit.
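The abstract describes an alternating loop in which the teacher generates scenarios, observes the student's responses, and distills human-readable strategies that are later injected into the student's prompt. The sketch below shows one way such a loop could be wired up with purely black-box (text-in/text-out) access to both models; all prompts, helper names (generate_scenario, improve_strategy, distill_strategies, prompt_with_strategies), and the use of the full strategy library at inference time are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an interpretable "strategy teaching" distillation loop,
# assuming only black-box text-completion access to the teacher and student.
from typing import Callable, Dict

# Any text-in / text-out LLM interface (hosted API or self-hosted model).
LLM = Callable[[str], str]


def generate_scenario(teacher: LLM, library: Dict[str, str]) -> str:
    """Ask the teacher for a customer-service scenario not yet covered by the library."""
    known = "\n".join(library.keys()) or "(none yet)"
    return teacher(
        "Propose one goal-oriented customer-service dialogue scenario that is "
        f"not well covered by the existing scenarios:\n{known}"
    )


def improve_strategy(teacher: LLM, scenario: str, student_reply: str) -> str:
    """Ask the teacher for an interpretable strategy that would improve the student's reply."""
    return teacher(
        f"Scenario:\n{scenario}\n\nStudent response:\n{student_reply}\n\n"
        "Give a concise, human-readable strategy the student should follow "
        "to achieve higher customer satisfaction in this scenario."
    )


def distill_strategies(teacher: LLM, student: LLM, rounds: int = 10) -> Dict[str, str]:
    """Alternate scenario generation and strategy refinement to build a strategy library."""
    library: Dict[str, str] = {}
    for _ in range(rounds):
        scenario = generate_scenario(teacher, library)
        student_reply = student(
            f"Customer scenario:\n{scenario}\n\nRespond as the service agent."
        )
        library[scenario] = improve_strategy(teacher, scenario, student_reply)
    return library


def prompt_with_strategies(student: LLM, library: Dict[str, str], scenario: str) -> str:
    """At inference time, prepend the learned strategies to the student's prompt."""
    strategies = "\n".join(f"- {s}" for s in library.values())
    return student(
        f"Follow these strategies:\n{strategies}\n\nScenario:\n{scenario}\n\nRespond."
    )
```

Because only black-box access is needed, the same loop works whether the teacher is a hosted API model and the student a small self-hosted model, and the resulting strategy library can be audited by humans before being used for automated prompting.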
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - SMART: Self-learning Meta-strategy Agent for Reasoning Tasks [44.45037694899524]
We introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to learn and select the most effective strategies for various reasoning tasks.
We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven continuous self-improvement.
Our experiments demonstrate that SMART significantly enhances the ability of models to choose optimal strategies without external guidance.
arXiv Detail & Related papers (2024-10-21T15:55:04Z) - SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling [29.29604779151457]
This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents.
Our method paves the way towards autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.
arXiv Detail & Related papers (2024-10-16T11:59:27Z) - Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure [36.83786872708736]
One-to-one tutoring is one of the most efficient methods of teaching.
We create a prototype tutor for high school math following Productive Failure (PF), an advanced and effective learning design.
We quantitatively show that StratL succeeds in steering the LLM to follow a Productive Failure tutoring strategy.
arXiv Detail & Related papers (2024-10-03T16:15:41Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - Continual Learning for Large Language Models: A Survey [95.79977915131145]
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
This paper surveys recent works on continual learning for LLMs.
arXiv Detail & Related papers (2024-02-02T12:34:09Z) - Large Language Model as a Policy Teacher for Training Reinforcement Learning Agents [16.24662355253529]
Large Language Models (LLMs) can address sequential decision-making tasks through the provision of high-level instructions.
However, LLMs lack specialization in tackling specific target problems, particularly in real-time dynamic environments.
We introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent.
arXiv Detail & Related papers (2023-11-22T13:15:42Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - LgTS: Dynamic Task Sampling using LLM-generated sub-goals for Reinforcement Learning Agents [10.936460061405157]
We propose LgTS (LLM-guided Teacher-Student learning), a novel approach that explores the planning abilities of LLMs.
Our approach does not assume access to a proprietary or fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM.
arXiv Detail & Related papers (2023-10-14T00:07:03Z) - Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectifying flaws in LLM outputs is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z) - Introspective Tips: Large Language Model for In-Context Decision Making [48.96711664648164]
We employ "Introspective Tips" to help large language models (LLMs) self-optimize their decision-making.
Our method enhances the agent's performance in both few-shot and zero-shot learning situations.
Experiments involving over 100 games in TextWorld illustrate the superior performance of our approach.
arXiv Detail & Related papers (2023-05-19T11:20:37Z)