Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher
- URL: http://arxiv.org/abs/2406.18002v2
- Date: Thu, 03 Oct 2024 07:55:17 GMT
- Title: Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher
- Authors: Hyunjong Ok, Jegwang Ryu, Jaeho Lee
- Abstract summary: How can small-scale large language models (LLMs) efficiently utilize the supervision of larger LLMs to improve their generative quality?
We develop an algorithm that effectively aggregates the predictions of the small-scale LLM and the large LLM on the initial tokens.
We demonstrate that our method provides a consistent improvement over conventional decoding strategies.
- Score: 11.136112399898481
- License:
- Abstract: How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality? This question has been well studied in scenarios where there is no restriction on the number of LLM supervisions one can use, giving birth to many decoding algorithms that utilize supervision without further training. However, it is still unclear what is an effective strategy under the $\textit{limited supervision}$ scenario, where we assume that no more than a few tokens can be generated by LLMs. To this end, we develop an algorithm to effectively aggregate the small-scale LLM and LLM predictions on initial tokens so that the generated tokens can more accurately condition the subsequent token generation by small-scale LLM only. Critically, we find that it is essential to adaptively overtrust or disregard the LLM prediction based on the confidence of the small-scale LLM. Through our experiments on a wide range of models and datasets, we demonstrate that our method provides a consistent improvement over conventional decoding strategies. $\small$ $\textbf{Code:}$ https://github.com/HJ-Ok/DecLimSup
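The key mechanism described in the abstract is that teacher (large LLM) predictions are aggregated with the student (small-scale LLM) predictions only for the first few generated tokens, and the teacher is trusted or disregarded adaptively depending on how confident the student is. The sketch below is a minimal, hypothetical PyTorch illustration of such a decoding loop; the function `decode_with_limited_teacher`, the fixed supervision budget `k`, the confidence threshold `tau`, the interpolation weight `alpha`, and the greedy decoding rule are all assumptions made for illustration and do not reproduce the paper's exact algorithm (see the linked repository for the authors' implementation).

```python
# Minimal, hypothetical sketch of confidence-gated decoding with a limited
# teacher budget (not the paper's exact algorithm). `student` and `teacher`
# are callables mapping token ids of shape (1, seq_len) to next-token logits
# of shape (1, vocab_size); the teacher may be queried for at most `k` tokens.
import torch
import torch.nn.functional as F

@torch.no_grad()
def decode_with_limited_teacher(student, teacher, input_ids,
                                k=5, tau=0.5, alpha=1.0,
                                max_new_tokens=64, eos_token_id=None):
    ids = input_ids.clone()
    for step in range(max_new_tokens):
        s_logits = student(ids)                       # (1, vocab_size)
        if step < k:
            # Student confidence = probability of its own top next token.
            confidence = F.softmax(s_logits, dim=-1).max().item()
            if confidence < tau:
                # Low confidence: spend the budget and lean on the teacher.
                logits = s_logits + alpha * teacher(ids)
            else:
                # High confidence: disregard the teacher for this token.
                logits = s_logits
        else:
            # Supervision budget exhausted: student-only decoding.
            logits = s_logits
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy choice, (1, 1)
        ids = torch.cat([ids, next_id], dim=-1)
        if eos_token_id is not None and next_id.item() == eos_token_id:
            break
    return ids
```

In practice, `student` and `teacher` would typically wrap two causal language models sharing a tokenizer, e.g. `lambda ids: model(ids).logits[:, -1, :]` for a Hugging Face `AutoModelForCausalLM`. The choice of confidence measure (top-1 probability here) and of the aggregation rule is exactly where the paper's question of when to trust the teacher comes in.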
Related papers
- A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost.
This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM)
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
- WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents [55.64361927346957]
We propose a neurosymbolic approach to learn rules gradient-free through large language models (LLMs)
Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC)
On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods.
arXiv Detail & Related papers (2024-10-09T23:37:36Z)
- zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
Large language models (LLMs) have the capability of zero-shot learning, which does not require training or fine-tuning.
We propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
- Enhancing Discriminative Tasks by Guiding the Pre-trained Language Model with Large Language Model's Experience [4.814313782484443]
Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks.
We use LLMs to generate domain-specific data, thereby improving the performance of pre-trained LMs on the target tasks.
arXiv Detail & Related papers (2024-08-16T06:37:59Z)
- Traditional Methods Outperform Generative LLMs at Forecasting Credit Ratings [17.109522466982476]
Large Language Models (LLMs) have been shown to perform well for many downstream tasks.
This paper investigates how well LLMs perform in the task of forecasting corporate credit ratings.
arXiv Detail & Related papers (2024-07-24T20:30:55Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling [0.0]
Recent decoder-only large language models (LLMs) perform on par with smaller pre-trained encoders.
We explore techniques for improving the sequence labeling (SL) performance of open LLMs on information extraction (IE) tasks by applying layer-wise removal of the causal mask (CM).
Our findings hold for diverse SL tasks, demonstrating that open LLMs with layer-dependent CM removal outperform strong encoders and even instruction-tuned LLMs.
arXiv Detail & Related papers (2024-01-25T22:50:48Z)
- Label Supervised LLaMA Finetuning [13.939718306233617]
In this paper, we introduce a label-supervised adaptation for Large Language Models (LLMs)
We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss.
Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size.
arXiv Detail & Related papers (2023-10-02T13:53:03Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.