Label Supervised LLaMA Finetuning
- URL: http://arxiv.org/abs/2310.01208v1
- Date: Mon, 2 Oct 2023 13:53:03 GMT
- Title: Label Supervised LLaMA Finetuning
- Authors: Zongxi Li, Xianming Li, Yuzhang Liu, Haoran Xie, Jing Li, Fu-lee Wang,
Qing Li, Xiaoqin Zhong
- Abstract summary: In this paper, we introduce a label-supervised adaptation for Large Language Models (LLMs)
We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss.
Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size.
- Score: 13.939718306233617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success of Large Language Models (LLMs) has gained significant
attention in both academia and industry. Substantial efforts have been made to
enhance the zero- and few-shot generalization capabilities of open-source LLMs
through finetuning. Currently, the prevailing approach is instruction-tuning,
which trains LLMs to complete real-world tasks by generating responses guided
by natural language instructions. It is worth noting that such an approach
may underperform in sequence and token classification tasks. Unlike text
generation tasks, classification tasks have a limited label space, where
precise label prediction is valued more than generating diverse and
human-like responses. Prior research has unveiled that instruction-tuned LLMs
cannot outperform BERT, prompting us to explore the potential of leveraging
latent representations from LLMs for supervised label prediction. In this
paper, we introduce a label-supervised adaptation for LLMs, which aims to
finetune the model with discriminant labels. We evaluate this approach with
Label Supervised LLaMA (LS-LLaMA), built on LLaMA-2-7B, a relatively
small-scale LLM that can be finetuned on a single GeForce RTX 4090 GPU. We
extract latent representations from the final LLaMA layer and project them into
the label space to compute the cross-entropy loss. The model is finetuned by
Low-Rank Adaptation (LoRA) to minimize this loss. Remarkably, without intricate
prompt engineering or external knowledge, LS-LLaMA substantially outperforms
LLMs ten times its size and demonstrates consistent improvements over robust
baselines such as BERT-Large and RoBERTa-Large in text
classification. Moreover, by removing the causal mask from decoders, LS-unLLaMA
achieves state-of-the-art performance in named entity recognition (NER).
Our work sheds light on a novel approach to adapting LLMs for various
downstream tasks.
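The recipe described in the abstract, namely taking the final-layer representation from LLaMA-2-7B, projecting it into the label space, and training with cross-entropy while updating only LoRA adapters plus the projection head, can be sketched roughly as below. This is a minimal illustration under assumed choices (Hugging Face transformers/peft APIs, last-token pooling, LoRA rank 8, binary labels, and the checkpoint name), not the authors' released implementation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "meta-llama/Llama-2-7b-hf"   # assumed checkpoint name
NUM_LABELS = 2                            # e.g. binary sentiment classification

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA defines no pad token by default
tokenizer.padding_side = "right"           # so the last real token is easy to locate

# Frozen backbone with trainable LoRA adapters on the attention projections.
backbone = AutoModel.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
backbone = get_peft_model(
    backbone,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]),
)

# Trainable projection from the final hidden state into the label space.
classifier = nn.Linear(backbone.config.hidden_size, NUM_LABELS,
                       dtype=torch.bfloat16)

def classification_loss(texts, labels):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    hidden = backbone(**batch).last_hidden_state          # (B, T, H)
    # Pool with the last non-padding token of each sequence (an assumed choice).
    last = batch["attention_mask"].sum(dim=1) - 1
    pooled = hidden[torch.arange(hidden.size(0)), last]
    logits = classifier(pooled).float()
    return nn.functional.cross_entropy(logits, labels)

loss = classification_loss(["a great movie", "a dull movie"], torch.tensor([1, 0]))
loss.backward()  # gradients reach only the LoRA adapters and the classifier head
```

For token-level tasks, the abstract's LS-unLLaMA variant additionally removes the causal mask so that token representations attend in both directions; that modification is not shown in this sketch.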
Related papers
- WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents [55.64361927346957]
We propose a neurosymbolic approach to learn rules gradient-free using large language models (LLMs)
Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC)
On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods.
arXiv Detail & Related papers (2024-10-09T23:37:36Z) - Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels [75.77877889764073]
Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels.
This study explores whether solely utilizing unlabeled data can elicit strong model capabilities.
We propose a new paradigm termed zero-to-strong generalization.
arXiv Detail & Related papers (2024-09-19T02:59:44Z) - Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher [11.136112399898481]
How can small-scale large language models (LLMs) efficiently utilize the supervision of larger LLMs to improve their generative quality?
We develop an algorithm that effectively aggregates the predictions of the small-scale LLM and the larger LLM on initial tokens.
We demonstrate that our method provides a consistent improvement over conventional decoding strategies.
arXiv Detail & Related papers (2024-06-26T01:16:12Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization [12.885866125783618]
Large Language Models (LLMs) tend to produce inaccurate responses to specific queries.
We construct an adversarial dataset, named ADT (Adversarial Dataset for Tokenizer), to challenge LLMs' tokenization.
Our empirical results reveal that ADT is highly effective in challenging the tokenization of leading LLMs, including GPT-4o, Llama-3, Qwen2.5-max and so on.
arXiv Detail & Related papers (2024-05-27T11:39:59Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction [24.675876324457747]
Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, may compromise the innate abilities of LLMs.
We propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information.
LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement.
arXiv Detail & Related papers (2024-04-01T04:39:21Z) - Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling [0.0]
Recent decoder-only large language models (LLMs) perform on par with smaller state-of-the-art encoders.
We explore techniques for improving the sequence labeling (SL) performance of open LLMs on information extraction (IE) tasks by applying layer-wise removal of the causal mask (CM).
Our findings hold for diverse SL tasks, demonstrating that open LLMs with layer-dependent CM removal outperform strong encoder baselines and even instruction-tuned LLMs.
arXiv Detail & Related papers (2024-01-25T22:50:48Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)