Label Supervised LLaMA Finetuning
- URL: http://arxiv.org/abs/2310.01208v1
- Date: Mon, 2 Oct 2023 13:53:03 GMT
- Title: Label Supervised LLaMA Finetuning
- Authors: Zongxi Li, Xianming Li, Yuzhang Liu, Haoran Xie, Jing Li, Fu-lee Wang,
Qing Li, Xiaoqin Zhong
- Abstract summary: In this paper, we introduce a label-supervised adaptation for Large Language Models (LLMs)
We extract latent representations from the final LLaMA layer and project them into the label space to compute the cross-entropy loss.
Remarkably, without intricate prompt engineering or external knowledge, LS-LLaMA substantially outperforms LLMs ten times its size in scale.
- Score: 13.939718306233617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent success of Large Language Models (LLMs) has gained significant
attention in both academia and industry. Substantial efforts have been made to
enhance the zero- and few-shot generalization capabilities of open-source LLMs
through finetuning. Currently, the prevailing approach is instruction-tuning,
which trains LLMs to complete real-world tasks by generating responses guided
by natural language instructions. It is worth noticing that such an approach
may underperform in sequence and token classification tasks. Unlike text
generation tasks, classification tasks have a limited label space, where
precise label prediction is more appreciated than generating diverse and
human-like responses. Prior research has unveiled that instruction-tuned LLMs
cannot outperform BERT, prompting us to explore the potential of leveraging
latent representations from LLMs for supervised label prediction. In this
paper, we introduce a label-supervised adaptation for LLMs, which aims to
finetuning the model with discriminant labels. We evaluate this approach with
Label Supervised LLaMA (LS-LLaMA), based on LLaMA-2-7B, a relatively
small-scale LLM, and can be finetuned on a single GeForce RTX4090 GPU. We
extract latent representations from the final LLaMA layer and project them into
the label space to compute the cross-entropy loss. The model is finetuned by
Low-Rank Adaptation (LoRA) to minimize this loss. Remarkably, without intricate
prompt engineering or external knowledge, LS-LLaMA substantially outperforms
LLMs ten times its size in scale and demonstrates consistent improvements
compared to robust baselines like BERT-Large and RoBERTa-Large in text
classification. Moreover, by removing the causal mask from decoders, LS-unLLaMA
achieves the state-of-the-art performance in named entity recognition (NER).
Our work will shed light on a novel approach to adapting LLMs for various
downstream tasks.
Related papers
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization [12.885866125783618]
Large Language Models (LLMs) tend to produce inaccurate responses to specific queries.
We construct an adversarial dataset, named as $textbfADT (Adrial dataset for Tokenizer)$ to challenge LLMs' tokenization.
Our empirical results reveal that our ADT is highly effective on challenging the tokenization of leading LLMs, including GPT-4o, Llama-3, Qwen2.5-max and so on.
arXiv Detail & Related papers (2024-05-27T11:39:59Z) - An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs [54.91212829143966]
This study explores LLaMA3's capabilities when quantized to low bit-width.
We evaluate 10 existing post-training quantization and LoRA-finetuning methods of LLaMA3 on 1-8 bits and diverse datasets.
Our experimental results indicate that LLaMA3 still suffers non-negligent degradation in linguistic and visual contexts.
arXiv Detail & Related papers (2024-04-22T10:03:03Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvements of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction [24.675876324457747]
Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, may compromise the innate abilities of LLMs.
We propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information.
LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement.
arXiv Detail & Related papers (2024-04-01T04:39:21Z) - Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling [0.0]
Recent decoder-only large language models (LLMs) perform on par with smaller state-based encoders.
We explore techniques for improving the SL performance of open LLMs on IE tasks by applying layer-wise removal of the causal mask.
Our findings hold for diverse SL tasks, demonstrating that open LLMs with layer-dependent CM removal outperform strong-based encoders and even instruction-tuned LLMs.
arXiv Detail & Related papers (2024-01-25T22:50:48Z) - Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning [23.932500424117244]
In-Context Learning (ICL) is an emergent capability of Large Language Models (LLMs)
Previous studies have shown that using LLMs' outputs as labels is effective in training models to select demonstrations.
This paper presents an analysis on different utility functions by focusing on LLMs' output probability given ground-truth output.
arXiv Detail & Related papers (2023-11-16T07:03:54Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large
Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.