LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init
Attention
- URL: http://arxiv.org/abs/2303.16199v2
- Date: Wed, 14 Jun 2023 17:31:32 GMT
- Title: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init
Attention
- Authors: Renrui Zhang, Jiaming Han, Chris Liu, Peng Gao, Aojun Zhou, Xiangfei
Hu, Shilin Yan, Pan Lu, Hongsheng Li, Yu Qiao
- Abstract summary: LLaMA-Adapter is a method to efficiently fine-tune LLaMA into an instruction-following model.
It introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs.
- Score: 52.6718081345361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present LLaMA-Adapter, a lightweight adaptation method to efficiently
fine-tune LLaMA into an instruction-following model. Using 52K self-instruct
demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon
the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8
A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and
prepend them to the word tokens at higher transformer layers. Then, a
zero-initialized attention mechanism with zero gating is proposed, which
adaptively injects the new instructional cues into LLaMA while effectively
preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter
can generate high-quality responses comparable to Alpaca, which fully fine-tunes
all 7B parameters. Beyond language commands, our approach can be simply extended
to multi-modal instructions for learning an image-conditioned LLaMA model, which
achieves superior reasoning performance on the ScienceQA and COCO Caption
benchmarks. Furthermore, we evaluate the zero-initialized attention
mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on
traditional vision and language tasks, demonstrating the superior
generalization capacity of our approach. Code is released at
https://github.com/OpenGVLab/LLaMA-Adapter.
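Below is a minimal PyTorch sketch of the zero-initialized attention with gating described in the abstract. It is not the authors' released implementation (see the GitHub link above for that); the class name ZeroInitPromptAttention, the single-head simplification, and the projection layers created inside the module are assumptions made only to keep the example self-contained.
```python
# Minimal, illustrative sketch (not the authors' code) of zero-initialized
# attention with gating: learnable adaption prompts are prepended on the
# key/value side, and their attention scores are scaled by a gate that
# starts at zero.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroInitPromptAttention(nn.Module):
    def __init__(self, dim: int, prompt_len: int):
        super().__init__()
        # Learnable adaption prompt inserted at one of the higher layers.
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Zero-initialized gate: at the start of training the prompt
        # contributes nothing, preserving the frozen model's behavior.
        self.gate = nn.Parameter(torch.zeros(1))
        # In LLaMA-Adapter these projections belong to the frozen LLaMA;
        # they are created here only to keep the sketch self-contained.
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) word-token features at this layer.
        b, t, d = x.shape
        q = self.wq(x)                                  # (b, t, d)
        k_tok, v_tok = self.wk(x), self.wv(x)           # (b, t, d)
        p = self.prompt.unsqueeze(0).expand(b, -1, -1)  # (b, K, d)
        k_pr, v_pr = self.wk(p), self.wv(p)             # (b, K, d)

        scale = 1.0 / math.sqrt(d)
        s_tok = q @ k_tok.transpose(1, 2) * scale       # (b, t, t)
        s_pr = q @ k_pr.transpose(1, 2) * scale         # (b, t, K)

        # Causal mask applies to token-to-token attention only.
        causal = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1
        )
        s_tok = s_tok.masked_fill(causal, float("-inf"))

        # Softmax the two parts independently, then gate the prompt part so
        # instruction cues are injected adaptively as the gate is learned.
        attn_tok = F.softmax(s_tok, dim=-1)
        attn_pr = torch.tanh(self.gate) * F.softmax(s_pr, dim=-1)

        return attn_tok @ v_tok + attn_pr @ v_pr        # (b, t, d)
```
Because the gate starts at zero, the output at initialization equals standard causal self-attention over the word tokens, so fine-tuning begins from the frozen model's original behavior and the adaption prompts are blended in only as the gate is learned.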
Related papers
- Adapting LLaMA Decoder to Vision Transformer [65.47663195233802]
This work examines whether decoder-only Transformers such as LLaMA can be adapted to the computer vision field.
We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention brings an attention collapse issue.
We develop a soft mask strategy that gradually introduces a causal mask to the self-attention at the onset of training to facilitate the optimization behavior.
arXiv Detail & Related papers (2024-04-10T06:30:08Z)
- LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction [24.675876324457747]
Existing methods to fine-tune LLMs, like Adapter, Prefix-tuning, and LoRA, may compromise the innate abilities of LLMs.
We propose LLaMA-Excitor, a lightweight method that stimulates the LLMs' potential to better follow instructions by gradually paying more attention to worthwhile information.
LLaMA-Excitor is the only method that maintains basic capabilities while achieving a significant improvement.
arXiv Detail & Related papers (2024-04-01T04:39:21Z)
- LLaMA Pro: Progressive LLaMA with Block Expansion [66.39213657252279]
We propose a new post-pretraining method for Large Language Models (LLMs) with an expansion of Transformer blocks.
We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting.
In this paper, we experiment on the corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model from LLaMA2-7B.
arXiv Detail & Related papers (2024-01-04T18:59:12Z)
- Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models [77.2078051555533]
We propose a novel and affordable solution for the effective vision-language (VL) adaptation of large language models (LLMs).
Instead of using large neural networks to connect the image encoder and LLM, the proposed Mixture-of-Modality Adaptation (MMA) adopts lightweight modules, i.e., adapters.
MMA is also equipped with a routing algorithm to help LLMs achieve an automatic shift between single- and multi-modal instructions.
arXiv Detail & Related papers (2023-05-24T11:06:15Z)
- LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model [60.22693761583569]
We present LLaMA-Adapter V2, a parameter-efficient visual instruction model.
Specifically, we first augment LLaMA-Adapter by unlocking more learnable parameters.
In addition, a joint training paradigm of image-text pairs and instruction-following data is introduced.
arXiv Detail & Related papers (2023-04-28T17:59:25Z)
- Exploring Efficient-tuning Methods in Self-supervised Speech Models [53.633222197712875]
Self-supervised learning can learn powerful representations for different speech tasks.
In downstream tasks, the parameters of SSL models are frozen, and only the adapters are trained.
We show that performance parity can be achieved with over 90% parameter reduction.
arXiv Detail & Related papers (2022-10-10T11:08:12Z)