R2GenGPT: Radiology Report Generation with Frozen LLMs
- URL: http://arxiv.org/abs/2309.09812v2
- Date: Sun, 5 Nov 2023 07:26:26 GMT
- Title: R2GenGPT: Radiology Report Generation with Frozen LLMs
- Authors: Zhanyu Wang, Lingqiao Liu, Lei Wang and Luping Zhou
- Abstract summary: R2GenGPT is a novel solution that aligns visual features with the word embedding space of LLMs.
R2GenGPT attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module.
Our model trains only 5M parameters to achieve performance close to SOTA levels.
- Score: 47.72270349660438
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have consistently showcased remarkable
generalization capabilities when applied to various language tasks.
Nonetheless, harnessing the full potential of LLMs for Radiology Report
Generation (R2Gen) still presents a challenge, stemming from the inherent
disparity in modality between LLMs and the R2Gen task. To bridge this gap
effectively, we propose R2GenGPT, a novel solution that aligns visual
features with the word embedding space of LLMs using an efficient visual
alignment module. This approach enables the otherwise frozen LLM to integrate
and process image information, marking a step forward
in optimizing R2Gen performance. R2GenGPT offers the following benefits. First,
it attains state-of-the-art (SOTA) performance by training only the lightweight
visual alignment module while freezing all parameters of the LLM. Second, it
exhibits high training efficiency, as it requires the training of an
exceptionally minimal number of parameters while achieving rapid convergence.
By employing delta tuning, our model trains only 5M parameters (just 0.07% of
the total parameter count) to achieve performance
close to the SOTA levels. Our code is available at
https://github.com/wang-zhanyu/R2GenGPT.
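To make the alignment idea concrete, below is a minimal PyTorch sketch of the setup described in the abstract: a small trainable projection maps visual patch features into the word-embedding space of an otherwise frozen decoder, and only the projection receives gradients. All module names, dimensions, and the toy decoder block are illustrative assumptions, not the repository's actual API or architecture.

```python
import torch
import torch.nn as nn

class VisualAlignment(nn.Module):
    """Trainable projection from visual features to the LLM embedding space.
    Illustrative only; the real R2GenGPT module may differ."""
    def __init__(self, vis_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, vis_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vis_dim) -> (batch, num_patches, llm_dim)
        return self.proj(vis_feats)

# Toy stand-ins for the frozen LLM (real dimensions would be far larger).
llm_dim, vocab = 512, 32000
llm_embed = nn.Embedding(vocab, llm_dim)
llm_block = nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True)
for p in list(llm_embed.parameters()) + list(llm_block.parameters()):
    p.requires_grad = False  # the LLM stays frozen; no gradients flow into it

align = VisualAlignment(vis_dim=256, llm_dim=llm_dim)  # the only trained component

# Forward pass: prepend aligned image tokens to the report-token embeddings.
vis_feats = torch.randn(2, 49, 256)            # e.g. patch features from a vision encoder
report_ids = torch.randint(0, vocab, (2, 60))  # tokenized report (teacher forcing)
inputs = torch.cat([align(vis_feats), llm_embed(report_ids)], dim=1)
hidden = llm_block(inputs)                     # frozen block processes both modalities

trainable = sum(p.numel() for p in align.parameters())
frozen = sum(p.numel() for p in list(llm_embed.parameters()) + list(llm_block.parameters()))
print(f"trainable: {trainable:,} of {trainable + frozen:,} parameters "
      f"({100 * trainable / (trainable + frozen):.2f}%)")
```

Training only this alignment path is what keeps the trainable-parameter count in the millions while the multi-billion-parameter LLM itself stays untouched.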
Related papers
- Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but a gap remains between the practical performance of low-rank adaptation and its theoretical optimum (see the minimal LoRA sketch after this list).
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z)
- OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs [44.054569398300266]
OneGen is a one-pass generation and retrieval framework for LLMs.
OneGen bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively.
Results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance.
arXiv Detail & Related papers (2024-09-08T16:35:19Z)
- Applying RLAIF for Code Generation with API-usage in Lightweight LLMs [15.366324461797582]
Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains.
This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (1B parameters) LLMs.
arXiv Detail & Related papers (2024-06-28T17:16:03Z)
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization [13.622268474310918]
ShiftAddLLM accelerates pretrained LLMs via post-training, multiplication-less reparameterization, yielding efficient models.
It achieves perplexity improvements of 5.6 and 22.7 points at comparable or lower latency.
Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM.
arXiv Detail & Related papers (2024-06-10T02:47:55Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
For the first time, it achieves high-accuracy inference (e.g., 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z)
- Full Parameter Fine-tuning for Large Language Models with Limited Resources [55.794732214059806]
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training.
We propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage.
arXiv Detail & Related papers (2023-06-16T11:37:15Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
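As referenced in the LoRA entry above, and closely related to the delta tuning R2GenGPT uses, here is a minimal sketch of low-rank adaptation: a frozen pretrained linear layer is augmented with a small trainable low-rank update, so only a tiny fraction of parameters is trained. Class and parameter names are hypothetical and do not correspond to any cited paper's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha / r) * B A x.
    Hypothetical helper for illustration only."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
y = layer(torch.randn(4, 512))                            # same output shape as the frozen layer
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")
```

The same frozen-base-plus-small-trainable-module pattern underlies R2GenGPT's delta tuning, which trains roughly 0.07% of all parameters.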
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.