A Method for the Architecture of a Medical Vertical Large Language Model Based on Deepseek R1
- URL: http://arxiv.org/abs/2505.00025v2
- Date: Tue, 22 Jul 2025 14:26:53 GMT
- Title: A Method for the Architecture of a Medical Vertical Large Language Model Based on Deepseek R1
- Authors: Mingda Zhang, Jianglong Qin,
- Abstract summary: This paper presents an efficient lightweight medical large language model architecture that addresses knowledge acquisition, model compression, and computational enhancement challenges.<n>We design a knowledge transfer pipeline from DeepSeek-R1-Distill-70B to DeepSeek-R1-Distill-7B using Low-Rank Adaptation (LoRA) for precise medical knowledge retention.<n>Our approach maintains 92.1% accuracy on USMLE while reducing memory consumption by 64.7% and latency by 12.4% compared to baseline inference models.
- Score: 6.589206192038366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant advances in foundation models like DeepSeek-R1 and ChatGPT, their deployment in medical settings faces critical challenges including computational requirements and professional knowledge barriers. This paper presents an efficient lightweight medical large language model architecture that systematically addresses these challenges through three-dimensional optimization: knowledge acquisition, model compression, and computational enhancement. We design a knowledge transfer pipeline from DeepSeek-R1-Distill-70B to DeepSeek-R1-Distill-7B using Low-Rank Adaptation (LoRA) for precise medical knowledge retention. Through 4-bit quantization and mixed-precision strategies, we achieve substantial model compression while preserving medical reasoning capabilities. The inference framework incorporates Flash Attention acceleration and continuous batching, complemented by specialized prompt templates for diverse medical queries. Experimental evaluation on medical benchmarks demonstrates that our approach maintains 92.1% accuracy on USMLE examinations while reducing memory consumption by 64.7% and inference latency by 12.4% compared to baseline models. This work provides a practical solution for deploying advanced language models in resource-constrained medical environments, enabling broader accessibility of AI-assisted healthcare.
Related papers
- MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B.<n>MedGemma demonstrates advanced medical understanding and reasoning on images and text.<n>We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z) - Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training [0.0]
We present Gazal-R1, a 32-billion- parameter language model that achieves state-of-the-art performance in medical reasoning.<n>Our model demonstrates that strategic training can enable mid-sized models to outperform significantly larger counterparts in specialized domains.<n>Gazal-R1 achieves exceptional performance across medical benchmarks, scoring 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA.
arXiv Detail & Related papers (2025-06-18T09:44:21Z) - Lightweight Clinical Decision Support System using QLoRA-Fine-Tuned LLMs and Retrieval-Augmented Generation [0.0]
This research paper investigates the application of Large Language Models (LLMs) in healthcare.<n>We focus on enhancing medical decision support through Retrieval-Augmented Generation (RAG) integrated with hospital-specific data and fine-tuning using Quantized Low-Rank Adaptation (QLoRA)<n>We touch on the ethical considerations-patient privacy, data security, and the need for rigorous clinical validation-as well as the practical challenges of integrating such systems into real-world healthcare.
arXiv Detail & Related papers (2025-05-06T10:31:54Z) - EMRModel: A Large Language Model for Extracting Medical Consultation Dialogues into Structured Medical Records [11.013242961199204]
We propose EMRModel, a novel approach that integrates LoRA-based fine-tuning with code-style prompt design.<n>We construct a high-quality, realistically grounded dataset of medical consultation dialogues with detailed annotations.<n> Experimental results show EMRModel achieves an F1 score of 88.1%, improving by49.5% over standard pre-trained models.
arXiv Detail & Related papers (2025-04-23T06:17:55Z) - Knowledge Distillation: Enhancing Neural Network Compression with Integrated Gradients [0.0]
This paper proposes a machine learning framework that augments Knowledge Distillation (KD) with Integrated Gradients (IG)<n>We introduce a novel data augmentation strategy where IG maps, precomputed from a teacher model, are overlaid onto training images to guide a compact student model toward critical feature representations.<n>Experiments on CIFAR-10 demonstrate the efficacy of our method: a student model, compressed 4.1-fold from the MobileNet-V2 teacher, achieves 92.5% classification accuracy, surpassing the baseline student's 91.4% and traditional KD approaches, while reducing inference latency from 140 ms to 13 ms--a tenfold
arXiv Detail & Related papers (2025-03-17T10:07:50Z) - Pathology Image Compression with Pre-trained Autoencoders [52.208181380986524]
Whole Slide Images in digital histopathology pose significant storage, transmission, and computational efficiency challenges.
Standard compression methods, such as JPEG, reduce file sizes but fail to preserve fine-grained phenotypic details critical for downstream tasks.
In this work, we repurpose autoencoders (AEs) designed for Latent Diffusion Models as an efficient learned compression framework for pathology images.
arXiv Detail & Related papers (2025-03-14T17:01:17Z) - Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.<n>We propose a novel approach utilizing structured medical reasoning.<n>Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models.
FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z) - Structured Model Pruning for Efficient Inference in Computational Pathology [2.9687381456164004]
We develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging.
We empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance.
arXiv Detail & Related papers (2024-04-12T22:05:01Z) - Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks [17.40940406100025]
We introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters.
Our systems achieved remarkable accuracy across six medical benchmarks.
Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8.
arXiv Detail & Related papers (2024-03-30T14:09:00Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case
Study in Medicine [89.46836590149883]
We build on a prior study of GPT-4's capabilities on medical challenge benchmarks in the absence of special training.
We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results for medical benchmarks.
With Medprompt, GPT-4 achieves state-of-the-art results on all nine of the benchmark datasets in the MultiMedQA suite.
arXiv Detail & Related papers (2023-11-28T03:16:12Z) - Sculpting Efficiency: Pruning Medical Imaging Models for On-Device
Inference [13.403419873964422]
We highlight the excess operational complexity in a suboptimally configured ML model from prior work.
Our results show a compression rate of 1148x with minimal loss in quality.
We consider avenues for future research in streamlining for clinical researchers to develop models quicker and better suited for real-world use.
arXiv Detail & Related papers (2023-09-10T17:34:14Z) - An Evaluation of Lightweight Deep Learning Techniques in Medical Imaging
for High Precision COVID-19 Diagnostics [0.0]
Decision support systems relax the challenges inherent to the physical examination of images.
Most deep learning algorithms utilised approaches are not amenable to implementation on resource-constrained devices.
This paper presents the development and evaluation of the performance of lightweight deep learning technique for the detection of COVID-19 using the MobileNetV2 model.
arXiv Detail & Related papers (2023-05-30T13:14:03Z) - Design Automation for Fast, Lightweight, and Effective Deep Learning
Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing.
It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs.
The survey proceeds to cover three categories of the state-of-the-art of deep model design automation techniques.
arXiv Detail & Related papers (2022-08-22T12:12:43Z) - Robust and Efficient Medical Imaging with Self-Supervision [80.62711706785834]
We present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI.
We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data.
arXiv Detail & Related papers (2022-05-19T17:34:18Z) - Ultrasound Signal Processing: From Models to Deep Learning [64.56774869055826]
Medical ultrasound imaging relies heavily on high-quality signal processing to provide reliable and interpretable image reconstructions.
Deep learning based methods, which are optimized in a data-driven fashion, have gained popularity.
A relatively new paradigm combines the power of the two: leveraging data-driven deep learning, as well as exploiting domain knowledge.
arXiv Detail & Related papers (2022-04-09T13:04:36Z) - SSD-KD: A Self-supervised Diverse Knowledge Distillation Method for
Lightweight Skin Lesion Classification Using Dermoscopic Images [62.60956024215873]
Skin cancer is one of the most common types of malignancy, affecting a large population and causing a heavy economic burden worldwide.
Most studies in skin cancer detection keep pursuing high prediction accuracies without considering the limitation of computing resources on portable devices.
This study specifically proposes a novel method, termed SSD-KD, that unifies diverse knowledge into a generic KD framework for skin diseases classification.
arXiv Detail & Related papers (2022-03-22T06:54:29Z) - Demystifying Deep Learning Models for Retinal OCT Disease Classification
using Explainable AI [0.6117371161379209]
The adoption of various deep learning techniques is quite common as well as effective, and its statement is equally true when it comes to implementing it into the retina Optical Coherence Tomography sector.
These techniques have the black box characteristics that prevent the medical professionals to completely trust the results generated from them.
This paper proposes a self-developed CNN model which is comparatively smaller and simpler along with the use of Lime that introduces Explainable AI to the study.
arXiv Detail & Related papers (2021-11-06T13:54:07Z) - Knowledge Distillation: A Survey [87.51063304509067]
Deep neural networks have been successful in both industry and academia, especially for computer vision tasks.
It is a challenge to deploy these cumbersome deep models on devices with limited resources.
Knowledge distillation effectively learns a small student model from a large teacher model.
arXiv Detail & Related papers (2020-06-09T21:47:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.