The Role of High-Performance GPU Resources in Large Language Model Based Radiology Imaging Diagnosis
- URL: http://arxiv.org/abs/2509.16328v2
- Date: Wed, 24 Sep 2025 19:08:24 GMT
- Title: The Role of High-Performance GPU Resources in Large Language Model Based Radiology Imaging Diagnosis
- Authors: Jyun-Ping Kao
- Abstract summary: Large-language models (LLMs) are rapidly being applied to radiology, enabling automated image interpretation and report generation tasks. High-performance graphical processing units (GPUs) provide the necessary compute and memory throughput to run large LLMs on imaging data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-language models (LLMs) are rapidly being applied to radiology, enabling automated image interpretation and report generation tasks. Their deployment in clinical practice requires both high diagnostic accuracy and low inference latency, which in turn demands powerful hardware. High-performance graphical processing units (GPUs) provide the necessary compute and memory throughput to run large LLMs on imaging data. We review modern GPU architectures (e.g. NVIDIA A100/H100, AMD Instinct MI250X/MI300) and key performance metrics: floating-point throughput, memory bandwidth, and VRAM capacity. We show how these hardware capabilities affect radiology tasks: for example, generating reports or detecting findings on CheXpert and MIMIC-CXR images is computationally intensive and benefits from GPU parallelism and tensor-core acceleration. Empirical studies indicate that using appropriate GPU resources can reduce inference time and improve throughput. We discuss practical challenges, including privacy, deployment, cost, and power, as well as optimization strategies: mixed precision, quantization, compression, and multi-GPU scaling. Finally, we anticipate that next-generation features (8-bit tensor cores, enhanced interconnect) will further enable on-premise and federated radiology AI. Advancing GPU infrastructure is essential for safe, efficient LLM-based radiology diagnostics.
Related papers
- GPU-Accelerated Algorithms for Graph Vector Search: Taxonomy, Empirical Study, and Research Directions [54.570944939061555]
We present a comprehensive study of GPU-accelerated graph-based vector search algorithms. We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units. Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems.
arXiv Detail & Related papers (2026-02-10T16:18:04Z) - GaDE -- GPU-acceleration of time-dependent Dirac Equation for exascale [0.0]
GaDE is designed to simulate the electron dynamics in atoms induced by electromagnetic fields in the relativistic regime. We evaluate GaDE on the pre-exascale supercomputer LUMI, powered by AMD MI250X GPUs and Hewlett-Packard's Slingshot interconnect.
arXiv Detail & Related papers (2025-12-25T14:47:36Z) - Accelerated Digital Twin Learning for Edge AI: A Comparison of FPGA and Mobile GPU [4.116096531149171]
We present a general DT learning framework that is amenable to acceleration on reconfigurable hardware such as FPGAs. We show the usage of this technique in DT guided synthetic data generation for Type 1 Diabetes and proactive coronary artery disease detection.
arXiv Detail & Related papers (2025-12-13T05:51:26Z) - NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding [54.88765757043535]
This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference. Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types with less than 7% computational overhead. The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search.
arXiv Detail & Related papers (2025-05-28T20:43:10Z) - MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities. We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA Holoscan [0.35516599670943777]
Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform.
Concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency.
This paper addresses these challenges within the context of the Holoscan platform, a real-time AI system for streaming sensor data and images.
arXiv Detail & Related papers (2024-02-06T23:20:34Z) - Energy efficiency in Edge TPU vs. embedded GPU for computer-aided medical imaging segmentation and classification [0.9728436272434581]
We use glaucoma diagnosis based on color fundus images as an example to show the possibility of performing segmentation and classification in real time on embedded boards.
Memory limitations and low processing capabilities of embedded accelerated systems (EAS) limit their use for deep network-based system training.
We evaluate the timing and energy performance of two EAS equipped with Machine Learning (ML) accelerators executing an example diagnostic tool developed in a previous work.
arXiv Detail & Related papers (2023-11-20T09:38:56Z) - Neural Network Methods for Radiation Detectors and Imaging [1.6395318070400589]
Recent advances in machine learning and especially deep neural networks (DNNs) allow for new optimization and performance-enhancement schemes for radiation detectors and imaging hardware.
We give an overview of data generation at photon sources, deep learning-based methods for image processing tasks, and hardware solutions for deep learning acceleration.
arXiv Detail & Related papers (2023-11-09T20:21:51Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, peer variability, and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction [67.11722682878722]
This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention.
Our multi-scale linear attention achieves the global receptive field and multi-scale learning.
EfficientViT delivers remarkable performance gains over previous state-of-the-art models.
arXiv Detail & Related papers (2022-05-29T20:07:23Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Accelerating Deep Learning Applications in Space [0.0]
We investigate the performance of CNN-based object detectors on constrained devices.
We take a closer look at the Single Shot MultiBox Detector (SSD) and Region-based Fully Convolutional Network (R-FCN).
The performance is measured in terms of inference time, memory consumption, and accuracy.
arXiv Detail & Related papers (2020-07-21T21:06:30Z) - Organ Segmentation From Full-size CT Images Using Memory-Efficient FCN [10.411340412305849]
In medical image segmentation tasks, subvolume cropping has become a common preprocessing step.
We present a memory-efficient fully convolutional network (FCN) incorporated with several memory-optimized techniques.
arXiv Detail & Related papers (2020-03-24T07:12:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.