Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks
- URL: http://arxiv.org/abs/2506.20009v1
- Date: Tue, 24 Jun 2025 20:56:03 GMT
- Title: Accurate and Energy Efficient: Local Retrieval-Augmented Generation Models Outperform Commercial Large Language Models in Medical Tasks
- Authors: Konstantinos Vrettos, Michail E. Klontzas
- Abstract summary: Local Large Language Models (LLMs) can be leveraged to develop RAGs that outperform commercial, online LLMs in medical tasks. Our modular framework promotes sustainable AI development, reducing electricity usage and aligning with the UN's Sustainable Development Goals.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Background: The increasing adoption of Artificial Intelligence (AI) in healthcare has sparked growing concerns about its environmental and ethical implications. Commercial Large Language Models (LLMs), such as ChatGPT and DeepSeek, require substantial resources, while the utilization of these systems for medical purposes raises critical issues regarding patient privacy and safety. Methods: We developed a customizable Retrieval-Augmented Generation (RAG) framework for medical tasks, which monitors its energy usage and CO2 emissions. This system was then used to create RAGs based on various open-source LLMs. The tested models included both general-purpose models such as llama3.1:8b and the medical-domain-specific medgemma-4b-it. The performance and energy consumption of the best RAG were compared to those of DeepSeekV3-R1 and OpenAI's o4-mini model. A dataset of medical questions was used for the evaluation. Results: Custom RAG models outperformed commercial models in accuracy and energy consumption. The RAG model built on llama3.1:8B achieved the highest accuracy (58.5%) and was significantly better than other models, including o4-mini and DeepSeekV3-R1. The llama3.1-RAG also exhibited the lowest energy consumption and CO2 footprint among all models, with a Performance per kWh of 0.52 and a total CO2 emission of 473 g. Compared to o4-mini, the llama3.1-RAG achieved 2.7 times more accuracy points per kWh and 172% less electricity usage while maintaining higher accuracy. Conclusion: Our study demonstrates that local LLMs can be leveraged to develop RAGs that outperform commercial, online LLMs in medical tasks, while having a smaller environmental impact. Our modular framework promotes sustainable AI development, reducing electricity usage and aligning with the UN's Sustainable Development Goals.
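The abstract compares models by accuracy obtained per kWh of electricity. The following is a minimal sketch of how such a metric and the ratio between two models could be computed. The function names, the exact definition (accuracy divided by energy in kWh), and all input values are assumptions for illustration and are not taken from the paper.

```python
def performance_per_kwh(accuracy: float, energy_kwh: float) -> float:
    """Accuracy obtained per kWh of electricity consumed.

    Assumed definition (hypothetical): accuracy / energy in kWh.
    The paper may define or scale the metric differently.
    """
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return accuracy / energy_kwh


def efficiency_ratio(perf_a: float, perf_b: float) -> float:
    """How many times more accuracy per kWh model A delivers than model B."""
    if perf_b <= 0:
        raise ValueError("perf_b must be positive")
    return perf_a / perf_b


# Hypothetical inputs for illustration only, NOT measurements from the paper.
local_rag = performance_per_kwh(accuracy=0.5, energy_kwh=1.0)
hosted_llm = performance_per_kwh(accuracy=0.5, energy_kwh=2.0)

print(f"local RAG:  {local_rag:.2f} accuracy per kWh")
print(f"hosted LLM: {hosted_llm:.2f} accuracy per kWh")
print(f"ratio:      {efficiency_ratio(local_rag, hosted_llm):.1f}x")
```

With equal accuracy, a model that uses half the energy scores twice as high on this metric, which is the kind of comparison the abstract's "times more accuracy points per kWh" figure expresses.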
Related papers
- OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets [0.0]
We introduce OpenMed NER, a suite of open-source, domain-adapted transformer models for named-entity recognition. We evaluate our models on 12 established biomedical NER benchmarks spanning chemicals, diseases, genes, and species. OpenMed NER achieves new state-of-the-art micro-F1 scores on 10 of these 12 datasets, with substantial gains across diverse entity types.
arXiv Detail & Related papers (2025-08-03T07:33:28Z)
- Smart Buildings Energy Consumption Forecasting using Adaptive Evolutionary Ensemble Learning Models [23.650796013805937]
This paper proposes three hybrid ensemble predictive models, incorporating Bagging, Stacking, and Voting mechanisms combined with a fast and effective evolutionary hyper-parameters tuner. The performance of the proposed energy forecasting model was evaluated using a hybrid dataset comprising meteorological parameters, appliance energy use, temperature, humidity, and lighting energy consumption from various sections of a building, collected by 18 sensors located in Stambroek, Mons, Belgium. The prediction results indicate that the adaptive evolutionary bagging model surpassed other predictive models in both accuracy and learning error.
arXiv Detail & Related papers (2025-06-13T15:13:15Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference [0.0]
This paper introduces a novel infrastructure-aware benchmarking framework for quantifying the environmental footprint of AI inference across 30 state-of-the-art models as deployed in commercial data centers. Our results show that o3 and DeepSeek-R1 emerge as the most energy-intensive models, consuming over 33 Wh per long prompt, more than 70 times the consumption of GPT-4.1 nano, and that Claude-3.7 Sonnet ranks highest in eco-efficiency. These findings illustrate a growing paradox: although AI is becoming cheaper and faster, its global adoption drives disproportionate resource consumption.
arXiv Detail & Related papers (2025-05-14T17:47:00Z)
- Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations [2.2765705959685234]
This study investigates the energy consumption of Discriminative and Generative AI models within real-world MLOps pipelines. We employ software-based power measurements to ensure ease of replication across diverse configurations, models, and datasets.
arXiv Detail & Related papers (2025-03-31T10:28:04Z)
- Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs [96.68469559192846]
We present two differently sized MoE large language models (LLMs). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. We propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency.
arXiv Detail & Related papers (2025-03-07T04:43:39Z)
- Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation [53.2981100111204]
Photoplethysmography and electrocardiography can potentially enable continuous blood pressure (BP) monitoring. Yet developing accurate and robust machine learning (ML) models remains challenging due to variability in data quality and patient-specific factors. In this work, we investigate whether a model pre-trained on one modality can effectively be exploited to improve the accuracy of a different signal type. Our approach achieves near state-of-the-art accuracy for diastolic BP and surpasses by 1.5x the accuracy of prior works for systolic BP.
arXiv Detail & Related papers (2025-02-10T13:33:12Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to address the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z)
- REST: Robust and Efficient Neural Networks for Sleep Monitoring in the Wild [62.36144064259933]
We propose REST, a new method that simultaneously tackles both issues via adversarial training and controlling the Lipschitz constant of the neural network.
We demonstrate that REST produces highly-robust and efficient models that substantially outperform the original full-sized models in the presence of noise.
By deploying these models to an Android application on a smartphone, we quantitatively observe that REST allows models to achieve up to 17x energy reduction and 9x faster inference.
arXiv Detail & Related papers (2020-01-29T17:23:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.