Related papers: Balancing Natural Language Processing Accuracy and Normalisation in Extracting Medical Insights

URL: http://arxiv.org/abs/2511.15778v1
Date: Wed, 19 Nov 2025 18:51:45 GMT
Title: Balancing Natural Language Processing Accuracy and Normalisation in Extracting Medical Insights
Authors: Paulina Tworek, Miłosz Bargieł, Yousef Khan, Tomasz Pełech-Pilichowski, Marek Mikołajczyk, Roman Lewandowski, Jose Sousa
Abstract summary: This study presents a comparative analysis of NLP low-compute rule-based methods and Large Language Models (LLMs) for information extraction from electronic health records. We evaluate both approaches by extracting patient demographics, clinical findings, and prescribed medications while examining the effects of lack of text normalisation and translation-induced information loss. Results demonstrate that rule-based methods provide higher accuracy in information retrieval tasks, particularly for age and sex extraction. LLMs offer greater adaptability and scalability, excelling in drug name recognition.
Score: 2.654416335526196
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Extracting structured medical insights from unstructured clinical text using Natural Language Processing (NLP) remains an open challenge in healthcare, particularly in non-English contexts where resources are scarce. This study presents a comparative analysis of NLP low-compute rule-based methods and Large Language Models (LLMs) for information extraction from electronic health records (EHR) obtained from the Voivodeship Rehabilitation Hospital for Children in Ameryka, Poland. We evaluate both approaches by extracting patient demographics, clinical findings, and prescribed medications while examining the effects of lack of text normalisation and translation-induced information loss. Results demonstrate that rule-based methods provide higher accuracy in information retrieval tasks, particularly for age and sex extraction. However, LLMs offer greater adaptability and scalability, excelling in drug name recognition. The effectiveness of the LLMs was compared with texts originally in Polish and those translated into English, assessing the impact of translation. These findings highlight the trade-offs between accuracy, normalisation, and computational cost when deploying NLP in healthcare settings. We argue for hybrid approaches that combine the precision of rule-based systems with the adaptability of LLMs, offering a practical path toward more reliable and resource-efficient clinical NLP in real-world hospitals.

Related papers

A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems. We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
Fine-grained Alignment of Large Language Models for General Medication Recommendation without Overprescription [45.41664696802343]
Large language models (LLMs) hold significant promise in achieving general medication recommendation systems. We introduce Language-Assisted Medication Recommendation, which tailors LLMs for medication recommendation in a medication-aware manner. Fine-tuning LLMs with this framework can outperform existing methods by more than 10% in internal validation and generalize across temporal and external validations.
arXiv Detail & Related papers (2025-03-05T17:28:16Z)
Bridging Language Barriers in Healthcare: A Study on Arabic LLMs [1.2006896500048552]
This paper investigates the challenges of developing large language models proficient in both multilingual understanding and medical knowledge. We find that larger models with carefully calibrated language ratios achieve superior performance on native-language clinical tasks.
arXiv Detail & Related papers (2025-01-16T20:24:56Z)
Evaluating Computational Accuracy of Large Language Models in Numerical Reasoning Tasks for Healthcare Applications [0.0]
Large Language Models (LLMs) have emerged as transformative tools in the healthcare sector. Their proficiency in numerical reasoning, particularly in high-stakes domains such as clinical applications, remains underexplored. This study investigates the computational accuracy of LLMs in numerical reasoning tasks within healthcare contexts.
arXiv Detail & Related papers (2025-01-14T04:29:43Z)
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language models (LLMs) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries [56.31117605097345]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation. Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process. AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z)
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence [46.71469172542448]
This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts. It consists of 345 plain language summaries of abstracts generated from three randomized controlled trials (RCTs). We assess the factuality of critical elements of RCTs in those summaries, as well as the reported findings concerning these.
arXiv Detail & Related papers (2024-02-18T04:45:01Z)
Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models [29.05425041393475]
Generative Large Language Models (LLMs) hold significant promise in healthcare. This study assessed the potential of LLMs to function as autonomous agents in a simulated tertiary care medical center.
arXiv Detail & Related papers (2024-01-05T15:09:57Z)
LLMs Accelerate Annotation for Medical Information Extraction [7.743388571513413]
We propose an approach that combines Large Language Models (LLMs) with human expertise to create an efficient method for generating ground truth labels for medical text annotation. We rigorously evaluate our method on a medical information extraction task, demonstrating that our approach not only substantially cuts down on human intervention but also maintains high accuracy.
arXiv Detail & Related papers (2023-12-04T19:26:13Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization. Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.