Related papers: ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports

ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports

URL: http://arxiv.org/abs/2502.05638v1
Date: Sat, 08 Feb 2025 16:44:56 GMT
Title: ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports
Authors: Aynur Guluzade, Naguib Heiba, Zeyd Boukhers, Florim Hamiti, Jahid Hasan Polash, Yehya Mohamad, Carlos A Velasco,
Abstract summary: This paper presents the results of our project, which aims to leverage Large Language Models (LLMs) to extract structured information from unstructured clinical reports.<n>We developed a workflow with a user interface and evaluated LLMs of varying sizes through prompting strategies and fine-tuning.<n>Our results show that fine-tuned smaller models match or surpass larger counterparts in performance, offering efficiency for resource-limited settings.
Score: 3.0363830583066713
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Europe's healthcare systems require enhanced interoperability and digitalization, driving a demand for innovative solutions to process legacy clinical data. This paper presents the results of our project, which aims to leverage Large Language Models (LLMs) to extract structured information from unstructured clinical reports, focusing on patient history, diagnoses, treatments, and other predefined categories. We developed a workflow with a user interface and evaluated LLMs of varying sizes through prompting strategies and fine-tuning. Our results show that fine-tuned smaller models match or surpass larger counterparts in performance, offering efficiency for resource-limited settings. A new dataset of 60,000 annotated English clinical summaries and 24,000 German translations was validated with automated and manual checks. The evaluations used ROUGE, BERTScore, and entity-level metrics. The work highlights the approach's viability and outlines future improvements.

Related papers

BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text [10.071956824618418]
Large language models (LLMs) hold great promise for medical applications and are evolving rapidly. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text. We present BRIDGE, a comprehensive benchmark comprising 87 tasks sourced from real-world clinical data sources across nine languages.
arXiv Detail & Related papers (2025-04-28T04:13:18Z)
Fine-Tuning LLMs on Small Medical Datasets: Text Classification and Normalization Effectiveness on Cardiology reports and Discharge records [0.07499722271664144]
We investigate the effectiveness of fine-tuning large language models (LLMs) on small medical datasets for text classification and named entity recognition tasks. Our experiments show that fine-tuning improves performance on both tasks, with notable gains observed with as few as 200-300 training examples.
arXiv Detail & Related papers (2025-03-27T10:35:56Z)
Towards Scalable and Cross-Lingual Specialist Language Models for Oncology [4.824906329042275]
General-purpose large models (LLMs) struggle with challenges such as clinical terminology, context-dependent interpretations, and multi-modal data integration. We develop an oncology-specialized, efficient, and adaptable NLP framework that combines instruction tuning, retrieval-augmented generation (RAG), and graph-based knowledge integration.
arXiv Detail & Related papers (2025-03-11T11:34:57Z)
Representation Learning of Structured Data for Medical Foundation Models [29.10129199884847]
We introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records.
arXiv Detail & Related papers (2024-10-17T09:02:28Z)
Named Clinical Entity Recognition Benchmark [2.9332007863461893]
This report introduces a Named Clinical Entity Recognition Benchmark. It addresses the crucial natural language processing (NLP) task of extracting structured information from clinical narratives. The leaderboard provides a standardized platform for assessing diverse language models.
arXiv Detail & Related papers (2024-10-07T14:00:18Z)
Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding [5.279406017862076]
The challenge of summarising a hospital course remains an open area for further research and development. We adapted three pre trained LLMs, Llama 3, BioMistral, Mistral Instruct v0.1 for the hospital course summarisation task. The fine tuned models were evaluated using BERTScore and ROUGE metrics to assess the effectiveness of clinical domain fine tuning.
arXiv Detail & Related papers (2024-09-23T00:35:23Z)
Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. We then construct six novel datasets and clinical tasks that are complex but common in real-world practice. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
On Preserving the Knowledge of Long Clinical Texts [0.0]
A bottleneck in using transformer encoders for processing clinical texts comes from the input length limit of these models.<n>This paper proposes a novel method to preserve the knowledge of long clinical texts in the models using aggregated ensembles of transformer encoders.
arXiv Detail & Related papers (2023-11-02T19:50:02Z)
Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports. We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings. We show the few-shot cross-lingual transfer property of LMs for named recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
VBridge: Connecting the Dots Between Features, Explanations, and Data for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow. We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence. We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z)
Estimating Redundancy in Clinical Text [6.245180523143739]
Clinicians populate new documents by duplicating existing notes, then updating accordingly. quantifying information redundancy can play an essential role in evaluating innovations that operate on clinical narratives. We present and evaluate two strategies to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model.
arXiv Detail & Related papers (2021-05-25T11:01:45Z)
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.