Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case
- URL: http://arxiv.org/abs/2507.03067v1
- Date: Thu, 03 Jul 2025 17:32:57 GMT
- Title: Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case
- Authors: Alvaro Riquelme, Pedro Costa, Catalina Martinez
- Abstract summary: We introduce a semi-automated approach to convert structured clinical datasets into HL7 FHIR format. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2. Error analysis revealed occasional hallucinations of non-existent attributes and mismatches in granularity, which more detailed prompts can mitigate.
- Score: 0.2516393111664279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For years, semantic interoperability standards have sought to streamline the exchange of clinical data, yet their deployment remains time-consuming, resource-intensive, and technically challenging. To address this, we introduce a semi-automated approach that leverages large language models, specifically GPT-4o and Llama 3.2 405b, to convert structured clinical datasets into HL7 FHIR format while assessing accuracy, reliability, and security. Applying our method to the MIMIC-IV database, we combined embedding techniques, clustering algorithms, and semantic retrieval to craft prompts that guide the models in mapping each tabular field to its corresponding FHIR resource. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2 thanks to the inclusion of FHIR resource schemas within the prompt. Under real-world conditions, accuracy dipped slightly to 94%, but refinements to the prompting strategy restored robust mappings. Error analysis revealed occasional hallucinations of non-existent attributes and mismatches in granularity, which more detailed prompts can mitigate. Overall, our study demonstrates the feasibility of context-aware, LLM-driven transformation of clinical data into HL7 FHIR, laying the groundwork for semi-automated interoperability workflows. Future work will focus on fine-tuning models with specialized medical corpora, extending support to additional standards such as HL7 CDA and OMOP, and developing an interactive interface to enable expert validation and iterative refinement.
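The abstract outlines a pipeline that combines embeddings, clustering, and semantic retrieval to build the mapping prompts. Below is a minimal, hypothetical Python sketch of such a retrieval-augmented prompt builder; the sentence-transformers model, the abbreviated FHIR schema snippets, and the MIMIC-IV-style column names are illustrative assumptions, not the authors' implementation, and the clustering step described in the paper is omitted for brevity.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Illustrative excerpts of FHIR resource definitions; a full pipeline would load
# the official StructureDefinitions from https://hl7.org/fhir/.
FHIR_RESOURCES = {
    "Patient": "Demographics: gender, birthDate, maritalStatus, deceased[x] ...",
    "Encounter": "Interaction between patient and provider: class, period ...",
    "Observation": "Measurements and simple assertions: code, value[x], effectiveDateTime ...",
    "MedicationAdministration": "Record of medication given to a patient: medication, dosage, subject ...",
}

# Hypothetical MIMIC-IV-style column names to be mapped.
COLUMNS = ["admittime", "gender", "heart_rate", "dod"]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
resource_names = list(FHIR_RESOURCES)
resource_emb = encoder.encode(
    [f"{name}: {desc}" for name, desc in FHIR_RESOURCES.items()],
    normalize_embeddings=True,
)
column_emb = encoder.encode(COLUMNS, normalize_embeddings=True)

def top_k_resources(col_vec: np.ndarray, k: int = 2) -> list[str]:
    """Semantic retrieval: rank FHIR resources by similarity to the column embedding."""
    scores = resource_emb @ col_vec  # embeddings are L2-normalized, so dot product = cosine
    return [resource_names[i] for i in np.argsort(scores)[::-1][:k]]

for col, vec in zip(COLUMNS, column_emb):
    candidates = top_k_resources(vec)
    # Include the candidate schemas in the prompt, mirroring the paper's finding
    # that adding FHIR resource schemas to the prompt improves mapping accuracy.
    prompt = (
        f"Map the clinical table column '{col}' to the most appropriate HL7 FHIR "
        "resource and attribute.\nCandidate resources:\n"
        + "\n".join(f"- {r}: {FHIR_RESOURCES[r]}" for r in candidates)
        + "\nAnswer with <Resource>.<attribute> only."
    )
    print(prompt)  # in the described workflow, this prompt would be sent to GPT-4o or Llama 3.2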
Related papers
- IM-Chat: A Multi-agent LLM-based Framework for Knowledge Transfer in Injection Molding Industry [1.3369318110920576]
This study introduces IM-Chat, a multi-agent framework based on large language models (LLMs). IM-Chat integrates both limited documented knowledge (e.g., troubleshooting tables, manuals) and extensive field data modeled through a data-driven process condition generator. Performance was assessed across 100 single-tool and 60 hybrid tasks for GPT-4o, GPT-4o-mini, and GPT-3.5-turbo.
arXiv Detail & Related papers (2025-07-21T06:13:53Z) - Efficient Federated Learning with Timely Update Dissemination [54.668309196009204]
Federated Learning (FL) has emerged as a compelling methodology for the management of distributed data. We propose an efficient FL approach that capitalizes on additional downlink bandwidth resources to ensure timely update dissemination.
arXiv Detail & Related papers (2025-07-08T14:34:32Z) - Interpretable AI for Time-Series: Multi-Model Heatmap Fusion with Global Attention and NLP-Generated Explanations [1.331812695405053]
We present a novel framework for enhancing model interpretability by integrating heatmaps produced by ResNet and a restructured 2D Transformer with globally weighted input saliency. Our method merges gradient-weighted activation maps (ResNet) and Transformer attention rollout into a unified visualization, achieving full spatial-temporal alignment. Empirical evaluations on clinical (ECG arrhythmia detection) and industrial datasets demonstrate significant improvements.
arXiv Detail & Related papers (2025-06-30T20:04:35Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models. We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients. We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs [14.618008816273784]
MINT is a general approach to determine if given data was used for training machine learning models. This work focuses on its application to the domain of Natural Language Processing.
arXiv Detail & Related papers (2025-03-10T14:32:56Z) - RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration [2.879328762187361]
We present RAAD-LLM, a novel framework for adaptive anomaly detection. By effectively utilizing domain-specific knowledge, RAAD-LLM enhances the detection of anomalies in time series data. Results show significant improvements over our previous model, with an accuracy increase from 70.7% to 88.6% on the real-world dataset.
arXiv Detail & Related papers (2025-03-04T17:20:43Z) - Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models [2.186740861187042]
Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs) to improve adherence to metadata standards. We conducted experiments on 200 random data records describing human samples relating to lung cancer from the NCBI BioSample repository.
arXiv Detail & Related papers (2024-04-08T22:29:53Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Large Language Models as Automated Aligners for benchmarking Vision-Language Models [48.4367174400306]
Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks.
Existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets, face significant limitations in assessing the alignment of these increasingly anthropomorphic models with human intelligence.
In this work, we address these limitations via Auto-Bench, which explores LLMs as proficient curators, measuring the alignment between VLMs and human intelligence and values through automatic data curation and assessment.
arXiv Detail & Related papers (2023-11-24T16:12:05Z) - Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.