Related papers: A medical coding language model trained on clinical narratives from a population-wide cohort of 1.8 million patients

URL: http://arxiv.org/abs/2603.00221v2
Date: Tue, 03 Mar 2026 16:48:38 GMT
Title: A medical coding language model trained on clinical narratives from a population-wide cohort of 1.8 million patients
Authors: Joakim Edin, Sedrah Butt Balaganeshan, Annike Kjølby Kristensen, Lars Maaløe, Ioannis Louloudis, Søren Brunak,
Abstract summary: Existing automation efforts rely on small datasets that poorly represent real-world patient heterogeneity. We trained a language model on 5.8 million electronic health records from 1.8 million patients across nearly all specialties in Eastern Denmark (2006–2016) to predict ICD-10 codes.
Score: 2.2666010316046075
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Medical coding translates clinical documentation into standardized codes for billing, research, and public health, but manual coding is time-consuming and error-prone. Existing automation efforts rely on small datasets that poorly represent real-world patient heterogeneity. We trained a language model on 5.8 million electronic health records from 1.8 million patients across nearly all specialties in Eastern Denmark (2006–2016) to predict ICD-10 codes from clinical notes, medications, and laboratory results. Evaluated on 270,000 held-out patients, the model achieved a micro F1 of 71.8% and a top-10 recall of 95.5%. Performance varied by specialty (F1: 53–91%), with higher scores in specialties with well-defined diagnostic criteria. Codes appearing predominantly as secondary diagnoses had markedly lower F1 scores. For three such codes (suicide-related behaviors, weight disorders, and hypertension), the model identified thousands of uncoded cases, of which 76–86% were confirmed valid upon manual review, suggesting systematic under-coding rather than model error. These findings suggest under-coding of secondary diagnoses in Eastern Denmark during this period, with potential implications for epidemiological research, public health surveillance, and understanding of multimorbidity. Similar time constraints and reimbursement structures in other healthcare systems suggest this may not be isolated to this dataset. The model can automate coding for approximately 50% of cases and provide accurate suggestions for most others, and may offer a practical solution to help capture missed secondary conditions.
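The abstract reports a micro F1 and a top-10 recall, and several related papers below report macro-F1 instead, so the numbers are not directly comparable. The sketch below shows how these metrics are commonly computed for multi-label ICD coding. It is an illustration under stated assumptions, not the authors' evaluation code: each document's gold and predicted codes are assumed to be Python sets of ICD-10 strings, and top-k recall is assumed to be pooled over all gold codes (the abstract does not define it precisely). The function names and the toy codes (I10 essential hypertension, E66 obesity, J18 pneumonia, E11 type 2 diabetes) are hypothetical examples.

```python
from collections import defaultdict


def per_code_counts(gold, pred):
    """Return {code: [tp, fp, fn]} pooled over all documents (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0, 0])
    for g, p in zip(gold, pred):
        for code in g & p:
            counts[code][0] += 1  # predicted and in the gold set: true positive
        for code in p - g:
            counts[code][1] += 1  # predicted but not in the gold set: false positive
        for code in g - p:
            counts[code][2] += 1  # in the gold set but missed: false negative
    return counts


def f1(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold, pred):
    # Pool TP/FP/FN across all codes first, so frequent codes dominate the score
    counts = list(per_code_counts(gold, pred).values())
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)


def macro_f1(gold, pred):
    # Average the per-code F1 scores, so rare codes weigh as much as common ones
    counts = per_code_counts(gold, pred)
    if not counts:
        return 0.0
    return sum(f1(*c) for c in counts.values()) / len(counts)


def recall_at_k(gold, ranked, k=10):
    # Assumed definition: fraction of all gold codes found in each document's top-k ranking
    hits = sum(len(g & set(r[:k])) for g, r in zip(gold, ranked))
    total = sum(len(g) for g in gold)
    return hits / total if total else 0.0


# Toy example with three documents (hypothetical data)
gold = [{"I10", "E66"}, {"J18"}, {"I10"}]
pred = [{"I10"}, {"J18", "I10"}, {"I10"}]
ranked = [["I10", "E11", "E66"], ["J18"], ["E11", "I10"]]

print(micro_f1(gold, pred))                 # 0.75
print(round(macro_f1(gold, pred), 3))       # 0.6
print(recall_at_k(gold, ranked, k=2))       # 0.75
print({code: f1(*c) for code, c in per_code_counts(gold, pred).items()})
# {'I10': 0.8, 'E66': 0.0, 'J18': 1.0}
```

In this toy data the missed secondary code (E66) pulls macro-F1 down much more than micro-F1, which mirrors why per-code F1 is the natural lens for the under-coded secondary diagnoses the abstract describes.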

Related papers

An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas. We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head. Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z)
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [69.46279475491164]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance across 2,919 diseases, achieving 100% accuracy for 1,013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z)
RuCCoD: Towards Automated ICD Coding in Russian [42.609069328685045]
This study investigates the feasibility of automating clinical coding in Russian, a language with limited biomedical resources. We present a new dataset for ICD coding, which includes diagnosis fields from electronic health records annotated with over 10,000 entities and more than 1,500 unique ICD codes.
arXiv Detail & Related papers (2025-02-28T17:40:24Z)
Collaborative residual learners for automatic icd10 prediction using prescribed medications [45.82374977939355]
We propose a novel collaborative residual learning-based model that automatically predicts ICD-10 codes using only prescription data. For multi-label classification, we obtain an average precision of 0.71 and 0.57 and an F1-score of 0.57 and 0.38, and an accuracy of 0.73 and 0.44 in predicting the principal diagnosis, for the inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:07:27Z)
Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions. For multi-label classification, we obtain an average precision of 0.73 and 0.58 and an F1-score of 0.56 and 0.35, and an accuracy of 0.71 and 0.4 in predicting the principal diagnosis, for the inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:02:56Z)
Classification supporting COVID-19 diagnostics based on patient survey data [82.41449972618423]
Logistic regression and XGBoost classifiers that allow effective screening of patients for COVID-19 were developed. The resulting classification models provide the basis for the DECODE service (decode.polsl.pl), which can support the screening of patients for COVID-19. The data set, which consists of more than 3,000 examples, is based on questionnaires collected at a hospital in Poland.
arXiv Detail & Related papers (2020-11-24T17:44:01Z)
Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community. We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification. We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
Multi-label natural language processing to identify diagnosis and procedure codes from MIMIC-III inpatient notes [0.0]
In the United States, 25% of hospital spending, or more than 200 billion dollars, goes to administrative costs that involve medical coding and billing. Natural language processing can automate the extraction of codes/labels from unstructured clinical notes. Our model achieved an overall accuracy of 87.08%, an F1 score of 85.82%, and an AUC of 91.76% for the top-10 codes.
arXiv Detail & Related papers (2020-03-17T02:56:27Z)
Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks [0.0]
We report the performance of a natural language processing model that can map clinical notes to medical codes. We employed a state-of-the-art deep learning method, ULMFiT, on MIMIC-III, the largest emergency department clinical notes dataset. Our models predicted the top-10 diagnosis and procedure codes with 80.3% and 80.5% accuracy respectively, and the top-50 ICD-9 diagnosis and procedure codes with 70.7% and 63.9% accuracy.
arXiv Detail & Related papers (2019-12-28T04:05:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.