Surpassing GPT-4 Medical Coding with a Two-Stage Approach
- URL: http://arxiv.org/abs/2311.13735v1
- Date: Wed, 22 Nov 2023 23:35:13 GMT
- Title: Surpassing GPT-4 Medical Coding with a Two-Stage Approach
- Authors: Zhichao Yang, Sanjit Singh Batra, Joel Stremmel, Eran Halperin
- Abstract summary: GPT-4 LLM predicts an excessive number of ICD codes for medical coding tasks, leading to high recall but low precision.
We introduce LLM-codex, a two-stage approach to predict ICD codes that first generates evidence proposals and then employs an LSTM-based verification stage.
Our model is the only approach that simultaneously achieves state-of-the-art results in medical coding accuracy, accuracy on rare codes, and sentence-level evidence identification.
- Score: 1.7014913888753238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) show potential for clinical
applications, such as clinical decision support and trial recommendations.
However, the GPT-4 LLM predicts an excessive number of ICD codes for medical
coding tasks, leading to high recall but low precision. To tackle this
challenge, we introduce LLM-codex, a two-stage approach to predict ICD codes
that first generates evidence proposals using an LLM and then employs an
LSTM-based verification stage. The LSTM learns from both the LLM's high recall
and human experts' high precision, using a custom loss function. In
experiments on the MIMIC dataset, our model is the only approach that
simultaneously achieves state-of-the-art results in medical coding accuracy,
accuracy on rare codes, and sentence-level evidence identification to support
coding decisions, all without training on human-annotated evidence.
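To make the two-stage design concrete, the sketch below pairs stage-1 evidence proposals from an LLM with a small LSTM verifier trained on both supervision signals. It is a minimal sketch only: the verifier architecture, the max-pooling from sentence to document level, and the loss weight `alpha` are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn

class EvidenceVerifier(nn.Module):
    """Stage 2: scores each note sentence as evidence for one candidate ICD code."""
    def __init__(self, emb_dim=768, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, sent_embs):           # (batch, n_sents, emb_dim)
        h, _ = self.lstm(sent_embs)
        return self.head(h).squeeze(-1)     # (batch, n_sents) evidence logits

def codex_loss(sent_logits, llm_sent_labels, gold_doc_labels, alpha=0.5):
    """Assumed loss form: a high-recall term supervised by the LLM's stage-1
    evidence proposals plus a high-precision term supervised by human-assigned
    document-level codes (max-pooled over sentences)."""
    bce = nn.functional.binary_cross_entropy_with_logits
    recall_term = bce(sent_logits, llm_sent_labels)
    doc_logits = sent_logits.max(dim=1).values   # code present if any sentence supports it
    precision_term = bce(doc_logits, gold_doc_labels)
    return alpha * recall_term + (1 - alpha) * precision_term
```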
Related papers
- Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to solve hallucination issues in medical information extraction tasks.
ALCD demonstrates significant improvements in resolving hallucination issues compared to conventional decoding methods.
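For orientation, the sketch below shows a generic contrastive-decoding step of the kind ALCD builds on; ALCD's alternating schedule between sub-task models is not reproduced, and `alpha` and `tau` are illustrative values.

```python
import torch

def contrastive_decode_step(expert_logits, amateur_logits, alpha=1.0, tau=0.1):
    """Generic contrastive decoding for one step: prefer tokens the expert
    model likes more than the amateur, within a plausibility set."""
    expert_probs = torch.softmax(expert_logits, dim=-1)
    # keep only tokens reasonably probable under the expert model
    plausible = expert_probs >= tau * expert_probs.max(dim=-1, keepdim=True).values
    scores = (1 + alpha) * expert_logits - alpha * amateur_logits
    scores = scores.masked_fill(~plausible, float("-inf"))
    return scores.argmax(dim=-1)   # greedy choice over the contrasted scores
```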
arXiv Detail & Related papers (2024-10-21T07:19:19Z)
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
- When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? [8.89829757177796]
We examine the effectiveness of vector representations from last hidden states of Large Language Models for medical diagnostics and prognostics.
We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluate their utilities as feature extractors.
Although the findings suggest that raw data features still prevail in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results.
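A minimal sketch of this feature-extraction setup, assuming a hypothetical instruction-tuned checkpoint and mean pooling over the last hidden states (the paper's exact pooling may differ):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"        # hypothetical checkpoint choice
tok = AutoTokenizer.from_pretrained(name)
tok.pad_token = tok.pad_token or tok.eos_token
model = AutoModel.from_pretrained(name).eval()

@torch.no_grad()
def embed(texts):
    """Mean-pool the last hidden states into one fixed-size vector per record."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state          # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# downstream, the embeddings feed a classical model, e.g.:
#   from sklearn.linear_model import LogisticRegression
#   clf = LogisticRegression(max_iter=1000).fit(embed(train_texts), train_labels)
```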
arXiv Detail & Related papers (2024-08-15T03:56:40Z)
- XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare [16.79952669254101]
We develop a novel method for zero-shot/few-shot in-context learning (ICL) using a multi-layered structured prompt.
We also explore the efficacy of two communication styles between the user and Large Language Models (LLMs).
Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates.
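The summary does not spell out the prompt layers, so the following is only a hypothetical rendering of a multi-layered structured prompt for clinical ICL; XAI4LLM's actual structure may differ.

```python
def build_layered_prompt(task, domain_facts, examples, patient_record):
    """Hypothetical layering: task, clinical background, examples, query."""
    layers = ["Task:\n" + task]
    if domain_facts:
        layers.append("Clinical background:\n" + "\n".join("- " + f for f in domain_facts))
    if examples:                                   # omit for zero-shot
        layers.append("Worked examples:\n" + "\n\n".join(examples))
    layers.append("Patient:\n" + patient_record + "\nDiagnosis:")
    return "\n\n".join(layers)
```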
arXiv Detail & Related papers (2024-05-10T06:52:44Z)
- Self-Supervised Multiple Instance Learning for Acute Myeloid Leukemia Classification [1.1874560263468232]
Diseases like Acute Myeloid Leukemia (AML) pose challenges due to scarce and costly annotations at the single-cell level.
Multiple Instance Learning (MIL) addresses weakly labeled scenarios but necessitates powerful encoders typically trained with labeled data.
In this study, we explore Self-Supervised Learning (SSL) as a pre-training approach for MIL-based subtype AML classification from blood smears.
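Assuming the cell embeddings come from a frozen self-supervised encoder, a standard attention-based MIL head over a bag of cells looks roughly like this (a sketch of the general technique, not this paper's exact model):

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention-based MIL pooling (Ilse et al., 2018): one bag of single-cell
    embeddings per patient, pooled into a patient-level subtype prediction."""
    def __init__(self, emb_dim=384, attn_dim=128, n_classes=5):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(emb_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1)
        )
        self.classifier = nn.Linear(emb_dim, n_classes)

    def forward(self, cell_embs):                             # (n_cells, emb_dim)
        weights = torch.softmax(self.attn(cell_embs), dim=0)  # attention over cells
        bag = (weights * cell_embs).sum(dim=0)                # weighted bag embedding
        return self.classifier(bag)                           # subtype logits
```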
arXiv Detail & Related papers (2024-03-08T15:16:15Z)
- Combining Insights From Multiple Large Language Models Improves Diagnostic Accuracy [0.0]
Large language models (LLMs) are proposed as viable diagnostic support tools or even spoken of as replacements for "curbside consults."
We assessed and compared the accuracy of differential diagnoses produced by individual commercial LLMs against that of diagnoses synthesized by aggregating responses from combinations of the same LLMs.
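One simple way to synthesize such ranked lists is Borda-style rank aggregation, sketched below; the paper's actual aggregation method may differ, and the model names in the usage comment are hypothetical.

```python
from collections import defaultdict

def aggregate_differentials(ranked_lists, top_k=5):
    """Borda-style aggregation of ranked differential-diagnosis lists:
    each diagnosis earns points inversely proportional to its rank."""
    scores = defaultdict(float)
    for ranked in ranked_lists:                  # each list: best diagnosis first
        for rank, dx in enumerate(ranked):
            scores[dx.strip().lower()] += len(ranked) - rank
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# hypothetical usage with three models' ranked outputs:
# consensus = aggregate_differentials([gpt4_list, gemini_list, claude_list])
```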
arXiv Detail & Related papers (2024-02-13T21:24:21Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the high cost of deploying LLMs directly, we developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
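A generic sketch of feature-level distillation, assuming the student's hidden state is projected into the teacher LLM's feature space; the weight `beta` and the MSE alignment term are illustrative choices, not the paper's exact recipe.

```python
import torch.nn as nn

class FeatureDistillLoss(nn.Module):
    """Align the compact student's hidden features with the LLM teacher's,
    on top of the usual recommendation task loss."""
    def __init__(self, student_dim, teacher_dim, beta=0.5):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)  # bridge dimensionality gap
        self.beta = beta

    def forward(self, student_feat, teacher_feat, task_loss):
        align = nn.functional.mse_loss(self.proj(student_feat), teacher_feat)
        return task_loss + self.beta * align
```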
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Large Language Models in Medical Term Classification and Unexpected Misalignment Between Response and Reasoning [28.355000184014084]
This study assesses the ability of state-of-the-art large language models (LLMs) to identify patients with mild cognitive impairment (MCI) from discharge summaries.
The data was partitioned into training, validation, and testing sets in a 7:2:1 ratio for model fine-tuning and evaluation.
Open-source models like Falcon and LLaMA 2 achieved high accuracy but lacked explanatory reasoning.
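For reference, a 7:2:1 partition like the one described can be produced in two steps, since holding out 30% and then splitting that holdout 2:1 yields 70/20/10 overall (a sketch; any stratification the paper used is omitted):

```python
from sklearn.model_selection import train_test_split

records = list(range(1000))   # stand-in for the discharge summaries
# hold out 30%, then split the holdout 2:1 -> 70/20/10 of the data overall
train, hold = train_test_split(records, test_size=0.3, random_state=42)
val, test = train_test_split(hold, test_size=1/3, random_state=42)
```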
arXiv Detail & Related papers (2023-12-19T17:36:48Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)