High Throughput Phenotyping of Physician Notes with Large Language and
Hybrid NLP Models
- URL: http://arxiv.org/abs/2403.05920v1
- Date: Sat, 9 Mar 2024 14:02:59 GMT
- Title: High Throughput Phenotyping of Physician Notes with Large Language and
Hybrid NLP Models
- Authors: Syed I. Munzir, Daniel B. Hier, Michael D. Carrithers
- Abstract summary: Deep phenotyping is the detailed description of patient signs and symptoms using concepts from an ontology.
In this study, we demonstrate that a large language model and a hybrid NLP model can perform high throughput phenotyping on physician notes with high accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep phenotyping is the detailed description of patient signs and symptoms
using concepts from an ontology. The deep phenotyping of the numerous physician
notes in electronic health records requires high throughput methods. Over the
past thirty years, steady progress has been made toward making high throughput
phenotyping feasible.
In this study, we demonstrate that a large language model and a hybrid NLP
model (combining word vectors with a machine learning classifier) can perform
high throughput phenotyping on physician notes with high accuracy. Large
language models will likely emerge as the preferred method for high throughput
deep phenotyping of physician notes.
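The hybrid NLP approach above (word vectors combined with a machine learning classifier) can be sketched minimally. The toy word vectors, phenotype labels, and nearest-centroid classifier below are illustrative assumptions, not the authors' pipeline; real systems use pretrained embeddings and a trained classifier:

```python
# Minimal sketch of a hybrid phenotyper: average word vectors for a note
# span, then assign the nearest phenotype centroid learned from examples.
# Vectors and labels are toy values chosen for illustration.

WORD_VECS = {
    "tremor":   [0.9, 0.1, 0.0],
    "rigidity": [0.8, 0.2, 0.1],
    "weakness": [0.1, 0.9, 0.2],
    "paresis":  [0.2, 0.8, 0.1],
}

def embed(text):
    """Average the vectors of known words in a text span."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    n = len(vecs) or 1
    return [sum(dim) / n for dim in zip(*vecs)] if vecs else [0.0, 0.0, 0.0]

def train_centroids(examples):
    """examples: list of (text, label) pairs -> label-to-centroid map."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(embed(text))
    return {lbl: [sum(d) / len(vs) for d in zip(*vs)]
            for lbl, vs in by_label.items()}

def classify(text, centroids):
    """Return the phenotype label whose centroid is closest to the span."""
    v = embed(text)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(v, centroids[lbl]))

centroids = train_centroids([
    ("tremor rigidity", "motor-sign"),
    ("weakness paresis", "weakness-sign"),
])
print(classify("patient has tremor", centroids))  # motor-sign
```

In practice the word vectors would come from a pretrained embedding model and the classifier would be trained on annotated physician notes; the nearest-centroid rule stands in here for whatever classifier the authors actually used.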
Related papers
- Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head".
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
arXiv Detail & Related papers (2024-10-31T12:33:26Z) - High-Throughput Phenotyping of Clinical Text Using Large Language Models [0.0]
GPT-4 surpasses GPT-3.5-Turbo in identifying, categorizing, and normalizing signs.
GPT-4 results in high performance and generalizability across several phenotyping tasks.
arXiv Detail & Related papers (2024-08-02T12:00:00Z) - A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes [0.0]
This study compares three computational approaches to high-throughput phenotyping:
a Large Language Model (LLM) incorporating generative AI, a Natural Language Processing (NLP) approach utilizing deep learning for span categorization, and a hybrid approach combining word vectors with machine learning.
The approach that implemented GPT-4 (a Large Language Model) demonstrated superior performance.
arXiv Detail & Related papers (2024-06-20T22:05:34Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - Enhancing Phenotype Recognition in Clinical Notes Using Large Language
Models: PhenoBCBERT and PhenoGPT [11.20254354103518]
We developed two types of models: PhenoBCBERT, a BERT-based model, and PhenoGPT, a GPT-based model.
We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO.
arXiv Detail & Related papers (2023-08-11T03:40:22Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Textual Data Augmentation for Patient Outcomes Prediction [67.72545656557858]
We propose a novel data augmentation method to generate artificial clinical notes in patients' Electronic Health Records.
We fine-tune the generative language model GPT-2 to synthesize labeled text with the original training data.
We evaluate our method on the most common patient outcome, i.e., the 30-day readmission rate.
arXiv Detail & Related papers (2022-11-13T01:07:23Z) - Fine-Tuning Large Neural Language Models for Biomedical Natural Language
Processing [55.52858954615655]
We conduct a systematic study on fine-tuning stability in biomedical NLP.
We show that fine-tuning performance may be sensitive to pretraining settings, especially in low-resource domains.
We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications.
arXiv Detail & Related papers (2021-12-15T04:20:35Z) - Hybrid deep learning methods for phenotype prediction from clinical
notes [4.866431869728018]
This paper proposes a novel hybrid model for automatically extracting patient phenotypes using natural language processing and deep learning models.
The proposed hybrid model is based on a neural bidirectional sequence model (BiLSTM or BiGRU) and a Convolutional Neural Network (CNN) for identifying patients' phenotypes in discharge reports.
arXiv Detail & Related papers (2021-08-16T05:57:28Z) - Neural Language Models with Distant Supervision to Identify Major
Depressive Disorder from Clinical Notes [2.1060613825447407]
Major depressive disorder (MDD) is a prevalent psychiatric disorder that is associated with significant healthcare burden worldwide.
Recent advancements, such as the Bidirectional Encoder Representations from Transformers (BERT) model, have yielded state-of-the-art neural language models.
We propose to leverage the neural language models in a distant supervision paradigm to identify MDD phenotypes from clinical notes.
arXiv Detail & Related papers (2021-04-19T21:11:41Z)
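The distant supervision paradigm in the entry above replaces manual chart review with noisy heuristic labels that then supervise a neural language model. A minimal sketch follows; the cue phrases and the any-match labeling rule are hypothetical illustrations, not the paper's actual rules:

```python
# Minimal sketch of distant supervision: heuristic rules produce noisy
# labels that stand in for manual annotation. Cue phrases are invented
# for illustration, not taken from the paper.

RULES = ("major depressive disorder", "depressed mood", "anhedonia")

def distant_label(note):
    """Noisy label: 1 if any depression cue appears in the note, else 0."""
    text = note.lower()
    return int(any(cue in text for cue in RULES))

notes = [
    "Patient reports anhedonia and poor sleep.",
    "Fracture of the left radius; no psychiatric history.",
]
print([distant_label(n) for n in notes])  # [1, 0]
```

In a full pipeline, these weak labels would supervise fine-tuning of a neural language model such as BERT, which can then generalize beyond the literal cue phrases.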
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.