An evaluation of GPT models for phenotype concept recognition
- URL: http://arxiv.org/abs/2309.17169v2
- Date: Thu, 23 Nov 2023 02:06:20 GMT
- Title: An evaluation of GPT models for phenotype concept recognition
- Authors: Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A Haendel, Peter N Robinson, Christopher J Mungall and Justin T Reese
- Abstract summary: We examine the performance of the latest Generative Pre-trained Transformer (GPT) models for clinical phenotyping and phenotype annotation.
Our results show that, with an appropriate setup, these models can achieve state-of-the-art performance.
- Score: 0.4715973318447338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objective: Clinical deep phenotyping and phenotype annotation play a critical
role both in the diagnosis of patients with rare disorders and in building
computationally tractable knowledge in the rare disorders field. These
processes rely on using ontology concepts, often from the Human Phenotype
Ontology, in conjunction with a phenotype concept recognition task (supported
usually by machine learning methods) to curate patient profiles or existing
scientific literature. With the significant shift in the use of large language
models (LLMs) for most NLP tasks, we examine the performance of the latest
Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a
foundation for the tasks of clinical phenotyping and phenotype annotation.
Materials and Methods: The experimental setup of the study included seven
prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and
gpt-4.0) and two established gold standard corpora for phenotype recognition,
one consisting of publication abstracts and the other clinical observations.
Results: Our results show that, with an appropriate setup, these models can
achieve state-of-the-art performance. The best run, using few-shot learning,
achieved a 0.58 macro F1 score on publication abstracts and a 0.75 macro F1
score on clinical observations; the former is comparable with the state of the
art, while the latter surpasses the current best-in-class tool. Conclusion:
While the results are promising, the non-deterministic nature of the outcomes,
the high cost and the lack of concordance between different runs using the same
prompt and input make the use of these LLMs challenging for this particular
task.
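To make the experimental setup concrete, below is a minimal sketch of how a few-shot phenotype concept recognition call and a set-based per-document F1 check might look. It assumes the OpenAI Python client (v1 chat.completions API); the prompt wording, example sentences, function names, and scoring details are illustrative assumptions, not the authors' actual prompts or evaluation code.

```python
# Minimal sketch, not the paper's code: few-shot HPO concept recognition
# with the OpenAI v1 Python client, plus a simple set-based F1 per document.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical few-shot prompt; the study used seven prompts of varying
# specificity, whose exact wording is not reproduced here.
FEW_SHOT_PROMPT = """Extract every phenotypic abnormality mentioned in the text
and map it to a Human Phenotype Ontology (HPO) identifier. Return one
"HP:XXXXXXX<TAB>label" line per concept, or NONE if there are none.

Text: The patient presented with microcephaly and a cleft palate.
Answer:
HP:0000252\tMicrocephaly
HP:0000175\tCleft palate

Text: {text}
Answer:"""


def recognize_phenotypes(text: str, model: str = "gpt-3.5-turbo") -> set[str]:
    """Return the set of HPO IDs the model extracts from `text`."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # narrows, but does not remove, run-to-run variation
        messages=[{"role": "user", "content": FEW_SHOT_PROMPT.format(text=text)}],
    )
    ids = set()
    for line in response.choices[0].message.content.splitlines():
        if line.strip().startswith("HP:"):
            ids.add(line.split("\t")[0].strip())
    return ids


def document_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 for one document; averaging over documents gives a macro F1."""
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Fixing the temperature at 0 reduces but does not eliminate the run-to-run variation the conclusion points to, so repeated runs over the same prompt and input would still need to be compared before scores are trusted.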
Related papers
- High-Throughput Phenotyping of Clinical Text Using Large Language Models [0.0]
GPT-4 surpasses GPT-3.5-Turbo in identifying, categorizing, and normalizing signs.
GPT-4 results in high performance and generalizability across several phenotyping tasks.
arXiv Detail & Related papers (2024-08-02T12:00:00Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated the performance of the Gemini and GPT-4 models, along with 4 other popular large models, in an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
The experimental evaluations have been conducted using the PAD-UFES-20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT [11.20254354103518]
We developed two types of models: PhenoBCBERT, a BERT-based model, and PhenoGPT, a GPT-based model.
We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO.
arXiv Detail & Related papers (2023-08-11T03:40:22Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- PheME: A deep ensemble framework for improving phenotype prediction from multi-modal data [42.56953523499849]
We present PheME, an Ensemble framework using Multi-modality data of structured EHRs and unstructured clinical notes for accurate Phenotype prediction.
We leverage ensemble learning to combine outputs from single-modal models and multi-modal models to improve phenotype predictions.
arXiv Detail & Related papers (2023-03-19T23:41:04Z)
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem.
Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools.
We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
- Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z)
- Ensembling Handcrafted Features with Deep Features: An Analytical Study for Classification of Routine Colon Cancer Histopathological Nuclei Images [13.858624044986815]
We have used F1-measure, Precision, Recall, AUC, and Cross-Entropy Loss to analyse the performance of our approaches.
We observed from the results that the DL features ensemble brings a marked improvement in the overall performance of the model.
arXiv Detail & Related papers (2022-02-22T06:48:50Z)
- Hybrid deep learning methods for phenotype prediction from clinical notes [4.866431869728018]
This paper proposes a novel hybrid model for automatically extracting patient phenotypes using natural language processing and deep learning models.
The proposed hybrid model is based on a neural bidirectional sequence model (BiLSTM or BiGRU) and a Convolutional Neural Network (CNN) for identifying patients' phenotypes in discharge reports.
arXiv Detail & Related papers (2021-08-16T05:57:28Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict CC diagnosis of HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.