Evaluating ChatGPT text-mining of clinical records for obesity
monitoring
- URL: http://arxiv.org/abs/2308.01666v1
- Date: Thu, 3 Aug 2023 10:11:42 GMT
- Title: Evaluating ChatGPT text-mining of clinical records for obesity
monitoring
- Authors: Ivo S. Fins (1), Heather Davies (1), Sean Farrell (2), Jose R. Torres
(3), Gina Pinchbeck (1), Alan D. Radford (1), Peter-John Noble (1) ((1) Small
Animal Veterinary Surveillance Network, Institute of Infection, Veterinary
and Ecological Sciences, University of Liverpool, Liverpool, UK, (2)
Department of Computer Science, Durham University, Durham, UK, (3) Institute
for Animal Health and Food Safety, University of Las Palmas de Gran Canaria,
Las Palmas, Canary Islands, Spain)
- Abstract summary: We compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in anonymised veterinary narratives.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Veterinary clinical narratives remain a largely untapped resource
for addressing complex diseases. Here we compare the ability of a large
language model (ChatGPT) and a previously developed regular expression (RegexT)
to identify overweight body condition scores (BCS) in veterinary narratives.
Methods: BCS values were extracted from 4,415 anonymised clinical narratives,
either using RegexT or by appending each narrative to a prompt sent to ChatGPT
that coerced the model to return the BCS information. Data were manually
reviewed for comparison. Results: The precision of RegexT was higher (100%,
95% CI 94.81-100%) than that of ChatGPT (89.3%, 95% CI 82.75-93.64%). However,
the recall of ChatGPT (100%, 95% CI 96.18-100%) was considerably higher than
that of RegexT (72.6%, 95% CI 63.92-79.94%). Limitations: Subtle prompt
engineering is needed to improve ChatGPT output. Conclusions: Large language
models create diverse opportunities and, whilst complex, present an intuitive
interface to information, but they require careful implementation to avoid
unpredictable errors.
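To make the Methods concrete, here is a minimal sketch of the two extraction approaches being compared. The actual RegexT pattern and ChatGPT prompt are not given in the abstract, so the pattern, prompt wording, and model choice below are illustrative assumptions (using the official openai Python client).

```python
# Hedged sketch of the two approaches compared in the paper. The real RegexT
# pattern and prompt are not published in the abstract; these are stand-ins.
import re
from openai import OpenAI  # assumes the official openai package is installed

# Illustrative BCS pattern, matching e.g. "BCS 7/9" or "body condition score 6/9".
BCS_PATTERN = re.compile(
    r"(?:BCS|body condition score)\s*[:=]?\s*([1-9])\s*/\s*9",
    re.IGNORECASE,
)

def extract_bcs_regex(narrative: str) -> str | None:
    """Return the first BCS value matched by the regular expression, if any."""
    match = BCS_PATTERN.search(narrative)
    return match.group(1) if match else None

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_bcs_chatgpt(narrative: str) -> str:
    """Append the narrative to a prompt coercing the model to return only the BCS."""
    prompt = (
        "Extract the body condition score (BCS) from the veterinary narrative "
        "below. Reply with the score only (e.g. '7/9'), or 'NONE' if no BCS "
        "is recorded.\n\n" + narrative
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model assumed; the abstract says only "ChatGPT"
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # damp the run-to-run randomness noted in the Conclusions
    )
    return response.choices[0].message.content.strip()
```

The Limitations point about subtle prompt engineering applies directly here: small changes to the instruction wording can change whether the model returns scores in the requested format.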
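The Results quote 95% confidence intervals alongside each precision and recall figure. The abstract does not state which interval method was used; a common choice for binomial proportions is the exact Clopper-Pearson interval, sketched below. The counts are illustrative only (the abstract reports percentages, not raw counts), chosen to roughly reproduce the RegexT recall of 72.6%.

```python
# Hedged sketch of the evaluation arithmetic: precision, recall, and a 95%
# binomial confidence interval (Clopper-Pearson assumed; method not stated).
from scipy.stats import beta

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

def precision_recall(tp: int, fp: int, fn: int) -> dict:
    """Precision = TP/(TP+FP) and recall = TP/(TP+FN), each with a 95% CI."""
    return {
        "precision": (tp / (tp + fp), clopper_pearson(tp, tp + fp)),
        "recall": (tp / (tp + fn), clopper_pearson(tp, tp + fn)),
    }

# Illustrative counts only; the true TP/FP/FN counts are not in the abstract.
print(precision_recall(tp=90, fp=0, fn=34))  # recall 90/124 ~ 72.6%
```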
Related papers
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- Fact-Checking Generative AI: Ontology-Driven Biological Graphs for Disease-Gene Link Verification [45.65374554914359]
We aim to fact-check the knowledge embedded in biological graphs constructed from ChatGPT-generated content.
We adopted a biological networks approach that enables the systematic interrogation of ChatGPT's linked entities.
This study demonstrated high accuracy of the aggregate disease-gene links found in ChatGPT-generated texts.
arXiv Detail & Related papers (2023-08-07T22:13:30Z)
- Identifying and Extracting Rare Disease Phenotypes with Large Language Models [12.555067118549347]
ChatGPT is a revolutionary large language model capable of following complex human prompts and generating high-quality responses.
We compared its performance to the traditional fine-tuning approach and conducted an in-depth error analysis.
ChatGPT achieved similar or higher accuracy for certain entities (i.e., rare diseases and signs) in the one-shot setting.
arXiv Detail & Related papers (2023-06-22T03:52:12Z)
- ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time [54.18651663847874]
ChatGPT has achieved great success and can be considered to have acquired infrastructural status.
Existing benchmarks face two challenges: (1) disregard for periodic evaluation and (2) lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses on 21 NLP benchmarks, collected from March 2023 to the present.
arXiv Detail & Related papers (2023-04-27T11:33:48Z)
- Translating Radiology Reports into Plain Language using ChatGPT and GPT-4 with Prompt Learning: Promising Results, Limitations, and Potential [6.127537348178505]
ChatGPT can successfully translate radiology reports into plain language, with an average score of 4.27 on a five-point scale.
ChatGPT shows some randomness in its responses, occasionally over-simplifying or omitting information.
Results are compared with the newly released GPT-4, showing that GPT-4 can significantly improve the quality of the translated reports.
arXiv Detail & Related papers (2023-03-16T02:21:39Z)
- ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark [11.36853733574956]
ChatGPT is a cutting-edge artificial intelligence language model developed by OpenAI.
We compare it with a commercial GEC product (e.g., Grammarly) and state-of-the-art models (e.g., GECToR).
We find that ChatGPT does not perform as well as those baselines in terms of automatic evaluation metrics.
arXiv Detail & Related papers (2023-03-15T00:35:50Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
- BI-RADS BERT & Using Section Tokenization to Understand Radiology Reports [0.18352113484137625]
Domain-specific contextual word embeddings have been shown to achieve impressive accuracy at natural language processing tasks in medicine.
A BERT model pre-trained on breast radiology reports, combined with section tokenization, resulted in an overall accuracy of 95.9% in field extraction.
arXiv Detail & Related papers (2021-10-14T17:25:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.