Zero-shot information extraction from radiological reports using ChatGPT
- URL: http://arxiv.org/abs/2309.01398v2
- Date: Thu, 7 Sep 2023 01:36:08 GMT
- Title: Zero-shot information extraction from radiological reports using ChatGPT
- Authors: Danqing Hu, Bing Liu, Xiaofeng Zhu, Xudong Lu, Nan Wu
- Abstract summary: Information extraction transforms sequences of characters into structured data.
With large language models achieving good performance on various downstream NLP tasks, it becomes possible to use them for zero-shot information extraction.
In this study, we explore whether the most popular large language model, ChatGPT, can extract useful information from radiological reports.
- Score: 19.457604666012767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic health records contain an enormous amount of valuable information,
but much of it is recorded as free text. Information extraction transforms
free-text character sequences into structured data that can be used for
secondary analysis. However, traditional information extraction components,
such as named entity recognition and relation extraction, require annotated
data to optimize the model parameters, which has become one of the major
bottlenecks in building information extraction systems. With large language
models achieving good performance on various downstream NLP tasks without
parameter tuning, it becomes possible to use them for zero-shot information
extraction. In this study, we explore whether the most popular large language
model, ChatGPT, can extract useful information from radiological reports. We
first design a prompt template for the information of interest in CT reports.
Then, we generate the prompts by combining the prompt template with the CT
reports and use them as inputs to ChatGPT to obtain the responses. A
post-processing module is developed to transform the responses into structured
extraction results. We conducted the experiments with 847 CT reports collected
from Peking University Cancer Hospital. The experimental results indicate that
ChatGPT can achieve competitive performance on some extraction tasks compared
with the baseline information extraction system, but some limitations remain to
be addressed.
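The three steps named in the abstract (prompt template, prompt generation, post-processing) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the template wording, the field names `tumor_location` and `tumor_size_mm`, the JSON answer format, and the `fake_llm` stand-in for a ChatGPT call are all hypothetical.

```python
import json
import re
from typing import Callable, Dict, Optional

# Hypothetical fields of interest; the paper's actual question set is not listed here.
FIELDS = ["tumor_location", "tumor_size_mm"]

# Hypothetical prompt template with slots for the field list and the report text.
PROMPT_TEMPLATE = (
    "Extract the following fields from the CT report and answer with a JSON "
    "object using exactly these keys: {fields}. Use null when a field is not "
    "mentioned.\n\nCT report:\n{report}"
)


def build_prompt(report: str) -> str:
    """Combine the prompt template with one CT report."""
    return PROMPT_TEMPLATE.format(fields=", ".join(FIELDS), report=report.strip())


def postprocess(response: str) -> Dict[str, Optional[str]]:
    """Turn a free-text model response into a record with fixed keys."""
    result: Dict[str, Optional[str]] = {field: None for field in FIELDS}
    # The model may wrap the JSON in extra prose, so take the outermost braces.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return result
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return result  # Unparseable answer: leave every field empty.
    for field in FIELDS:
        value = parsed.get(field)
        if value is not None:
            result[field] = str(value)
    return result


def extract(report: str, llm: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Run prompt generation, the model call, and post-processing in sequence."""
    return postprocess(llm(build_prompt(report)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a chat model call; returns a canned answer."""
    return 'Sure. {"tumor_location": "right upper lobe", "tumor_size_mm": 23}'


if __name__ == "__main__":
    print(extract("Nodule in the right upper lobe, 23 mm.", fake_llm))
```

Passing the model as a plain callable keeps the sketch independent of any particular API client; a real call to a chat model would replace `fake_llm`.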
Related papers
- Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports [2.932283627137903]
The study utilized two datasets: 7,294 radiology reports annotated for Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports for isocitrate dehydrogenase (IDH) mutation status.
arXiv Detail & Related papers (2024-09-15T15:21:45Z) - Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z) - GPT Struct Me: Probing GPT Models on Narrative Entity Extraction [2.049592435988883]
We evaluate the capabilities of two state-of-the-art language models -- GPT-3 and GPT-3.5 -- in the extraction of narrative entities.
This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles.
arXiv Detail & Related papers (2023-11-24T16:19:04Z) - Fine-tuning and aligning question answering models for complex information extraction tasks [0.8392546351624164]
Extractive language models such as question answering (QA) or passage retrieval models guarantee that query results are found within the boundaries of the given context document.
We show that fine-tuning existing German QA models boosts performance for tailored extraction tasks of complex linguistic features.
We derive a combined metric from Levenshtein distance, F1-Score, Exact Match and ROUGE-L to mimic the assessment criteria of human experts.
arXiv Detail & Related papers (2023-09-26T10:02:21Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z) - Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering [0.0]
ChatExtract can fully automate very accurate data extraction with minimal initial effort and background.
In tests on materials data we find precision and recall both close to 90% from the best conversational LLMs.
arXiv Detail & Related papers (2023-03-07T17:54:53Z) - Towards Relation Extraction From Speech [56.36416922396724]
We propose a new listening information extraction task, i.e., speech relation extraction.
We construct the training dataset for speech relation extraction via text-to-speech systems, and we construct the testing dataset via crowd-sourcing with native English speakers.
We conduct comprehensive experiments to distinguish the challenges in speech relation extraction, which may shed light on future explorations.
arXiv Detail & Related papers (2022-10-17T05:53:49Z) - CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
A single-step generative model can dramatically simplify the search process and be optimized in an end-to-end manner.
We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.