The Radiation Oncology NLP Database
- URL: http://arxiv.org/abs/2401.10995v1
- Date: Fri, 19 Jan 2024 19:23:37 GMT
- Title: The Radiation Oncology NLP Database
- Authors: Zhengliang Liu, Jason Holmes, Wenxiong Liao, Chenbin Liu, Lian Zhang,
Hongying Feng, Peilong Wang, Muhammad Ali Elahi, Hongmin Cai, Lichao Sun,
Quanzheng Li, Xiang Li, Tianming Liu, Jiajian Shen, Wei Liu
- Abstract summary: We present the Radiation Oncology NLP Database (ROND), the first dedicated Natural Language Processing (NLP) dataset for radiation oncology.
ROND is specifically designed to address this gap in the domain of radiation oncology.
It encompasses various NLP tasks including Logic Reasoning, Text Classification, Named Entity Recognition (NER), Question Answering (QA), Text Summarization, and Patient-Clinician Conversations.
- Score: 33.391114383354804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the Radiation Oncology NLP Database (ROND), the first dedicated
Natural Language Processing (NLP) dataset for radiation oncology, an important
medical specialty that has received limited attention from the NLP community in
the past. With the advent of Artificial General Intelligence (AGI), there is an
increasing need for specialized datasets and benchmarks to facilitate research
and development. ROND is specifically designed to address this gap in the
domain of radiation oncology, a field that offers many opportunities for NLP
exploration. It encompasses various NLP tasks including Logic Reasoning, Text
Classification, Named Entity Recognition (NER), Question Answering (QA), Text
Summarization, and Patient-Clinician Conversations, each with a distinct focus
on radiation oncology concepts and application cases. In addition, we have
developed an instruction-tuning dataset consisting of over 20k instruction
pairs (based on ROND) and trained a large language model, CancerChat. This
serves to demonstrate the potential of instruction-tuning large language models
within a highly-specialized medical domain. The evaluation results in this
study could serve as baseline results for future research. ROND aims to
stimulate advancements in radiation oncology and clinical NLP by offering a
platform for testing and improving algorithms and models in a domain-specific
context. The ROND dataset is a joint effort of multiple U.S. health
institutions. The data is available at
https://github.com/zl-liu/Radiation-Oncology-NLP-Database.
Related papers
- Diagnostic Reasoning in Natural Language: Computational Model and Application [68.47402386668846]
We investigate diagnostic abductive reasoning (DAR) in the context of language-grounded tasks (NL-DAR)
We propose a novel modeling framework for NL-DAR based on Pearl's structural causal models.
We use the resulting dataset to investigate the human decision-making process in NL-DAR.
arXiv Detail & Related papers (2024-09-09T06:55:37Z) - An Introduction to Natural Language Processing Techniques and Framework
for Clinical Implementation in Radiation Oncology [1.2714439146420664]
We present state-of-the-art NLP applications that employ large language models (LLMs) in radiation oncology research.
LLMs are prone to many errors such as hallucinations, biases, and ethical violations, which necessitate rigorous evaluation and validation.
Our article aims to provide guidance and insights for researchers and clinicians who are interested in developing and using NLP models in clinical radiation oncology.
arXiv Detail & Related papers (2023-11-03T19:32:35Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - RadOnc-GPT: A Large Language Model for Radiation Oncology [42.92077650252404]
RadOnc-GPT was finetuned on a large dataset of radiation oncology patient records from the Mayo Clinic in Arizona.
The model employs instruction tuning on three key tasks - generating radiotherapy treatment regimens, determining optimal radiation modalities, and providing diagnostic descriptions/ICD codes.
arXiv Detail & Related papers (2023-09-18T21:15:02Z) - Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning.
Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z) - Radiology-GPT: A Large Language Model for Radiology [74.07944784968372]
We introduce Radiology-GPT, a large language model for radiology.
It demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA.
It exhibits significant versatility in radiological diagnosis, research, and communication.
arXiv Detail & Related papers (2023-06-14T17:57:24Z) - An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z) - Exploring and Distilling Posterior and Prior Knowledge for Radiology
Report Generation [55.00308939833555]
The PPKED includes three modules: Posterior Knowledge Explorer (PoKE), Prior Knowledge Explorer (PrKE) and Multi-domain Knowledge Distiller (MKD)
PoKE explores the posterior knowledge, which provides explicit abnormal visual regions to alleviate visual data bias.
PrKE explores the prior knowledge from the prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias.
arXiv Detail & Related papers (2021-06-13T11:10:02Z) - A Systematic Review of Natural Language Processing Applied to Radiology
Reports [3.600747505433814]
This study systematically assesses recent literature in NLP applied to radiology reports.
Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics.
arXiv Detail & Related papers (2021-02-18T18:54:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.