BI-RADS BERT & Using Section Tokenization to Understand Radiology Reports
- URL: http://arxiv.org/abs/2110.07552v1
- Date: Thu, 14 Oct 2021 17:25:49 GMT
- Authors: Grey Kuling, Dr. Belinda Curpen, and Anne L. Martel
- Abstract summary: Domain specific contextual word embeddings have been shown to achieve impressive accuracy at such natural language processing tasks in medicine.
BERT model pre-trained on breast radiology reports combined with section tokenization resulted in an overall accuracy of 95.9% in field extraction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Radiology reports are the main form of communication between radiologists and
other clinicians, and contain important information for patient care. However,
in order to use this information for research, it is necessary to convert the
raw text into structured data suitable for analysis. Domain specific contextual
word embeddings have been shown to achieve impressive accuracy at such natural
language processing tasks in medicine. In this work we pre-trained a contextual
embedding BERT model using breast radiology reports and developed a classifier
that incorporated the embedding with auxiliary global textual features in order
to perform a section tokenization task. This model achieved a 98% accuracy at
segregating free text reports into sections of information outlined in the
Breast Imaging Reporting and Data System (BI-RADS) lexicon, a significant
improvement over the Classic BERT model without auxiliary information. We then
evaluated whether using section tokenization improved the downstream extraction
of the following fields: modality/procedure, previous cancer, menopausal
status, purpose of exam, breast density and background parenchymal enhancement.
Using the BERT model pre-trained on breast radiology reports combined with
section tokenization resulted in an overall accuracy of 95.9% in field
extraction. This is a 17 percentage point improvement over the overall accuracy
of 78.9% achieved by models without section tokenization and with Classic
BERT embeddings. Our work shows the strength of using BERT in radiology report
analysis and the advantages of section tokenization in identifying key features
of patient factors recorded in breast radiology reports.
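The paper's section tokenization step classifies each sentence of a free-text report into a BI-RADS-style section by combining a BERT contextual embedding with auxiliary global textual features such as sentence position. The sketch below illustrates that two-signal design only; a simple keyword lookup stands in for the BERT encoder, and the section labels and keyword lists are illustrative assumptions, not the paper's actual label set or model.

```python
# Hypothetical sketch of per-sentence section tokenization.
# A keyword lookup stands in for the BERT contextual embedding; the
# relative sentence position plays the role of an auxiliary global feature.

SECTION_KEYWORDS = {  # illustrative BI-RADS-style labels, not the paper's
    "Clinical Indication": ["history", "screening", "indication"],
    "Findings": ["mass", "calcification", "density", "enhancement"],
    "Impression": ["impression", "bi-rads", "recommend"],
}

def classify_sentence(sentence, position, n_sentences):
    """Assign a section label from keywords plus a positional prior."""
    text = sentence.lower()
    for section, keywords in SECTION_KEYWORDS.items():
        if any(k in text for k in keywords):
            return section
    # Auxiliary global feature: where the sentence sits in the report.
    # Early sentences default to the indication, late ones to the impression.
    rel = position / max(n_sentences - 1, 1)
    if rel < 0.3:
        return "Clinical Indication"
    return "Impression" if rel > 0.7 else "Findings"

def tokenize_sections(report_sentences):
    """Group consecutive sentences of a report by predicted section label."""
    n = len(report_sentences)
    sections = {}
    for i, sentence in enumerate(report_sentences):
        label = classify_sentence(sentence, i, n)
        sections.setdefault(label, []).append(sentence)
    return sections

report = [
    "Screening mammogram, no prior history of breast cancer.",
    "There is a spiculated mass in the upper outer quadrant.",
    "Impression: suspicious finding, biopsy recommended.",
]
print(tokenize_sections(report))
```

Once sentences are grouped by section, a downstream field extractor (e.g., for breast density or menopausal status) can restrict its search to the relevant section rather than the whole report, which is the mechanism behind the accuracy gain reported above.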
Related papers
- RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization [1.8450534779202723]
This study proposes RadBARTsum, a domain-specific and facilitated adaptation of the BART model for abstractive radiology report summarization.
The approach involves two main steps: 1) re-training the BART model on a large corpus of radiology reports using a novel entity masking strategy to improve biomedical domain knowledge learning, and 2) fine-tuning the model for the summarization task using the Findings and Background sections to predict the Impression section.
arXiv Detail & Related papers (2024-06-05T08:43:11Z)
- Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation [10.46031380503486]
We introduce a novel method, Structural Entities extraction and patient Indications incorporation (SEI), for chest X-ray report generation.
We employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports.
We propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications.
arXiv Detail & Related papers (2024-05-23T01:29:47Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- Detecting automatically the layout of clinical documents to enhance the performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text.
The algorithm consists of several steps: initial text extraction from the PDF, followed by classification into categories such as body text, left notes, and footers.
Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Knowledge Graph Construction and Its Application in Automatic Radiology Report Generation from Radiologist's Dictation [22.894248859405767]
This paper focuses on applications of NLP techniques like Information Extraction (IE) and domain-specific Knowledge Graph (KG) to automatically generate radiology reports from radiologist's dictation.
We develop an information extraction pipeline that combines rule-based, pattern-based, and dictionary-based techniques with lexical-semantic features to extract entities and relations.
We generate pathological descriptions evaluated using semantic similarity metrics, which shows 97% similarity with gold standard pathological descriptions.
arXiv Detail & Related papers (2022-06-13T16:46:54Z)
- Supervised Machine Learning Algorithm for Detecting Consistency between Reported Findings and the Conclusions of Mammography Reports [66.89977257992568]
Mammography reports document the diagnosis of patients' conditions.
Many reports contain non-standard terms (non-BI-RADS descriptors) and incomplete statements.
Our aim was to develop a tool to detect such discrepancies by comparing the reported conclusions to those that would be expected based on the reported radiology findings.
arXiv Detail & Related papers (2022-02-28T08:59:04Z)
- Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest x-ray.
Our approach features two distinct modules: (i) Learned knowledge base and (ii) Multi-modal alignment.
With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-30T10:43:56Z)
- Event-based clinical findings extraction from radiology reports with pre-trained language model [0.22940141855172028]
We present a new corpus of radiology reports annotated with clinical findings.
The gold standard corpus contained a total of 500 annotated computed tomography (CT) reports.
We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT.
arXiv Detail & Related papers (2021-12-27T05:03:10Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.