CheXbert: Combining Automatic Labelers and Expert Annotations for
Accurate Radiology Report Labeling Using BERT
- URL: http://arxiv.org/abs/2004.09167v3
- Date: Sun, 18 Oct 2020 20:30:22 GMT
- Authors: Akshay Smit, Saahil Jain, Pranav Rajpurkar, Anuj Pareek, Andrew Y. Ng,
Matthew P. Lungren
- Abstract summary: We introduce a BERT-based approach to medical image report labeling.
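We demonstrate superior performance of a biomedically pretrained BERT model that is first trained on annotations from a rule-based labeler and then finetuned on a small set of expert annotations augmented with automated backtranslation.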
We find that our final model, CheXbert, is able to outperform the previous best rules-based labeler with statistical significance.
- Score: 6.458158112222296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The extraction of labels from radiology text reports enables large-scale
training of medical imaging models. Existing approaches to report labeling
typically rely either on sophisticated feature engineering based on medical
domain knowledge or manual annotations by experts. In this work, we introduce a
BERT-based approach to medical image report labeling that exploits both the
scale of available rule-based systems and the quality of expert annotations. We
demonstrate superior performance of a biomedically pretrained BERT model first
trained on annotations of a rule-based labeler and then finetuned on a small
set of expert annotations augmented with automated backtranslation. We find
that our final model, CheXbert, is able to outperform the previous best
rules-based labeler with statistical significance, setting a new SOTA for
report labeling on one of the largest datasets of chest x-rays.
Related papers
- Automated Spinal MRI Labelling from Reports Using a Large Language Model [45.348320669329205]
We propose a pipeline to automate the extraction of labels from radiology reports using large language models.
Our method equals or surpasses GPT-4 on a held-out set of reports.
We show that the extracted labels can be used to train imaging models to classify the identified conditions in the accompanying MR scans.
arXiv Detail & Related papers (2024-10-22T17:54:07Z)
- Enhancing chest X-ray datasets with privacy-preserving large language models and multi-type annotations: a data-driven approach for improved classification [0.6144680854063935]
In chest X-ray (CXR) image analysis, rule-based systems are usually employed to extract labels from reports for dataset releases.
We present MAPLEZ, a novel approach leveraging a locally executable Large Language Model (LLM) to extract and enhance findings labels.
arXiv Detail & Related papers (2024-03-06T20:10:41Z)
- CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling [6.813646734420541]
Free-text radiology reports present a rich data source for various medical tasks, but effectively labeling these texts remains challenging.
Our study offers three main contributions: 1) We demonstrate the potential of GPT as an adept labeler using carefully designed prompts; 2) We trained a BERT-based labeler, CheX-GPT, which operates faster and more efficiently than its GPT counterpart; and 3) To benchmark labeler performance, we introduced a publicly available expert-annotated test set, MIMIC-500.
arXiv Detail & Related papers (2024-01-21T14:30:20Z)
- Automated Labeling of German Chest X-Ray Radiology Reports using Deep Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z)
- German CheXpert Chest X-ray Radiology Report Labeler [50.591267188664666]
This study aimed to develop an algorithm to automatically extract annotations for chest X-ray classification models from German thoracic radiology reports.
Results showed that automated label extraction can reduce time spent on manual labeling and improve overall modeling performance.
arXiv Detail & Related papers (2023-06-05T11:01:58Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Developing A Visual-Interactive Interface for Electronic Health Record Labeling: An Explainable Machine Learning Approach [0.0]
We introduce Explainable Labeling Assistant (XLabel), a new visual-interactive tool for data labeling.
XLabel uses Explainable Boosting Machine (EBM) to classify the labels of each data point and visualizes heatmaps of EBM's explanations.
Our experiments show that 1) XLabel helps reduce the number of labeling actions, 2) EBM as an explainable classifier is as accurate as other well-known machine learning models, and 3) even when more than 40% of the records were intentionally mislabeled, EBM could recall the correct labels of more than 90% of these records.
arXiv Detail & Related papers (2022-09-26T15:40:13Z)
- Fine-Tuning BERT for Automatic ADME Semantic Labeling in FDA Drug Labeling to Enhance Product-Specific Guidance Assessment [7.776014050139462]
Product-specific guidances (PSGs) recommended by the United States Food and Drug Administration (FDA) are instrumental to promote and guide generic drug product development.
To assess a PSG, the FDA assessor needs to take extensive time and effort to manually retrieve supportive drug information of absorption, distribution, metabolism, and excretion (ADME) from the reference listed drug labeling.
We developed a novel application of ADME semantic labeling, which can automatically retrieve ADME paragraphs from drug labeling instead of manual work.
arXiv Detail & Related papers (2022-07-25T17:43:36Z)
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)