Enhancing chest X-ray datasets with privacy-preserving large language
models and multi-type annotations: a data-driven approach for improved
classification
- URL: http://arxiv.org/abs/2403.04024v1
- Date: Wed, 6 Mar 2024 20:10:41 GMT
- Authors: Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald Summers
- Abstract summary: In chest X-ray (CXR) image analysis, rule-based systems are usually employed to extract labels from reports.
We present MAPLEZ, a novel approach leveraging a locally executable Large Language Model (LLM) to extract and enhance findings labels on CXR reports.
- Score: 0.6906005491572398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In chest X-ray (CXR) image analysis, rule-based systems are usually employed
to extract labels from reports, but concerns exist about label quality. These
datasets typically offer only presence labels, sometimes with binary
uncertainty indicators, which limits their usefulness. In this work, we present
MAPLEZ (Medical report Annotations with Privacy-preserving Large language model
using Expeditious Zero shot answers), a novel approach leveraging a locally
executable Large Language Model (LLM) to extract and enhance findings labels on
CXR reports. MAPLEZ extracts not only binary labels indicating the presence or
absence of a finding but also the location, severity, and radiologists'
uncertainty about the finding. Over eight abnormalities from five test sets, we
show that our method can extract these annotations with an increase of 5
percentage points (pp) in F1 score for categorical presence annotations and
more than 30 pp increase in F1 score for the location annotations over
competing labelers. Additionally, using these improved annotations in
classification supervision, we demonstrate substantial advancements in model
quality, with an increase of 1.7 pp in AUROC over models trained with
annotations from the state-of-the-art approach. We share code and annotations.
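The abstract reports gains in percentage points (pp), which are absolute differences between two scores rather than relative improvements. The following is a minimal sketch of that arithmetic, not the MAPLEZ implementation: the `FindingAnnotation` record, its field names, and the helper functions are hypothetical and only illustrate the four annotation types the abstract names (presence, location, severity, uncertainty).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FindingAnnotation:
    """Hypothetical record for one finding in one report (illustrative field names)."""
    finding: str
    present: bool
    locations: List[str] = field(default_factory=list)
    severity: Optional[str] = None
    uncertain: bool = False


def presence_f1(predicted: List[FindingAnnotation],
                reference: List[FindingAnnotation]) -> float:
    """F1 score of the binary presence labels, paired by position."""
    pairs = [(p.present, r.present) for p, r in zip(predicted, reference)]
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if r and not p)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def gain_in_pp(new_score: float, old_score: float) -> float:
    """Absolute difference between two scores in [0, 1], in percentage points."""
    return (new_score - old_score) * 100.0


if __name__ == "__main__":
    # Made-up labels for four reports; these are not results from the paper.
    reference = [FindingAnnotation("atelectasis", present=v)
                 for v in (True, False, True, False)]
    predicted = [FindingAnnotation("atelectasis", present=v)
                 for v in (True, True, False, False)]
    print(presence_f1(predicted, reference))
    print(gain_in_pp(0.85, 0.80))
```

Under this definition, moving from an F1 of 0.80 to 0.85 is a gain of 5 pp, even though the relative improvement is 6.25%.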
Related papers
- Substituting Data Annotation with Balanced Updates and Collective Loss
in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear in the number of labels.
Our method follows three steps: (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph.
arXiv Detail & Related papers (2023-09-24T04:12:52Z)
- Automated Labeling of German Chest X-Ray Radiology Reports using Deep
Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z)
- German CheXpert Chest X-ray Radiology Report Labeler [50.591267188664666]
This study aimed to develop an algorithm to automatically extract annotations for chest X-ray classification models from German thoracic radiology reports.
Results showed that automated label extraction can reduce time spent on manual labeling and improve overall modeling performance.
arXiv Detail & Related papers (2023-06-05T11:01:58Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy
Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly
Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
- Probabilistic Integration of Object Level Annotations in Chest X-ray
Classification [37.99281019411076]
We propose a new probabilistic latent variable model for disease classification in chest X-ray images.
Global dataset features are learned in the lower level layers of the model.
Specific details and nuances in the fine-grained expert object-level annotations are learned in the final layers.
arXiv Detail & Related papers (2022-10-13T12:53:42Z)
- Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z)
- Rethinking Pseudo Labels for Semi-Supervised Object Detection [84.697097472401]
We introduce certainty-aware pseudo labels tailored for object detection.
We dynamically adjust the thresholds used to generate pseudo labels and reweight loss functions for each category to alleviate the class imbalance problem.
Our approach improves supervised baselines by up to 10% AP using only 1-10% labeled data from COCO.
arXiv Detail & Related papers (2021-06-01T01:32:03Z)
- Learning Image Labels On-the-fly for Training Robust Classification
Models [13.669654965671604]
We show how noisy annotations (e.g., from different algorithm-based labelers) can be utilized together and mutually benefit the learning of classification tasks.
A meta-training-based label-sampling module is designed to attend to the labels that benefit model learning the most through additional back-propagation processes.
arXiv Detail & Related papers (2020-09-22T05:38:44Z)
- CheXbert: Combining Automatic Labelers and Expert Annotations for
Accurate Radiology Report Labeling Using BERT [6.458158112222296]
We introduce a BERT-based approach to medical image report labeling.
We demonstrate superior performance of a biomedically pretrained BERT model first trained on annotations of a rule-based labeler.
We find that our final model, CheXbert, is able to outperform the previous best rules-based labeler with statistical significance.
arXiv Detail & Related papers (2020-04-20T09:46:40Z)
- Fine-Grained Named Entity Typing over Distantly Supervised Data Based on
Refined Representations [16.30478830298353]
Fine-Grained Named Entity Typing (FG-NET) is a key component in Natural Language Processing (NLP).
We propose an edge-weighted attentive graph convolution network that refines the noisy mention representations by attending over corpus-level contextual clues prior to the end classification.
Experimental evaluation shows that the proposed model outperforms existing research by a relative score of up to 10.2% and 8.3% for macro F1 and micro F1, respectively.
arXiv Detail & Related papers (2020-04-07T17:26:36Z)