Enhancing chest X-ray datasets with privacy-preserving large language
models and multi-type annotations: a data-driven approach for improved
classification
- URL: http://arxiv.org/abs/2403.04024v1
- Date: Wed, 6 Mar 2024 20:10:41 GMT
- Title: Enhancing chest X-ray datasets with privacy-preserving large language
models and multi-type annotations: a data-driven approach for improved
classification
- Authors: Ricardo Bigolin Lanfredi, Pritam Mukherjee, Ronald Summers
- Abstract summary: In chest X-ray (CXR) image analysis, rule-based systems are usually employed to extract labels from reports.
We present MAPLEZ, a novel approach leveraging a locally executable Large Language Model (LLM) to extract and enhance findings labels on CXR reports.
- Score: 0.6906005491572398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In chest X-ray (CXR) image analysis, rule-based systems are usually employed
to extract labels from reports, but concerns exist about label quality. These
datasets typically offer only presence labels, sometimes with binary
uncertainty indicators, which limits their usefulness. In this work, we present
MAPLEZ (Medical report Annotations with Privacy-preserving Large language model
using Expeditious Zero shot answers), a novel approach leveraging a locally
executable Large Language Model (LLM) to extract and enhance findings labels on
CXR reports. MAPLEZ extracts not only binary labels indicating the presence or
absence of a finding but also the location, severity, and radiologists'
uncertainty about the finding. Over eight abnormalities from five test sets, we
show that our method can extract these annotations with an increase of 5
percentage points (pp) in F1 score for categorical presence annotations and
more than 30 pp increase in F1 score for the location annotations over
competing labelers. Additionally, using these improved annotations in
classification supervision, we demonstrate substantial advancements in model
quality, with an increase of 1.7 pp in AUROC over models trained with
annotations from the state-of-the-art approach. We share code and annotations.
Related papers
- Learning label-label correlations in Extreme Multi-label Classification via Label Features [44.00852282861121]
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input with a subset of most relevant labels from millions of label choices.
Short-text XMC with label features has found numerous applications in areas such as query-to-ad-phrase matching in search ads, title-based product recommendation, prediction of related searches.
We propose Gandalf, a novel approach which makes use of a label co-occurrence graph to leverage label features as additional data points to supplement the training distribution.
arXiv Detail & Related papers (2024-05-03T21:18:43Z) - Substituting Data Annotation with Balanced Updates and Collective Loss
in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear to the number of labels.
Our method follows three steps, (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph by label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph.
arXiv Detail & Related papers (2023-09-24T04:12:52Z) - Automated Labeling of German Chest X-Ray Radiology Reports using Deep
Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z) - German CheXpert Chest X-ray Radiology Report Labeler [50.591267188664666]
This study aimed to develop an algorithm to automatically extract annotations for chest X-ray classification models from German thoracic radiology reports.
Results showed that automated label extraction can reduce time spent on manual labeling and improve overall modeling performance.
arXiv Detail & Related papers (2023-06-05T11:01:58Z) - Label-Retrieval-Augmented Diffusion Models for Learning from Noisy
Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z) - Probabilistic Integration of Object Level Annotations in Chest X-ray
Classification [37.99281019411076]
We propose a new probabilistic latent variable model for disease classification in chest X-ray images.
Global dataset features are learned in the lower level layers of the model.
Specific details and nuances in the fine-grained expert object-level annotations are learned in the final layers.
arXiv Detail & Related papers (2022-10-13T12:53:42Z) - Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z) - Rethinking Pseudo Labels for Semi-Supervised Object Detection [84.697097472401]
We introduce certainty-aware pseudo labels tailored for object detection.
We dynamically adjust the thresholds used to generate pseudo labels and reweight loss functions for each category to alleviate the class imbalance problem.
Our approach improves supervised baselines by up to 10% AP using only 1-10% labeled data from COCO.
arXiv Detail & Related papers (2021-06-01T01:32:03Z) - Towards Good Practices for Efficiently Annotating Large-Scale Image
Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z) - Learning Image Labels On-the-fly for Training Robust Classification
Models [13.669654965671604]
We show how noisy annotations (e.g., from different algorithm-based labelers) can be utilized together and mutually benefit the learning of classification tasks.
A meta-training based label-sampling module is designed to attend the labels that benefit the model learning the most through additional back-propagation processes.
arXiv Detail & Related papers (2020-09-22T05:38:44Z) - CheXbert: Combining Automatic Labelers and Expert Annotations for
Accurate Radiology Report Labeling Using BERT [6.458158112222296]
We introduce a BERT-based approach to medical image report labeling.
We demonstrate superior performance of a biomedically pretrained BERT model first trained on annotations of a rule-based labeler.
We find that our final model, CheXbert, is able to outperform the previous best rules-based labeler with statistical significance.
arXiv Detail & Related papers (2020-04-20T09:46:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.