High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers
- URL: http://arxiv.org/abs/2505.01693v1
- Date: Sat, 03 May 2025 04:50:55 GMT
- Title: High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers
- Authors: Brian Wong, Kaito Tanaka,
- Abstract summary: DeBERTa-RAD is a novel framework that combines the power of state-of-the-art LLM pseudo-labeling with efficient DeBERTa-based knowledge distillation for accurate and fast chest X-ray report labeling.<n> evaluated on the expert-annotated MIMIC-500 benchmark, DeBERTa-RAD achieves a state-of-the-art Macro F1 score of 0.9120.
- Score: 0.2158126716116375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated labeling of chest X-ray reports is essential for enabling downstream tasks such as training image-based diagnostic models, population health studies, and clinical decision support. However, the high variability, complexity, and prevalence of negation and uncertainty in these free-text reports pose significant challenges for traditional Natural Language Processing methods. While large language models (LLMs) demonstrate strong text understanding, their direct application for large-scale, efficient labeling is limited by computational cost and speed. This paper introduces DeBERTa-RAD, a novel two-stage framework that combines the power of state-of-the-art LLM pseudo-labeling with efficient DeBERTa-based knowledge distillation for accurate and fast chest X-ray report labeling. We leverage an advanced LLM to generate high-quality pseudo-labels, including certainty statuses, for a large corpus of reports. Subsequently, a DeBERTa-Base model is trained on this pseudo-labeled data using a tailored knowledge distillation strategy. Evaluated on the expert-annotated MIMIC-500 benchmark, DeBERTa-RAD achieves a state-of-the-art Macro F1 score of 0.9120, significantly outperforming established rule-based systems, fine-tuned transformer models, and direct LLM inference, while maintaining a practical inference speed suitable for high-throughput applications. Our analysis shows particular strength in handling uncertain findings. This work demonstrates a promising path to overcome data annotation bottlenecks and achieve high-performance medical text processing through the strategic combination of LLM capabilities and efficient student models trained via distillation.
Related papers
- Zero-Shot Document-Level Biomedical Relation Extraction via Scenario-based Prompt Design in Two-Stage with LLM [7.808231572590279]
We propose a novel approach to achieve the same results from unannotated full documents using general large language models (LLMs) with lower hardware and labor costs.<n>Our approach combines two major stages: named entity recognition (NER) and relation extraction (RE)<n>To enhance the effectiveness of prompt, we propose a five-part template structure and a scenario-based prompt design principles.
arXiv Detail & Related papers (2025-05-02T07:33:20Z) - Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation [20.173287130474797]
generative medical Vision Large Language Models (VLLMs) are prone to hallucinations and can produce inaccurate diagnostic information.<n>We introduce a novel Semantic Consistency-Based Uncertainty Quantification framework that provides both report-level and sentence-level uncertainties.<n>Our approach improves factuality scores by $10$%, achieved by rejecting $20$% of reports using the textttRadialog model on the MIMIC-CXR dataset.
arXiv Detail & Related papers (2024-12-05T20:43:39Z) - Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.<n>LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.<n>Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z) - Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI)
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called textbfSurrogate condItional Data Extraction (SIDE) that leverages a trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z) - Self-training Large Language Models through Knowledge Detection [26.831873737733737]
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks.
This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples.
Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects.
arXiv Detail & Related papers (2024-06-17T07:25:09Z) - An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models [55.01592097059969]
Supervised finetuning on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities.
Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool.
We propose using experimental design to circumvent the computational bottlenecks of active learning.
arXiv Detail & Related papers (2024-01-12T16:56:54Z) - PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z) - Automated Labeling of German Chest X-Ray Radiology Reports using Deep
Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z) - Interpretable Medical Diagnostics with Structured Data Extraction by
Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.