Competence-based Multimodal Curriculum Learning for Medical Report
Generation
- URL: http://arxiv.org/abs/2206.14579v3
- Date: Tue, 11 Apr 2023 06:23:39 GMT
- Title: Competence-based Multimodal Curriculum Learning for Medical Report
Generation
- Authors: Fenglin Liu, Shen Ge, Yuexian Zou, Xian Wu
- Abstract summary: We propose a Competence-based Multimodal Curriculum Learning framework (CMCL) to alleviate the data bias and make the best use of available data.
Specifically, CMCL simulates the learning process of radiologists and optimizes the model in a step-by-step manner.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance.
- Score: 98.10763792453925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The medical report generation task, which aims to produce long and
coherent descriptions of medical images, has attracted growing research
interest recently. Unlike general image captioning tasks, medical report
generation is more challenging for data-driven neural models, mainly due to
1) serious data bias and 2) limited medical data. To alleviate the data bias
and make the best use of available data, we propose a Competence-based
Multimodal Curriculum Learning framework (CMCL). Specifically, CMCL simulates
the learning process of radiologists and optimizes the model in a step-by-step
manner. First, CMCL estimates the difficulty of each training instance and
evaluates the competence of the current model; second, CMCL selects the most
suitable batch of training instances given the current model's competence. By
iterating these two steps, CMCL gradually improves the model's performance.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be
incorporated into existing models to improve their performance.
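The two-step loop in the abstract (estimate difficulty, then select a batch
matched to the model's competence) can be sketched as follows. This is a
minimal illustration, not the paper's implementation: the per-instance
difficulty scores (which CMCL derives from the visual and textual modalities),
the `train_step` interface, and the root-growth competence schedule are all
assumptions.

```python
import numpy as np

def competence(t, total_steps, c0=0.1, p=2):
    # Competence grows from c0 to 1.0 over training; p=2 gives the
    # square-root schedule common in competence-based curricula.
    return min(1.0, ((1 - c0 ** p) * t / total_steps + c0 ** p) ** (1.0 / p))

def train_cmcl(model, instances, difficulties, total_steps, batch_size=16):
    """At each step, sample a batch only from instances whose difficulty
    rank lies within the model's current competence."""
    n = len(instances)
    # Convert raw difficulty scores to ranks in (0, 1] so that they are
    # directly comparable with the competence value.
    ranks = np.empty(n)
    ranks[np.argsort(difficulties)] = (np.arange(n) + 1) / n
    for t in range(1, total_steps + 1):
        c = competence(t, total_steps)
        eligible = np.flatnonzero(ranks <= c)  # easy-enough subset
        batch = np.random.choice(eligible,
                                 size=min(batch_size, len(eligible)),
                                 replace=False)
        model.train_step([instances[i] for i in batch])  # assumed interface
    return model
```

Once competence reaches 1.0, every instance becomes eligible and the loop
reduces to ordinary uniform-batch training.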
Related papers
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697,000 radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-initialization approach for hybrid medical image segmentation models.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML [0.7982607013768545]
Yet Another ICU Benchmark (YAIB) is a modular framework that allows researchers to define reproducible and comparable clinical ML experiments.
YAIB supports most open-access ICU datasets (MIMIC III/IV, eICU, HiRID, AUMCdb) and is easily adaptable to future ICU datasets.
We demonstrate that the choice of dataset, cohort definition, and preprocessing have a major impact on the prediction performance.
arXiv Detail & Related papers (2023-06-08T11:16:20Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
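With images and reports embedded in a common space, text-to-image retrieval
reduces to a nearest-neighbour search over cosine similarities. A minimal
sketch of the recall@k benchmark, assuming paired, identically indexed image
and text embedding matrices (the names and shapes are illustrative, not from
the paper):

```python
import numpy as np

def recall_at_k(img_emb, txt_emb, k=5):
    """For each report embedding, check whether its paired image
    is among the k nearest image embeddings (cosine similarity)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sims = txt @ img.T                        # (n_pairs, n_pairs)
    top_k = np.argsort(-sims, axis=1)[:, :k]  # k most similar images per text
    hits = (top_k == np.arange(len(txt))[:, None]).any(axis=1)
    return hits.mean()

# Random stand-ins for encoder outputs; real use would pass the embeddings
# produced by the vision and language encoders.
rng = np.random.default_rng(0)
print(recall_at_k(rng.normal(size=(100, 128)), rng.normal(size=(100, 128))))
```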
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- medigan: A Python Library of Pretrained Generative Models for Enriched Data Access in Medical Imaging [3.8568465270960264]
medigan is a one-stop shop for pretrained generative models implemented as an open-source framework-agnostic Python library.
It allows researchers and developers to create, increase, and domain-adapt their training data in just a few lines of code.
The library's scalability and design are demonstrated by its growing number of integrated, readily usable pretrained generative models.
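A usage sketch following the pattern in medigan's documentation; the specific
model ID and argument names are recalled from the docs and should be verified
against the library's model registry before use:

```python
# pip install medigan
from medigan import Generators

generators = Generators()
# Generate synthetic images with one of the pretrained models; the
# model_id shown (a mammography ROI DCGAN) is an assumed example.
generators.generate(model_id="00001_DCGAN_MMG_CALC_ROI", num_samples=8)
```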
arXiv Detail & Related papers (2022-09-28T23:45:33Z)
- Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-sourced a strong MedISeg repository, where each component has the advantage of plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
- Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data [89.79617468457393]
Training models on data with a high imbalance rate (class density discrepancy) may lead to suboptimal predictions.
We propose a training framework that addresses this imbalance issue.
We demonstrate our model's improved performance on real-world medical datasets.
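The paper's density-aware scheme is more elaborate than this, but the
underlying problem can be illustrated with the standard inverse-frequency
baseline: reweighting the loss so low-density classes are not drowned out
(a generic remedy, not the paper's method):

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Class weights proportional to 1/frequency, so sparse (low-density)
    classes contribute to the loss on par with dense ones."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts.sum() / (num_classes * np.maximum(counts, 1.0))

def weighted_cross_entropy(logits, labels, class_weights):
    """Softmax cross-entropy with each example scaled by its class weight."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return (class_weights[labels] * nll).mean()

labels = np.array([0] * 95 + [1] * 5)          # 95:5 class imbalance
logits = np.random.default_rng(1).normal(size=(100, 2))
w = inverse_frequency_weights(labels, 2)
print(w, weighted_cross_entropy(logits, labels, w))
```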
arXiv Detail & Related papers (2022-07-23T00:39:53Z)
- MetaVA: Curriculum Meta-learning and Pre-fine-tuning of Deep Neural Networks for Detecting Ventricular Arrhythmias based on ECGs [9.600976281032862]
Ventricular arrhythmias (VA) are the main causes of sudden cardiac death.
We propose a novel method combining model-agnostic meta-learning (MAML) with curriculum learning (CL) to address group-level diversity.
We conduct experiments using a combination of three publicly available ECG datasets.
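The MAML component can be illustrated on a toy task family. The sketch below
is first-order MAML on 1-D linear regression, with per-task slopes standing in
for patient-level ECG variation; the task family and all names are
illustrative, and the paper's curriculum (ordering tasks from easy to hard) is
omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_grad(w, X, y):
    # Gradient of mean squared error for a linear model y ~ X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

def sample_task():
    # Toy task family: each "patient" is a 1-D regression problem with
    # its own slope (a stand-in for group-level diversity).
    slope = rng.uniform(-2, 2)
    X = rng.normal(size=(20, 1))
    return X, slope * X[:, 0]

w_meta, inner_lr, outer_lr = np.zeros(1), 0.05, 0.01
for step in range(2000):
    X, y = sample_task()
    Xs, ys, Xq, yq = X[:10], y[:10], X[10:], y[10:]        # support / query
    w_task = w_meta - inner_lr * mse_grad(w_meta, Xs, ys)  # inner adaptation
    w_meta = w_meta - outer_lr * mse_grad(w_task, Xq, yq)  # first-order meta-update
```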
arXiv Detail & Related papers (2022-02-25T01:26:19Z)
- Knowledge Distillation for Brain Tumor Segmentation [0.0]
We study the relationship between the performance of the model and the amount of data employed during the training process.
A single model trained with additional data achieves performance close to the ensemble of multiple models and outperforms individual methods.
arXiv Detail & Related papers (2020-02-10T12:44:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.