Multimodal self-supervised learning for lesion localization
- URL: http://arxiv.org/abs/2401.01524v3
- Date: Tue, 20 Aug 2024 03:49:28 GMT
- Title: Multimodal self-supervised learning for lesion localization
- Authors: Hao Yang, Hong-Yu Zhou, Cheng Li, Weijian Huang, Jiarun Liu, Yong Liang, Guangming Shi, Hairong Zheng, Qiegen Liu, Shanshan Wang
- Abstract summary: A new method is introduced that takes full sentences from textual reports as the basic units for local semantic alignment.
This approach combines chest X-ray images with their corresponding textual reports, performing contrastive learning at both global and local levels.
- Score: 41.7046184109176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal deep learning utilizing imaging and diagnostic reports has made impressive progress in the field of medical imaging diagnostics, demonstrating a particularly strong capability for auxiliary diagnosis in cases where sufficient annotation information is lacking. Nonetheless, localizing diseases accurately without detailed positional annotations remains a challenge. Although existing methods have attempted to utilize local information to achieve fine-grained semantic alignment, their capability in extracting the fine-grained semantics of the comprehensive context within reports is limited. To address this problem, a new method is introduced that takes full sentences from textual reports as the basic units for local semantic alignment. This approach combines chest X-ray images with their corresponding textual reports, performing contrastive learning at both global and local levels. The leading results obtained by this method on multiple datasets confirm its efficacy in the task of lesion localization.
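As a rough illustration of the global-plus-local contrastive objective described above, the following PyTorch sketch pairs a symmetric InfoNCE loss over whole image-report embeddings with a sentence-level alignment loss. The attention-based sentence-to-region matching and all dimensions are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: global and sentence-level contrastive alignment for
# image-report pairs. Encoders and dimensions are assumed, not the
# paper's actual architecture.
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings (N, D)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau                       # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def local_sentence_loss(region_feats, sent_feats, tau=0.07):
    """Soft-match each report sentence to image regions, then contrast.

    region_feats: (R, D) patch/region embeddings of one image
    sent_feats:   (S, D) sentence embeddings of its report
    """
    region_feats = F.normalize(region_feats, dim=-1)
    sent_feats = F.normalize(sent_feats, dim=-1)
    sim = sent_feats @ region_feats.t()            # (S, R) sentence-region scores
    attn = sim.softmax(dim=-1)                     # soft sentence->region match
    matched = attn @ region_feats                  # (S, D) attended region features
    return info_nce(sent_feats, matched, tau)
```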
Related papers
- SGSeg: Enabling Text-free Inference in Language-guided Segmentation of Chest X-rays via Self-guidance [10.075820470715374]
We propose a self-guided segmentation framework (SGSeg) that leverages language guidance for training (multi-modal) while enabling text-free inference (uni-modal).
We exploit the critical location information of both pulmonary and pathological structures depicted in the text reports and introduce a novel localization-enhanced report generation (LERG) module to generate clinical reports for self-guidance.
Our LERG integrates an object detector and a location-based attention aggregator, weakly-supervised by a location-aware pseudo-label extraction module.
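A minimal sketch of the self-guidance idea, assuming a stand-in backbone and hypothetical module names: text features guide the decoder during training, while at inference the model's own report head supplies the guidance, so no ground-truth report is needed.

```python
# Hedged sketch of self-guided, text-free inference. All modules are
# illustrative stand-ins, not SGSeg's actual components.
import torch
import torch.nn as nn

class SelfGuidedSegmenter(nn.Module):
    def __init__(self, img_dim=256, txt_dim=256, n_classes=1):
        super().__init__()
        self.image_encoder = nn.Conv2d(1, img_dim, 3, padding=1)  # stand-in backbone
        self.report_head = nn.Linear(img_dim, txt_dim)            # generates text features
        self.fuse = nn.Conv2d(img_dim + txt_dim, img_dim, 1)
        self.decoder = nn.Conv2d(img_dim, n_classes, 1)

    def forward(self, image, text_feat=None):
        f = self.image_encoder(image)                             # (B, C, H, W)
        pooled = f.mean(dim=(2, 3))
        gen_text = self.report_head(pooled)                       # self-guidance signal
        guide = text_feat if text_feat is not None else gen_text  # text-free at inference
        guide_map = guide[:, :, None, None].expand(-1, -1, f.size(2), f.size(3))
        mask = self.decoder(self.fuse(torch.cat([f, guide_map], dim=1)))
        return mask, gen_text
```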
arXiv Detail & Related papers (2024-09-07T08:16:00Z)
- Large Multimodal Model based Standardisation of Pathology Reports with Confidence and their Prognostic Significance [4.777807873917223]
We present a practical approach based on the use of large multimodal models (LMMs) for automatically extracting information from scanned images of pathology reports.
The proposed framework uses two stages of prompting a Large Multimodal Model (LMM) for information extraction and validation.
We show that the estimated confidence is an effective indicator of the accuracy of the extracted information that can be used to select only accurately extracted fields.
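A hedged sketch of the two-stage prompting flow; the query_lmm() helper is hypothetical, and the JSON format and confidence threshold are assumptions rather than the paper's protocol.

```python
# Hedged sketch: extraction prompt, then a validation prompt whose
# confidences filter out unreliable fields. query_lmm() is hypothetical.
import json

def query_lmm(prompt: str, image_path: str) -> str:
    """Hypothetical call to a large multimodal model; returns raw text."""
    raise NotImplementedError("wire this to a real LMM endpoint")

def extract_with_confidence(image_path: str, fields: list[str]) -> dict:
    # Stage 1: ask for the fields as structured JSON.
    extraction = query_lmm(
        f"Extract the following fields as JSON: {fields}", image_path)
    record = json.loads(extraction)
    # Stage 2: ask the model to re-check each field with a confidence score.
    validation = query_lmm(
        "For each field in this JSON, give a confidence in [0, 1]: "
        + json.dumps(record), image_path)
    confidences = json.loads(validation)
    # Keep only fields the model itself rates as reliable.
    return {k: v for k, v in record.items()
            if confidences.get(k, 0.0) >= 0.9}
```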
arXiv Detail & Related papers (2024-05-03T12:19:38Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
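A rough sketch of the CELC idea of using the aggregated central model as a correction teacher, alongside a plain FedAvg step; the confidence threshold and loop structure are assumptions, not the paper's exact strategy.

```python
# Hedged sketch: central-model label correction plus parameter averaging.
import copy
import torch

@torch.no_grad()
def correct_labels(central_model, images, noisy_masks, threshold=0.9):
    """Replace noisy voxel labels where the central teacher is confident."""
    probs = central_model(images).sigmoid()
    confident = (probs > threshold) | (probs < 1 - threshold)
    return torch.where(confident, (probs > 0.5).float(), noisy_masks)

def fedavg(site_models):
    """Average per-site parameters into a new central model."""
    central = copy.deepcopy(site_models[0])
    state = central.state_dict()
    for key in state:
        state[key] = torch.stack(
            [m.state_dict()[key].float() for m in site_models]).mean(0)
    central.load_state_dict(state)
    return central
```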
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- PCA: Semi-supervised Segmentation with Patch Confidence Adversarial Training [52.895952593202054]
We propose a new semi-supervised adversarial method called Patch Confidence Adversarial Training (PCA) for medical image segmentation.
PCA learns the pixel structure and context information in each patch to provide sufficient gradient feedback, which aids the discriminator in converging to an optimal state.
Our method outperforms the state-of-the-art semi-supervised methods, which demonstrates its effectiveness for medical image segmentation.
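A minimal sketch of patch-level confidence adversarial training: the discriminator outputs one confidence logit per patch of a predicted mask, and predictions on unlabeled images are pushed toward high confidence. Architecture sizes are illustrative assumptions.

```python
# Hedged sketch of a patch-confidence discriminator and the
# unsupervised loss it drives. Channel/stride choices are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchConfidenceDiscriminator(nn.Module):
    """Maps (mask, image) pairs to a grid of per-patch confidence logits."""
    def __init__(self, in_ch=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, padding=1),            # one logit per patch
        )

    def forward(self, mask, image):
        return self.net(torch.cat([mask, image], dim=1))

def unsupervised_loss(disc, seg_logits, image):
    """Push patch confidences on unlabeled predictions toward 'real'."""
    conf = disc(seg_logits.sigmoid(), image)
    return F.binary_cross_entropy_with_logits(conf, torch.ones_like(conf))
```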
arXiv Detail & Related papers (2022-07-24T07:45:47Z)
- Voxel-wise Adversarial Semi-supervised Learning for Medical Image Segmentation [4.489713477369384]
We introduce a novel adversarial learning-based semi-supervised segmentation method for medical image segmentation.
Our method embeds both local and global features from multiple hidden layers and learns context relations between multiple classes.
Our method outperforms current state-of-the-art semi-supervised learning approaches on left atrium (single-class) and multi-organ (multi-class) segmentation datasets.
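A rough sketch of embedding features from multiple hidden layers into a voxel-wise discriminator; the layer choices and shapes are illustrative assumptions, not the paper's design.

```python
# Hedged sketch: multi-layer feature embedding feeding a voxel-wise
# discriminator that labels each voxel's prediction as real or fake.
import torch
import torch.nn as nn
import torch.nn.functional as F

def embed_multi_layer(features, out_size):
    """Upsample hidden feature maps from several layers and stack them."""
    maps = [F.interpolate(f, size=out_size, mode='trilinear',
                          align_corners=False) for f in features]
    return torch.cat(maps, dim=1)          # (B, sum(C_i), D, H, W)

class VoxelDiscriminator(nn.Module):
    """Scores every voxel as a labeled (real) or unlabeled (fake) prediction."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 64, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 1, 1),
        )

    def forward(self, embedded):
        return self.net(embedded)          # (B, 1, D, H, W) voxel-wise logits
```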
arXiv Detail & Related papers (2022-05-14T06:57:19Z)
- Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest X-rays.
Our approach features two distinct modules: (i) Learned knowledge base and (ii) Multi-modal alignment.
With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
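A minimal sketch of one way a learned knowledge base can be queried via cross-attention over trainable memory slots; the slot count and residual fusion are assumptions, not the paper's module.

```python
# Hedged sketch: trainable knowledge-base slots retrieved by visual tokens.
import torch
import torch.nn as nn

class LearnedKnowledgeBase(nn.Module):
    def __init__(self, n_slots=128, dim=256):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_slots, dim))  # trainable KB slots
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, img_feats):
        """img_feats: (B, N, D) visual tokens -> KB-enriched tokens."""
        mem = self.memory.unsqueeze(0).expand(img_feats.size(0), -1, -1)
        enriched, _ = self.attn(query=img_feats, key=mem, value=mem)
        return img_feats + enriched        # residual fusion with retrieved knowledge
```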
arXiv Detail & Related papers (2021-12-30T10:43:56Z)
- Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
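A rough sketch of the augmentation idea: a class-conditional generator produces a target-domain X-ray as a residual on the source image, which keeps patient anatomy intact. The network internals here are stand-ins, not the paper's architecture.

```python
# Hedged sketch: residual conditional generation that preserves identity.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, n_classes=14, ch=32):
        super().__init__()
        self.embed = nn.Embedding(n_classes, ch)
        self.net = nn.Sequential(
            nn.Conv2d(1 + ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, xray, target_class):
        b, _, h, w = xray.shape
        cond = self.embed(target_class)[:, :, None, None].expand(b, -1, h, w)
        # Residual generation keeps patient anatomy while adding pathology.
        return torch.clamp(xray + self.net(torch.cat([xray, cond], dim=1)), -1, 1)
```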
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
- Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
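A minimal sketch of the variational ingredient: a Gaussian topic latent sampled with the reparameterization trick plus its KL regularizer, which would then condition a sentence decoder. Dimensions and the standard-normal prior are assumptions.

```python
# Hedged sketch: reparameterized topic latent with a KL term against N(0, I).
import torch
import torch.nn as nn

class TopicInference(nn.Module):
    def __init__(self, img_dim=256, topic_dim=64):
        super().__init__()
        self.mu = nn.Linear(img_dim, topic_dim)
        self.logvar = nn.Linear(img_dim, topic_dim)

    def forward(self, img_feat):
        mu, logvar = self.mu(img_feat), self.logvar(img_feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # KL divergence to a standard normal prior regularizes the topics.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return z, kl
```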
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
- Belief function-based semi-supervised learning for brain tumor segmentation [23.21410263735263]
Deep learning makes it possible to detect and segment a lesion field using annotated data.
However, obtaining precisely annotated data is very challenging in the medical domain.
In this paper, we address the uncertain boundary problem by a new evidential neural network with an information fusion strategy.
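A generic sketch of an evidential output head in the subjective-logic style, where per-voxel Dirichlet evidence yields both class probabilities and an uncertainty map that is high near ambiguous boundaries; this is a standard construction, not necessarily the paper's exact fusion strategy.

```python
# Hedged sketch: evidential head turning logits into Dirichlet evidence.
import torch
import torch.nn.functional as F

def evidential_head(logits):
    """logits: (B, K, H, W) -> class probabilities and voxel uncertainty."""
    evidence = F.softplus(logits)          # non-negative evidence per class
    alpha = evidence + 1.0                 # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True)
    probs = alpha / strength               # expected class probabilities
    k = logits.size(1)
    uncertainty = k / strength             # large where total evidence is low
    return probs, uncertainty
```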
arXiv Detail & Related papers (2021-01-29T22:39:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.