Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction
- URL: http://arxiv.org/abs/2407.00129v1
- Date: Fri, 28 Jun 2024 06:38:58 GMT
- Title: Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction
- Authors: Akash Awasthi, Ngan Le, Zhigang Deng, Rishi Agrawal, Carol C. Wu, Hien Van Nguyen
- Abstract summary: Our proposed system aims to predict eye gaze sequences from radiology reports and CXR images.
Our model predicts fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models in the computer vision community.
Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions.
- Score: 10.388541520456714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting human gaze behavior within computer vision is integral for developing interactive systems that can anticipate user attention, address fundamental questions in cognitive science, and hold implications for fields like human-computer interaction (HCI) and augmented/virtual reality (AR/VR) systems. Despite methodologies introduced for modeling human eye gaze behavior, applying these models to medical imaging for scanpath prediction remains unexplored. Our proposed system aims to predict eye gaze sequences from radiology reports and CXR images, potentially streamlining data collection and enhancing AI systems using larger datasets. However, predicting human scanpaths on medical images presents unique challenges due to the diverse nature of abnormal regions. Our model predicts fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models in the computer vision community. Utilizing a two-stage training process and large publicly available datasets, our approach generates static heatmaps and eye gaze videos aligned with radiology reports, facilitating comprehensive analysis. We validate our approach by comparing its performance with state-of-the-art methods and assessing its generalizability among different radiologists, introducing novel strategies to model radiologists' search patterns during CXR image diagnosis. Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions over the CXR images. It sometimes also outperforms humans in terms of redundancy and randomness in the scanpaths.
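To make the task concrete, the sketch below shows one plausible interface for a report- and image-conditioned scanpath predictor that autoregressively regresses fixation coordinates and durations. It is a minimal illustration, not MedGaze's released implementation: the class name, feature dimensions, and the single-stage formulation are assumptions (the abstract's two-stage training is not shown).

```python
import torch
import torch.nn as nn

class ScanpathPredictor(nn.Module):
    """Hypothetical sketch: predict the next fixation (x, y, duration) from a CXR and its report."""
    def __init__(self, d_model=256, vocab_size=30522, img_feat_dim=2048):
        super().__init__()
        self.visual_proj = nn.Linear(img_feat_dim, d_model)   # project CNN patch features
        self.text_embed = nn.Embedding(vocab_size, d_model)   # embed report tokens
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
        self.fix_embed = nn.Linear(3, d_model)                 # embed past (x, y, duration) fixations
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
        self.head = nn.Linear(d_model, 3)                      # regress the next (x, y, duration)

    def forward(self, image_feats, report_ids, prev_fixations):
        # image_feats: (B, N, img_feat_dim), report_ids: (B, T), prev_fixations: (B, F, 3)
        memory = torch.cat([self.visual_proj(image_feats),
                            self.text_embed(report_ids)], dim=1)
        memory = self.encoder(memory)
        tgt = self.fix_embed(prev_fixations)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)   # (B, F, 3): one prediction per decoding step
```

At inference, fixations would be decoded one at a time, feeding each predicted (x, y, duration) back into prev_fixations until a stopping criterion is met.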
Related papers
- GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph [32.1234295417225]
We propose a context-aware Gaze EstiMation (GEM) network that utilizes eye gaze data collected from radiologists to simulate their visual search behavior patterns.
It consists of a context-awareness module, visual behavior graph construction, and visual behavior matching.
Experiments on four publicly available datasets demonstrate the superiority of GEM over existing methods.
arXiv Detail & Related papers (2024-08-10T09:46:25Z) - Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns [7.6599164274971026]
Vision-Language Models (VLMs) are enhanced with radiologists' attention by incorporating eye gaze data alongside textual prompts.
Heatmaps are generated from eye gaze data and overlaid onto medical images to highlight areas of intense radiologist focus.
Results demonstrate that the inclusion of eye gaze information significantly enhances the accuracy of chest X-ray analysis.
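A minimal sketch of how such gaze heatmaps might be produced and overlaid, assuming fixations are given as (x, y, duration) in pixel coordinates; the Gaussian width and blending weight are arbitrary choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaze_heatmap(fixations, height, width, sigma=25.0):
    """fixations: iterable of (x, y, duration) in pixel coordinates."""
    heat = np.zeros((height, width), dtype=np.float32)
    for x, y, dur in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            heat[yi, xi] += dur                     # weight each fixation by its duration
    heat = gaussian_filter(heat, sigma=sigma)       # spread point mass into a smooth map
    return heat / heat.max() if heat.max() > 0 else heat

def overlay(gray_image, heat, alpha=0.4):
    """Blend a normalized heatmap onto a grayscale image, returning an RGB uint8 array."""
    img = np.stack([gray_image] * 3, axis=-1).astype(np.float32)
    red = np.zeros_like(img)
    red[..., 0] = heat * 255.0                      # encode attention intensity in the red channel
    return np.clip((1 - alpha) * img + alpha * red, 0, 255).astype(np.uint8)
```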
arXiv Detail & Related papers (2024-04-03T00:09:05Z) - Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis [61.089776864520594]
We propose eye-tracking as an alternative to text reports for medical images.
By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning.
We introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks.
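The core idea can be sketched roughly as follows: images whose radiologist gaze heatmaps look alike are treated as positive pairs in an otherwise standard contrastive objective. This is an illustrative reconstruction, not McGIP's published code; the cosine-similarity criterion, threshold, and loss details are assumptions.

```python
import torch
import torch.nn.functional as F

def gaze_positive_mask(heatmaps, threshold=0.7):
    """heatmaps: (B, H*W) flattened gaze heatmaps, one per image in the batch."""
    h = F.normalize(heatmaps, dim=1)
    sim = h @ h.t()                          # cosine similarity between gaze patterns
    return sim > threshold                   # True where two images count as a positive pair

def gaze_contrastive_loss(features, positive_mask, temperature=0.1):
    """features: (B, D) embeddings from the image encoder being pretrained."""
    z = F.normalize(features, dim=1)
    logits = (z @ z.t()) / temperature
    logits.fill_diagonal_(-1e9)              # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = positive_mask.float().fill_diagonal_(0)
    n_pos = pos.sum(dim=1).clamp(min=1)      # avoid division by zero when an image has no positives
    return -(log_prob * pos).sum(dim=1).div(n_pos).mean()
```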
arXiv Detail & Related papers (2023-12-11T02:27:45Z) - Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Vision-Language Generative Model for View-Specific Chest X-ray Generation [18.347723213970696]
ViewXGen is designed to overcome the limitation of existing methods that generate only frontal-view chest X-rays.
Our approach takes into consideration the diverse view positions found in the dataset, enabling the generation of chest X-rays with specific views.
arXiv Detail & Related papers (2023-02-23T17:13:25Z) - Improving Chest X-Ray Classification by RNN-based Patient Monitoring [0.34998703934432673]
We analyze how information about diagnosis can improve CNN-based image classification models.
We show that a model trained on additional patient history information outperforms a model trained without the information by a significant margin.
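One plausible way to wire this up, sketched below under assumed feature sizes: a GRU summarizes encoded prior-visit records, and its final state is concatenated with the CNN features of the current image before classification. The class name and fusion-by-concatenation choice are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class HistoryAwareClassifier(nn.Module):
    def __init__(self, img_dim=2048, hist_dim=64, hidden=128, n_classes=14):
        super().__init__()
        self.rnn = nn.GRU(hist_dim, hidden, batch_first=True)   # encodes the visit history
        self.head = nn.Linear(img_dim + hidden, n_classes)

    def forward(self, img_feats, history):
        # img_feats: (B, img_dim) CNN features of the current chest X-ray
        # history:   (B, T, hist_dim) encoded records from earlier visits
        _, h = self.rnn(history)
        fused = torch.cat([img_feats, h[-1]], dim=1)             # image features + summarized history
        return self.head(fused)
```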
arXiv Detail & Related papers (2022-10-28T11:47:15Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
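A loose sketch of what such a variational setup can look like: an EHR-informed posterior over a latent variable supervises an image-only prior through a KL term, so the classifier can run from the image alone at test time. Distribution choices, dimensions, and the loss weighting are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.distributions as D

class VKDSketch(nn.Module):
    def __init__(self, img_dim=2048, ehr_dim=256, z_dim=64, n_classes=14):
        super().__init__()
        self.prior_net = nn.Linear(img_dim, 2 * z_dim)             # image-only prior p(z | x)
        self.post_net = nn.Linear(img_dim + ehr_dim, 2 * z_dim)    # EHR-informed posterior q(z | x, ehr)
        self.classifier = nn.Linear(z_dim, n_classes)

    @staticmethod
    def gaussian(params):
        mu, logvar = params.chunk(2, dim=-1)
        return D.Normal(mu, torch.exp(0.5 * logvar))

    def forward(self, img_feats, ehr_feats):
        prior = self.gaussian(self.prior_net(img_feats))
        post = self.gaussian(self.post_net(torch.cat([img_feats, ehr_feats], dim=-1)))
        z = post.rsample()                                          # reparameterized sample (training)
        logits = self.classifier(z)
        kl = D.kl_divergence(post, prior).sum(dim=-1).mean()        # pushes EHR knowledge into the image prior
        return logits, kl                                           # train with CE(logits) + beta * kl
```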
arXiv Detail & Related papers (2021-03-19T14:13:56Z) - An Interpretable Multiple-Instance Approach for the Detection of Referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
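The patch-level attention pooling described here can be sketched along the lines of attention-based multiple-instance learning, as below; the patch feature extractor, dimensions, and class count are assumptions.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, attn_dim=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, attn_dim), nn.Tanh(), nn.Linear(attn_dim, 1))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):
        # patch_feats: (num_patches, feat_dim) features of one fundus image's patches
        weights = torch.softmax(self.attn(patch_feats), dim=0)   # importance of each patch
        bag = (weights * patch_feats).sum(dim=0)                 # attention-weighted image-level feature
        return self.classifier(bag), weights.squeeze(-1)         # prediction + interpretable patch weights
```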
arXiv Detail & Related papers (2021-03-02T13:14:15Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Graph representation forecasting of patient's medical conditions: towards a digital twin [0.0]
We show the results of investigating the pathological effects of ACE2 overexpression, across different signalling pathways in multiple tissues, on cardiovascular functions.
We provide a proof of concept of integrating a large set of composable clinical models using molecular data.
arXiv Detail & Related papers (2020-09-17T13:49:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.