GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph
- URL: http://arxiv.org/abs/2408.05502v1
- Date: Sat, 10 Aug 2024 09:46:25 GMT
- Title: GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph
- Authors: Shaonan Liu, Wenting Chen, Jie Liu, Xiaoling Luo, Linlin Shen,
- Abstract summary: We propose a context-aware Gaze EstiMation (GEM) network that utilizes eye gaze data collected from radiologists to simulate their visual search behavior patterns.
It consists of a context-awareness module, visual behavior graph construction, and visual behavior matching.
Experiments on four publicly available datasets demonstrate the superiority of GEM over existing methods.
- Score: 32.1234295417225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaze estimation is pivotal in human scene comprehension tasks, particularly in medical diagnostic analysis. Eye-tracking technology facilitates the recording of physicians' ocular movements during image interpretation, thereby elucidating their visual attention patterns and information-processing strategies. In this paper, we initially define the context-aware gaze estimation problem in medical radiology report settings. To understand the attention allocation and cognitive behavior of radiologists during the medical image interpretation process, we propose a context-aware Gaze EstiMation (GEM) network that utilizes eye gaze data collected from radiologists to simulate their visual search behavior patterns throughout the image interpretation process. It consists of a context-awareness module, visual behavior graph construction, and visual behavior matching. Within the context-awareness module, we achieve intricate multimodal registration by establishing connections between medical reports and images. Subsequently, for a more accurate simulation of genuine visual search behavior patterns, we introduce a visual behavior graph structure, capturing such behavior through high-order relationships (edges) between gaze points (nodes). To maintain the authenticity of visual behavior, we devise a visual behavior-matching approach, adjusting the high-order relationships between them by matching the graph constructed from real and estimated gaze points. Extensive experiments on four publicly available datasets demonstrate the superiority of GEM over existing methods and its strong generalizability, which also provides a new direction for the effective utilization of diverse modalities in medical image interpretation and enhances the interpretability of models in the field of medical imaging. https://github.com/Tiger-SN/GEM
Related papers
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z) - Visual Neural Decoding via Improved Visual-EEG Semantic Consistency [3.4061238650474657]
Methods that directly map EEG features to the CLIP embedding space may introduce mapping bias and cause semantic inconsistency.
We propose a Visual-EEG Semantic Decouple Framework that explicitly extracts the semantic-related features of these two modalities to facilitate optimal alignment.
Our method achieves state-of-the-art results in zero-shot neural decoding tasks.
arXiv Detail & Related papers (2024-08-13T10:16:10Z) - Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction [10.388541520456714]
Our proposed system aims to predict eye gaze sequences from radiology reports and CXR images.
Our model predicts fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models in the computer vision community.
Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions.
arXiv Detail & Related papers (2024-06-28T06:38:58Z) - Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z) - Anatomical Structure-Guided Medical Vision-Language Pre-training [21.68719061251635]
We propose an Anatomical Structure-Guided (ASG) framework for learning medical visual representations.
For anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists.
For finding and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample.
arXiv Detail & Related papers (2024-03-14T11:29:47Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis [61.089776864520594]
We propose eye-tracking as an alternative to text reports for medical images.
By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning.
We introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks.
arXiv Detail & Related papers (2023-12-11T02:27:45Z) - Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as Controllable Mind Visual Model Diffusion (CMVDM)
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.