Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis
- URL: http://arxiv.org/abs/2507.12461v1
- Date: Wed, 16 Jul 2025 17:58:35 GMT
- Title: Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis
- Authors: Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Van Nguyen, Ngan Le,
- Abstract summary: Radiologists rely on eye movements to navigate and interpret medical images.<n>A trained radiologist possesses knowledge about the potential diseases that may be present in the images and, when searching, follows a mental checklist to locate them.<n>This is a key observation, yet existing models fail to capture the underlying intent behind each fixation.<n>We introduce a deep learning-based approach, RadGazeIntent, designed to model this behavior.
- Score: 13.125637740252403
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Radiologists rely on eye movements to navigate and interpret medical images. A trained radiologist possesses knowledge about the potential diseases that may be present in the images and, when searching, follows a mental checklist to locate them using their gaze. This is a key observation, yet existing models fail to capture the underlying intent behind each fixation. In this paper, we introduce a deep learning-based approach, RadGazeIntent, designed to model this behavior: having an intention to find something and actively searching for it. Our transformer-based architecture processes both the temporal and spatial dimensions of gaze data, transforming fine-grained fixation features into coarse, meaningful representations of diagnostic intent to interpret radiologists' goals. To capture the nuances of radiologists' varied intention-driven behaviors, we process existing medical eye-tracking datasets to create three intention-labeled subsets: RadSeq (Systematic Sequential Search), RadExplore (Uncertainty-driven Exploration), and RadHybrid (Hybrid Pattern). Experimental results demonstrate RadGazeIntent's ability to predict which findings radiologists are examining at specific moments, outperforming baseline methods across all intention-labeled datasets.
Related papers
- FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation [9.374812942790953]
We introduce Fine-Grained CXR dataset, which provides fine-grained paired information between the captions generated by radiologists and the corresponding gaze attention heatmaps for each anatomy.
Our analysis reveals that simply applying black-box image captioning methods to generate reports cannot adequately explain which information in CXR is utilized.
We propose a novel explainable radiologist's attention generator network (Gen-XAI) that mimics the diagnosis process of radiologists, explicitly constraining its output to closely align with both radiologist's gaze attention and transcript.
arXiv Detail & Related papers (2024-11-23T02:22:40Z) - GazeSearch: Radiology Findings Search Benchmark [9.21918773048464]
Medical eye-tracking data is an important information source for understanding how radiologists visually interpret medical images.<n>The current eye-tracking data is dispersed, unprocessed, and ambiguous, making it difficult to derive meaningful insights.<n>In this work, we propose a refinement method inspired by the target-present visual search challenge: there is a specific finding and fixations are guided to locate it.
arXiv Detail & Related papers (2024-11-08T18:47:08Z) - Multimodal Learning and Cognitive Processes in Radiology: MedGaze for Chest X-ray Scanpath Prediction [10.388541520456714]
Our proposed system aims to predict eye gaze sequences from radiology reports and CXR images.
Our model predicts fixation coordinates and durations critical for medical scanpath prediction, outperforming existing models in the computer vision community.
Based on the radiologist's evaluation, MedGaze can generate human-like gaze sequences with a high focus on relevant regions.
arXiv Detail & Related papers (2024-06-28T06:38:58Z) - Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis [61.089776864520594]
We propose eye-tracking as an alternative to text reports for medical images.
By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning.
We introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks.
arXiv Detail & Related papers (2023-12-11T02:27:45Z) - I-AI: A Controllable & Interpretable AI System for Decoding
Radiologists' Intense Focus for Accurate CXR Diagnoses [9.260958560874812]
Interpretable Artificial Intelligence (I-AI) is a novel and unified controllable interpretable pipeline.
Our I-AI addresses three key questions: where a radiologist looks, how long they focus on specific areas, and what findings they diagnose.
arXiv Detail & Related papers (2023-09-24T04:48:44Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [72.8965643836841]
We introduce XrayGPT, a novel conversational medical vision-language model.<n>It can analyze and answer open-ended questions about chest radiographs.<n>We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions.
In X-RGen, we seek to mimic the behaviour of human radiologists, breaking them down into four principal phases.
We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z) - Improving Radiology Summarization with Radiograph and Anatomy Prompts [60.30659124918211]
We propose a novel anatomy-enhanced multimodal model to promote impression generation.
In detail, we first construct a set of rules to extract anatomies and put these prompts into each sentence to highlight anatomy characteristics.
We utilize a contrastive learning module to align these two representations at the overall level and use a co-attention to fuse them at the sentence level.
arXiv Detail & Related papers (2022-10-15T14:05:03Z) - SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection [76.01333073259677]
We propose the use of Space-aware Memory Queues for In-painting and Detecting anomalies from radiography images (abbreviated as SQUID)
We show that SQUID can taxonomize the ingrained anatomical structures into recurrent patterns; and in the inference, it can identify anomalies (unseen/modified patterns) in the image.
arXiv Detail & Related papers (2021-11-26T13:47:34Z) - Exploring and Distilling Posterior and Prior Knowledge for Radiology
Report Generation [55.00308939833555]
The PPKED includes three modules: Posterior Knowledge Explorer (PoKE), Prior Knowledge Explorer (PrKE) and Multi-domain Knowledge Distiller (MKD)
PoKE explores the posterior knowledge, which provides explicit abnormal visual regions to alleviate visual data bias.
PrKE explores the prior knowledge from the prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias.
arXiv Detail & Related papers (2021-06-13T11:10:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.