Leveraging Human Selective Attention for Medical Image Analysis with
Limited Training Data
- URL: http://arxiv.org/abs/2112.01034v1
- Date: Thu, 2 Dec 2021 07:55:25 GMT
- Title: Leveraging Human Selective Attention for Medical Image Analysis with
Limited Training Data
- Authors: Yifei Huang and Xiaoxiao Li and Lijin Yang and Lin Gu and Yingying Zhu
and Hirofumi Seo and Qiuming Meng and Tatsuya Harada and Yoichi Sato
- Abstract summary: The selective attention mechanism helps the cognitive system focus on task-relevant visual cues while ignoring distractors.
We propose a framework to leverage gaze for medical image analysis tasks with small training data.
Our method is demonstrated to achieve superior performance on both 3D tumor segmentation and 2D chest X-ray classification tasks.
- Score: 72.1187887376849
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human gaze is a cost-efficient physiological signal that reveals
underlying human attentional patterns. The selective attention mechanism helps
the cognitive system focus on task-relevant visual cues while ignoring
distractors. Thanks to this ability, human beings can learn efficiently from
a very limited number of training samples. Inspired by this mechanism, we aim
to leverage gaze for medical image analysis tasks with small training data. Our
proposed framework includes a backbone encoder and a Selective Attention
Network (SAN) that simulates the underlying attention. By estimating actual
human gaze, the SAN implicitly encodes information relevant to medical
diagnosis tasks, such as suspicious regions. We then design a novel Auxiliary
Attention Block (AAB) that allows the backbone encoder to use information from
the SAN to focus on selected areas. Specifically, this block uses
a modified version of a multi-head attention layer to simulate the human visual
search procedure. Note that the SAN and AAB can be plugged into different
backbones, and the framework can be used for multiple medical image analysis
tasks when equipped with task-specific heads. Our method is demonstrated to
achieve superior performance on both 3D tumor segmentation and 2D chest X-ray
classification tasks. We also show that the estimated gaze probability map of
the SAN is consistent with actual gaze fixation maps obtained from
board-certified doctors.
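The abstract describes the framework only at a high level. As a rough
illustration, here is a minimal PyTorch sketch (shown in 2D for brevity) of how
a gaze-estimating branch like the SAN and a multi-head-attention block like the
AAB could be attached to backbone features; every module name, shape, and
design choice below is an assumption, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class SelectiveAttentionNetwork(nn.Module):
    """Hypothetical SAN: predicts a gaze probability map from backbone features."""

    def __init__(self, in_channels):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 2, 1, 1),
        )

    def forward(self, feats):                       # feats: (B, C, H, W)
        logits = self.head(feats)                   # (B, 1, H, W)
        b, _, h, w = logits.shape
        # Softmax over all spatial positions yields a probability map.
        return logits.flatten(2).softmax(-1).view(b, 1, h, w)


class AuxiliaryAttentionBlock(nn.Module):
    """Hypothetical AAB: backbone features attend to gaze-weighted features.

    channels must be divisible by num_heads.
    """

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats, gaze):                 # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        q = feats.flatten(2).transpose(1, 2)        # (B, HW, C) queries
        kv = (feats * gaze).flatten(2).transpose(1, 2)  # gaze-weighted keys/values
        out, _ = self.attn(q, kv, kv)               # crude stand-in for visual search
        return self.norm(q + out).transpose(1, 2).reshape(b, c, h, w)
```

In the paper's setting, the SAN output would additionally be supervised with
recorded doctor gaze, and the attended features would feed a task-specific
segmentation or classification head; both parts are omitted here.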
Related papers
- Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability [2.64700310378485]
We introduce a novel deep-learning framework for joint disease diagnosis and prediction of corresponding visual saliency maps for chest X-ray scans.
Specifically, we designed a novel dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder.
Experiments show that our proposed method outperforms existing techniques in chest X-ray diagnosis and in the quality of visual saliency map prediction.
arXiv Detail & Related papers (2024-03-25T17:31:12Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot approach for skin lesions that generalizes well with little labelled data.
The proposed approach fuses a segmentation network, which acts as an attention module, with a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
- Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects [70.37277191524755]
We show that the brain can be modeled as a functional network, and that certain properties of this network differ between ADHD subjects and controls.
We train our classifier with 776 subjects and test on 171 subjects provided by The Neuro Bureau for the ADHD-200 challenge (a toy sketch of such graph features follows this entry).
arXiv Detail & Related papers (2023-06-15T16:22:57Z)
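For illustration only, a toy sketch of the general idea: deriving
graph-theoretic properties from a functional connectivity matrix to use as
classifier features. The threshold and the particular features are arbitrary
choices, not the paper's.

```python
import numpy as np
import networkx as nx


def network_features(conn, threshold=0.3):
    """Graph-theoretic features of a functional brain network.

    conn: (n_regions, n_regions) correlation matrix between brain regions;
    the 0.3 threshold and the feature set are illustrative assumptions.
    """
    # Binarize connectivity, dropping self-loops on the diagonal.
    adj = (np.abs(conn) > threshold) & ~np.eye(len(conn), dtype=bool)
    g = nx.from_numpy_array(adj.astype(int))
    degrees = [d for _, d in g.degree()]
    return [
        nx.average_clustering(g),   # local segregation
        nx.density(g),              # overall connectivity
        float(np.mean(degrees)),    # mean degree
    ]
```

Features like these, stacked per subject, would feed a standard classifier
trained on the 776 subjects and tested on the held-out 171.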
- DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images [23.354686734545176]
We present a novel SSL framework, named DrasCLR, for 3D medical imaging.
We propose two domain-specific contrastive learning strategies: one aims to capture subtle disease patterns inside a local anatomical region, and the other aims to represent severe disease patterns that span larger regions (a generic contrastive-loss sketch follows this entry).
arXiv Detail & Related papers (2023-02-21T01:32:27Z)
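As a loose illustration of the local strategy, here is a generic InfoNCE-style
loss over patch embeddings from one anatomical region; the pairing scheme and
all details are placeholders rather than DrasCLR's actual formulation.

```python
import torch
import torch.nn.functional as F


def local_infonce(anchor, positive, negatives, tau=0.1):
    """Generic InfoNCE loss over patch embeddings of one anatomical region.

    anchor, positive: (B, D) embeddings of two views of the same region;
    negatives: (B, K, D) embeddings of other regions/subjects.
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / tau      # (B, 1)
    neg = torch.einsum("bd,bkd->bk", anchor, negatives) / tau  # (B, K)
    logits = torch.cat([pos, neg], dim=1)                      # positive at index 0
    labels = torch.zeros(len(anchor), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```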
- Follow My Eye: Using Gaze to Supervise Computer-Aided Diagnosis [54.60796004113496]
We demonstrate that the eye movements of radiologists reading medical images can serve as a new form of supervision for training DNN-based computer-aided diagnosis (CAD) systems.
We record radiologists' gaze tracks as they read images.
The gaze information is processed and then used to supervise the DNN's attention via an Attention Consistency module (a minimal sketch of one such consistency loss follows this entry).
arXiv Detail & Related papers (2022-04-06T08:31:05Z)
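The entry does not specify the Attention Consistency module; one plausible
instantiation is a KL divergence between the normalized model attention map
and the normalized gaze fixation map, sketched below. The paper's exact
formulation may differ.

```python
import torch.nn.functional as F  # tensors below are torch tensors


def attention_consistency_loss(attn_map, gaze_map, eps=1e-8):
    """KL divergence between normalized gaze and model attention maps.

    attn_map, gaze_map: (B, H, W), non-negative spatial maps.
    """
    p = gaze_map.flatten(1)
    p = p / (p.sum(-1, keepdim=True) + eps)   # target: human gaze distribution
    q = attn_map.flatten(1)
    q = q / (q.sum(-1, keepdim=True) + eps)   # model attention distribution
    # KL(p || q), averaged over the batch.
    return (p * ((p + eps).log() - (q + eps).log())).sum(-1).mean()
```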
- Multi-task UNet: Jointly Boosting Saliency Prediction and Disease Classification on Chest X-ray Images [3.8637285238278434]
This paper describes a novel deep learning model for visual saliency prediction on chest X-ray (CXR) images.
To cope with data deficiency, we exploit multi-task learning and tackle disease classification on CXR simultaneously.
Experiments show that our proposed deep learning model with the new learning scheme outperforms existing methods dedicated to either saliency prediction or image classification.
arXiv Detail & Related papers (2022-02-15T01:12:42Z)
- Human Attention in Fine-grained Classification [38.71613202835921]
We validate that human attention contains valuable information for decision-making processes such as fine-grained classification.
We propose Gaze Augmentation Training (GAT) and Knowledge Fusion Network (KFN) to integrate human gaze into classification models.
arXiv Detail & Related papers (2021-11-02T14:41:11Z)
- Non-contact Pain Recognition from Video Sequences with Remote Physiological Measurements Prediction [53.03469655641418]
We present a novel multi-task learning framework which encodes both appearance changes and physiological cues in a non-contact manner for pain recognition.
We establish the state-of-the-art performance of non-contact pain recognition on publicly available pain databases.
arXiv Detail & Related papers (2021-05-18T20:47:45Z)
- Cross-Task Representation Learning for Anatomical Landmark Detection [20.079451546446712]
We propose to regularize the knowledge transfer across source and target tasks through cross-task representation learning.
The proposed method is demonstrated for extracting facial anatomical landmarks which facilitate the diagnosis of fetal alcohol syndrome.
We present two approaches to the proposed representation learning, constraining either the final or the intermediate features of the target model (a minimal sketch of such a feature constraint follows this entry).
arXiv Detail & Related papers (2020-09-28T21:22:49Z)
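One simple way to constrain target-model features toward a source task is an
L2 penalty against a frozen source model, sketched below; the paper's actual
constraint may be more elaborate, and the coefficient is illustrative.

```python
import torch.nn.functional as F  # tensors below are torch tensors


def cross_task_feature_loss(target_feats, source_feats, weight=0.1):
    """L2 constraint tying target-task features to frozen source-task features.

    target_feats, source_feats: activations of the same layer (final or
    intermediate) for the same batch, from the target model and a frozen
    source model respectively.
    """
    return weight * F.mse_loss(target_feats, source_feats.detach())
```

Such a term would be added to the landmark-detection loss during fine-tuning
on the target task.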
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset (a toy encoder sketch follows this entry).
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
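As a rough stand-in for BiteNet's bidirectional temporal encoder, here is a
toy self-attention encoder over a patient's sequence of medical-code
embeddings; sizes, pooling, and masking choices are illustrative, not the
paper's configuration.

```python
import torch.nn as nn


class VisitEncoder(nn.Module):
    """Toy bidirectional self-attention encoder over a patient's code sequence."""

    def __init__(self, num_codes, dim=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim, padding_idx=0)
        block = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)

    def forward(self, codes):           # codes: (B, T) medical-code ids, 0 = pad
        x = self.embed(codes)           # (B, T, dim)
        h = self.encoder(x, src_key_padding_mask=codes.eq(0))
        mask = codes.ne(0).unsqueeze(-1)             # ignore padding when pooling
        return (h * mask).sum(1) / mask.sum(1).clamp(min=1)  # journey embedding
```

The pooled journey representation would then feed prediction heads for the
supervised tasks or serve as an embedding for clustering.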
This list is automatically generated from the titles and abstracts of the papers on this site.