Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning
- URL: http://arxiv.org/abs/2205.12466v1
- Date: Wed, 25 May 2022 03:29:10 GMT
- Title: Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning
- Authors: Chong Ma, Lin Zhao, Yuzhong Chen, Lu Zhang, Zhenxiang Xiao, Haixing
Dai, David Liu, Zihao Wu, Zhengliang Liu, Sheng Wang, Jiaxing Gao, Changhe
Li, Xi Jiang, Tuo Zhang, Qian Wang, Dinggang Shen, Dajiang Zhu, Tianming Liu
- Abstract summary: We propose to infuse human experts' intelligence and domain knowledge into the training of deep neural networks.
We propose a novel eye-gaze-guided vision transformer (EG-ViT) for diagnosis with limited medical image data.
- Score: 42.674679049746175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning harmful shortcuts such as spurious correlations and biases prevents
deep neural networks from learning meaningful and useful representations,
thus jeopardizing the generalizability and interpretability of the learned
representation. The situation is even more serious in medical imaging,
where clinical data (e.g., MR images with pathology) are scarce
while the reliability, generalizability and transparency of the learned model
are critical. To address this problem, we propose to infuse human
experts' intelligence and domain knowledge into the training of deep neural
networks. The core idea is to infuse the visual attention information from
expert radiologists to proactively guide the deep model to focus on regions
with potential pathology and to avoid being trapped in harmful shortcuts.
To do so, we propose a novel eye-gaze-guided vision transformer (EG-ViT) for
diagnosis with limited medical image data. We mask the input image patches that
fall outside the radiologists' regions of interest and add an additional residual
connection in the last encoder layer of EG-ViT to maintain the correlations among all
patches. Experiments on two public datasets, INbreast and SIIM-ACR,
demonstrate that our EG-ViT model can effectively learn and transfer experts' domain
knowledge and achieves much better performance than the baselines. Meanwhile, it
successfully rectifies harmful shortcut learning and significantly improves
the model's interpretability. In general, EG-ViT takes advantage of
both human experts' prior knowledge and the power of deep neural networks. This
work opens new avenues for advancing current artificial intelligence paradigms
by infusing human intelligence.
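The gaze-guided masking step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume the radiologists' gaze is given as an intensity heatmap at image resolution, and that a patch is kept when its average gaze intensity exceeds a threshold (the paper's exact criterion may differ). The function names and the threshold are hypothetical.

```python
import numpy as np

def gaze_patch_mask(heatmap: np.ndarray, patch_size: int,
                    threshold: float = 0.1) -> np.ndarray:
    """Boolean mask over non-overlapping patches: True = keep (inside the
    radiologists' interest), False = mask out. `heatmap` is (H, W)."""
    h, w = heatmap.shape
    assert h % patch_size == 0 and w % patch_size == 0
    # Average gaze intensity per patch via a reshape into a patch grid.
    patches = heatmap.reshape(h // patch_size, patch_size,
                              w // patch_size, patch_size)
    per_patch = patches.mean(axis=(1, 3))
    return per_patch > threshold

def apply_mask(image: np.ndarray, keep: np.ndarray,
               patch_size: int) -> np.ndarray:
    """Zero out the image patches whose `keep` entry is False."""
    out = image.copy()
    for i in range(keep.shape[0]):
        for j in range(keep.shape[1]):
            if not keep[i, j]:
                out[i * patch_size:(i + 1) * patch_size,
                    j * patch_size:(j + 1) * patch_size] = 0.0
    return out

# Toy example: a 4x4 image with 2x2 patches and gaze concentrated
# in the top-left patch, so only that patch survives masking.
heatmap = np.zeros((4, 4))
heatmap[:2, :2] = 1.0
keep = gaze_patch_mask(heatmap, patch_size=2)
masked = apply_mask(np.ones((4, 4)), keep, patch_size=2)
```

The masked patches would then be fed to the ViT encoder; the extra residual connection in the last encoder layer (which this sketch does not cover) is what lets the model retain correlations among all patches despite the masking.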
Related papers
- Adversarial Neural Networks in Medical Imaging Advancements and Challenges in Semantic Segmentation [6.88255677115486]
Recent advancements in artificial intelligence (AI) have precipitated a paradigm shift in medical imaging.
This paper systematically investigates the integration of deep learning -- a principal branch of AI -- into the semantic segmentation of brain images.
It highlights adversarial neural networks, a novel AI approach that not only automates but also refines the semantic segmentation process.
arXiv Detail & Related papers (2024-10-17T00:05:05Z)
- Gaze-directed Vision GNN for Mitigating Shortcut Learning in Medical Image [6.31072075551707]
We propose a novel gaze-directed Vision GNN (called GD-ViG) to leverage the visual patterns of radiologists from gaze as expert knowledge.
The experiments on two public medical image datasets demonstrate that GD-ViG outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-06-20T07:16:41Z)
- Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations [49.22640185566807]
We show that adapting tools used in CogSci research can improve the subitizing generalization of CNNs and ViTs.
We investigate how this neuro-symbolic approach to learning affects the subitizing capability of CNNs and ViTs.
We find that ViTs perform considerably worse than CNNs in most respects on subitizing, except on one axis where an HRR-based loss provides improvement.
arXiv Detail & Related papers (2023-12-23T17:54:03Z)
- Evaluating the structure of cognitive tasks with transfer learning [67.22168759751541]
This study investigates the transferability of deep learning representations between different EEG decoding tasks.
We conduct extensive experiments using state-of-the-art decoding models on two recently released EEG datasets.
arXiv Detail & Related papers (2023-07-28T14:51:09Z)
- Improving Clinician Performance in Classification of EEG Patterns on the Ictal-Interictal-Injury Continuum using Interpretable Machine Learning [15.548202338334615]
In intensive care units (ICUs), critically ill patients are monitored with electroencephalograms (EEGs) to prevent serious brain injury.
Black box deep learning models are untrustworthy, difficult to troubleshoot, and lack accountability in real-world applications.
We propose a novel interpretable deep learning model that predicts the presence of harmful brainwave patterns.
arXiv Detail & Related papers (2022-11-09T21:33:40Z)
- Adapting Brain-Like Neural Networks for Modeling Cortical Visual Prostheses [68.96380145211093]
Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons.
Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge.
We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system.
arXiv Detail & Related papers (2022-09-27T17:33:19Z)
- Visual Interpretable and Explainable Deep Learning Models for Brain Tumor MRI and COVID-19 Chest X-ray Images [0.0]
We evaluate attribution methods for illuminating how deep neural networks analyze medical images.
We attribute predictions from brain tumor MRI and COVID-19 chest X-ray datasets made by recent deep convolutional neural network models.
arXiv Detail & Related papers (2022-08-01T16:05:14Z)
- Rectify ViT Shortcut Learning by Visual Saliency [40.55418820114868]
Shortcut learning is common but harmful to deep learning models.
In this work, we propose a novel and effective saliency-guided vision transformer (SGT) model to rectify shortcut learning.
arXiv Detail & Related papers (2022-06-17T05:54:07Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.