Understanding Character Recognition using Visual Explanations Derived
from the Human Visual System and Deep Networks
- URL: http://arxiv.org/abs/2108.04558v1
- Date: Tue, 10 Aug 2021 10:09:37 GMT
- Title: Understanding Character Recognition using Visual Explanations Derived
from the Human Visual System and Deep Networks
- Authors: Chetan Ralekar, Shubham Choudhary, Tapan Kumar Gandhi, Santanu
Chaudhury
- Abstract summary: We examine the congruence, or lack thereof, in the information-gathering strategies of humans and deep neural networks.
For correctly classified characters, the deep learning model attended to character regions similar to those that humans fixated.
We propose to use the visual fixation maps obtained from the eye-tracking experiment as a supervisory input to align the model's focus on relevant character regions.
- Score: 6.734853055176694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human observers engage in selective information uptake when classifying
visual patterns. The same is true of deep neural networks, which currently
constitute the best performing artificial vision systems. Our goal is to
examine the congruence, or lack thereof, in the information-gathering
strategies of the two systems. We have operationalized our investigation as a
character recognition task. We have used eye-tracking to assay the spatial
distribution of information hotspots for humans via fixation maps and an
activation mapping technique for obtaining analogous distributions for deep
networks through visualization maps. Qualitative comparison between
visualization maps and fixation maps reveals an interesting correlate of
congruence. For correctly classified characters, the deep learning model
attended to character regions similar to those that humans fixated. On
the other hand, when the focused regions are different for humans and deep
nets, the characters are typically misclassified by the latter. Hence, we
propose to use the visual fixation maps obtained from the eye-tracking
experiment as a supervisory input to align the model's focus on relevant
character regions. We find that such supervision improves the model's
performance significantly and does not require any additional parameters. This
approach has the potential to find applications in diverse domains such as
medical analysis and surveillance in which explainability helps to determine
system fidelity.
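The supervision idea in the abstract can be sketched as a loss term that penalizes disagreement between the model's visualization map and the human fixation map. The abstract does not state the actual loss, so everything below is an assumption made for illustration: the KL-divergence penalty, the `weight` hyperparameter, and the function names `normalize`, `alignment_penalty`, and `supervised_loss` are hypothetical and are not the authors' method.

```python
import math


def normalize(heatmap):
    """Flatten a 2D map and scale it so that its values sum to 1."""
    flat = [max(v, 0.0) for row in heatmap for v in row]
    total = sum(flat)
    if total == 0.0:
        # No mass anywhere: fall back to a uniform distribution.
        return [1.0 / len(flat)] * len(flat)
    return [v / total for v in flat]


def alignment_penalty(model_map, fixation_map, eps=1e-8):
    """Smoothed KL divergence from the fixation map to the model map.

    Assumption: KL divergence is one possible mismatch measure; the
    source does not say which measure the authors used.
    """
    p = normalize(fixation_map)  # human fixation distribution
    q = normalize(model_map)     # model visualization distribution
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(class_probs, label, eps=1e-8):
    """Standard classification loss for a single example."""
    return -math.log(class_probs[label] + eps)


def supervised_loss(class_probs, label, model_map, fixation_map, weight=0.5):
    """Classification loss plus a weighted fixation-alignment penalty.

    `weight` is a hypothetical hyperparameter, not a learned parameter.
    """
    penalty = alignment_penalty(model_map, fixation_map)
    return cross_entropy(class_probs, label) + weight * penalty


# Toy 2x2 maps: the human fixated the top-right cell.
fixation = [[0.0, 1.0], [0.0, 0.0]]
aligned = [[0.0, 1.0], [0.0, 0.0]]     # model looks at the same cell
misaligned = [[0.0, 0.0], [1.0, 0.0]]  # model looks elsewhere

loss_aligned = supervised_loss([0.1, 0.9], 1, aligned, fixation)
loss_misaligned = supervised_loss([0.1, 0.9], 1, misaligned, fixation)
```

In this toy setup the misaligned map yields the larger loss, so gradient-based training under such a term would push the model's focus toward the fixated region. Because the penalty only changes the objective, a term of this kind adds no trainable parameters to the network, which is consistent with the abstract's claim.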
Related papers
- A domain adaptive deep learning solution for scanpath prediction of
paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding into the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Comparing object recognition in humans and deep convolutional neural
networks -- An eye tracking study [7.222232547612573]
Deep convolutional neural networks (DCNNs) and the ventral visual pathway share vast architectural and functional similarities.
We demonstrate a comparison of human observers (N = 45) and three feedforward DCNNs through eye tracking and saliency maps.
A DCNN with biologically plausible receptive field sizes called vNet reveals higher agreement with human viewing behavior as contrasted with a standard ResNet architecture.
arXiv Detail & Related papers (2021-07-30T23:32:05Z)
- Passive attention in artificial neural networks predicts human visual
selectivity [8.50463394182796]
We show that passive attention techniques reveal a significant overlap with human visual selectivity estimates.
We validate these correlational results with causal manipulations using recognition experiments.
This work contributes a new approach to evaluating the biological and psychological validity of leading ANNs as models of human vision.
arXiv Detail & Related papers (2021-07-14T21:21:48Z)
- Variational Structured Attention Networks for Deep Visual Representation
Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- Classifying Eye-Tracking Data Using Saliency Maps [8.524684315458245]
This paper proposes a novel visual-saliency-based feature extraction method for automatic and quantitative classification of eye-tracking data.
Comparing the saliency amplitudes, similarity and dissimilarity of saliency maps with the corresponding eye fixations maps gives an extra dimension of information which is effectively utilized to generate discriminative features to classify the eye-tracking data.
arXiv Detail & Related papers (2020-10-24T15:18:07Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
- Supervision and Source Domain Impact on Representation Learning: A
Histopathology Case Study [6.762603053858596]
In this work, we explored the performance of a deep neural network and triplet loss in the area of representation learning.
We investigated the notion of similarity and dissimilarity in pathology whole-slide images and compared different setups from unsupervised and semi-supervised to supervised learning.
We achieved high accuracy and generalization when the learned representations were applied to two different pathology datasets.
arXiv Detail & Related papers (2020-05-10T21:27:38Z)
- Structured Landmark Detection via Topology-Adapting Deep Graph Learning [75.20602712947016]
We present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical landmark detection.
The proposed method constructs graph signals leveraging both local image features and global shape features.
Experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand, and Pelvis).
arXiv Detail & Related papers (2020-04-17T11:55:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.