A Psychophysically Oriented Saliency Map Prediction Model
- URL: http://arxiv.org/abs/2011.04076v13
- Date: Mon, 14 Jun 2021 20:45:46 GMT
- Title: A Psychophysically Oriented Saliency Map Prediction Model
- Authors: Qiang Li
- Abstract summary: We propose a new psychophysical saliency prediction architecture, WECSF, inspired by the multi-channel model of visual cortex functioning in humans.
The proposed model is evaluated using several datasets, including the MIT1003, MIT300, Toronto, SID4VAM, and UCF Sports datasets.
Our model achieved stable and better performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos.
- Score: 4.884688557957589
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual attention is one of the most significant characteristics for selecting
and understanding the redundant outside world. The human visual system cannot
process all information simultaneously due to the visual information
bottleneck. In order to reduce the redundant input of visual information, the
human visual system mainly focuses on dominant parts of scenes. This is
commonly known as visual saliency map prediction. This paper proposes a new
psychophysical saliency prediction architecture, WECSF, inspired by the
multi-channel model of visual cortex functioning in humans. The model consists
of opponent color channels, a wavelet transform, a wavelet energy map, and a
contrast sensitivity function, which extract low-level image features and
closely approximate the human visual system. The proposed model is
evaluated using several datasets, including the MIT1003, MIT300, TORONTO,
SID4VAM, and UCF Sports datasets. We also quantitatively and qualitatively
compare the saliency prediction performance with that of other state-of-the-art
models. Our model achieved stable and better performance across
different metrics on natural images, psychophysical synthetic images, and
dynamic videos. Additionally, we found that Fourier and spectral-inspired
saliency prediction models outperformed other state-of-the-art non-neural
network and even deep neural network models on psychophysical synthetic images.
This result can be explained and supported by the Fourier Vision Hypothesis.
We also suggest that deep neural networks need specific architectures and
goals to predict saliency on psychophysical synthetic
images better and more reliably. Finally, the proposed model could be used as a
computational model of the primate visual system and help us understand its
mechanisms.
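The four stages named in the abstract (opponent color channels, wavelet transform, wavelet energy map, contrast sensitivity function) can be illustrated with a minimal NumPy sketch. This is not the author's WECSF implementation. The opponent-channel formulas, the Haar wavelet, the Mannos-Sakrison contrast sensitivity approximation, and the `levels` and `max_freq_cpd` parameters are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def opponent_channels(rgb):
    """Split an RGB image (H, W, 3) into achromatic, red-green, blue-yellow.

    Assumption: a simple linear opponent transform, not the paper's own.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return [(r + g + b) / 3.0, r - g, b - (r + g) / 2.0]


def haar_level(x):
    """One level of a 2-D Haar transform: approximation plus 3 detail bands."""
    # Crop to even height and width so 2x2 blocks tile the image exactly.
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    details = ((a + b - c - d) / 2.0, (a - b + c - d) / 2.0, (a - b - c + d) / 2.0)
    return approx, details


def csf_weight(freq_cpd):
    """Mannos-Sakrison contrast sensitivity approximation (cycles per degree)."""
    return 2.6 * (0.0192 + 0.114 * freq_cpd) * np.exp(-((0.114 * freq_cpd) ** 1.1))


def saliency_sketch(rgb, levels=3, max_freq_cpd=16.0):
    """Sum CSF-weighted wavelet energy over opponent channels and scales.

    Assumption: the finest level corresponds to max_freq_cpd and the nominal
    frequency halves at each coarser level.
    """
    rgb = np.asarray(rgb, dtype=float)
    height, width = rgb.shape[:2]
    saliency = np.zeros((height, width))
    for channel in opponent_channels(rgb):
        approx = channel
        for level in range(levels):
            approx, details = haar_level(approx)
            energy = sum(band ** 2 for band in details)  # wavelet energy map
            weight = csf_weight(max_freq_cpd / 2 ** level)
            # Nearest-neighbour upsampling back towards the input resolution.
            factor = 2 ** (level + 1)
            up = np.kron(energy, np.ones((factor, factor)))
            saliency[: up.shape[0], : up.shape[1]] += weight * up
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency
```

Calling `saliency_sketch(image)` on a float RGB array returns a map with the same height and width, scaled so that its largest value is 1 when any wavelet energy is present. A full model would also need the viewing geometry to convert wavelet scales into cycles per degree, which this sketch replaces with the fixed `max_freq_cpd` guess.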
Related papers
- Modeling the Human Visual System: Comparative Insights from Response-Optimized and Task-Optimized Vision Models, Language Models, and different Readout Mechanisms [1.515687944002438]
We show that response-optimized models with visual inputs offer superior prediction accuracy for early to mid-level visual areas.
We identify three distinct regions in the visual cortex that are sensitive to perceptual features of the input that are not captured by linguistic descriptions.
We propose a novel scheme that modulates receptive fields and feature maps based on semantic content, resulting in an accuracy boost of 3-23% over existing SOTAs.
arXiv Detail & Related papers (2024-10-17T21:11:13Z)
- pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System [0.716879432974126]
We introduce a deep convolutional model that closely approximates human visual information processing.
We aim to approximate the function for the lateral geniculate nucleus (LGN) area using a trained shallow convolutional model.
The pAE model achieves a final prediction performance of 99.26% and demonstrates a notable improvement of around 28% over human results in the temporal mode.
arXiv Detail & Related papers (2024-09-20T16:33:01Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex [12.1427193917406]
We propose an artificial neural network dubbed VISION to mimic the human brain and show how it can foster neuroscientific inquiries.
VISION successfully predicts human hemodynamic responses as fMRI voxel values to visual inputs with an accuracy exceeding state-of-the-art performance by 45%.
arXiv Detail & Related papers (2023-09-26T15:38:26Z)
- Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- Adapting Brain-Like Neural Networks for Modeling Cortical Visual Prostheses [68.96380145211093]
Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons.
Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge.
We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system.
arXiv Detail & Related papers (2022-09-27T17:33:19Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises [7.689542442882423]
We designed a dual-stream vision model inspired by the human brain.
This model features retina-like input layers and includes two streams: one determining the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation.
We evaluated this model against various benchmarks in terms of object recognition, gaze behavior and adversarial robustness.
arXiv Detail & Related papers (2022-06-15T03:44:42Z)
- Neural Implicit Representations for Physical Parameter Inference from a Single Video [49.766574469284485]
We propose to combine neural implicit representations for appearance modeling with neural ordinary differential equations (ODEs) for modelling physical phenomena.
Our proposed model combines several unique advantages: (i) Contrary to existing approaches that require large training datasets, we are able to identify physical parameters from only a single video.
The use of neural implicit representations enables the processing of high-resolution videos and the synthesis of photo-realistic images.
arXiv Detail & Related papers (2022-04-29T11:55:35Z)
- Emergent Properties of Foveated Perceptual Systems [3.3504365823045044]
This work is inspired by the foveated human visual system, which has higher acuity at the center of gaze and texture-like encoding in the periphery.
We introduce models consisting of a first-stage fixed image transform followed by a second-stage learnable convolutional neural network.
We find that foveation with peripheral texture-based computations yields an efficient, distinct, and robust representational format of scene information.
arXiv Detail & Related papers (2020-06-14T19:34:44Z)