Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network
- URL: http://arxiv.org/abs/2310.06578v1
- Date: Tue, 10 Oct 2023 12:39:10 GMT
- Title: Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network
- Authors: Yunhui Zhou, Dongqi Han, Yuguo Yu
- Abstract summary: Human vision incorporates a non-uniform-resolution retina, an efficient eye movement strategy, and a spiking neural network (SNN) to balance the requirements of visual field size, visual resolution, energy cost, and inference latency.
Here, we examine human visual search behaviors and establish the first SNN-based visual search model.
The model can learn either a human-like or a near-optimal fixation strategy, outperform humans in search speed and accuracy, and achieve high energy efficiency through short saccade decision latency and sparse activation.
- Score: 8.380017457339756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human vision incorporates a non-uniform-resolution retina, an efficient eye
movement strategy, and a spiking neural network (SNN) to balance the requirements
of visual field size, visual resolution, energy cost, and inference latency.
These properties have inspired interest in developing human-like computer
vision. However, existing models have not fully incorporated these three features
of human vision, and their learned eye movement strategies have not been
compared with humans' strategies, making the models' behavior difficult to
interpret. Here, we carry out experiments to examine human visual search
behaviors and establish the first SNN-based visual search model. The model
combines an artificial retina with spiking feature extraction, memory, and
saccade decision modules, and it employs population coding for fast and
efficient saccade decisions. The model can learn either a human-like or a
near-optimal fixation strategy, outperform humans in search speed and accuracy,
and achieve high energy efficiency through short saccade decision latency and
sparse activation. It also suggests that the human search strategy is
suboptimal in terms of search speed. Our work connects modeling of vision in
neuroscience and machine learning and sheds light on developing more
energy-efficient computer vision algorithms.
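The population-coding saccade decision described in the abstract can be illustrated with a toy sketch. This is not the paper's actual implementation; all names, tuning curves, and parameters below are illustrative assumptions. A hypothetical population of direction-tuned spiking neurons responds to a search target, and the saccade direction is decoded as the population vector average of their sparse spike counts.

```python
import numpy as np

# Hypothetical population of neurons, each tuned to a preferred saccade direction.
n_neurons = 64
preferred = np.linspace(0, 2 * np.pi, n_neurons, endpoint=False)

def population_decode(spike_counts):
    """Decode a saccade direction as the population vector average."""
    x = np.sum(spike_counts * np.cos(preferred))
    y = np.sum(spike_counts * np.sin(preferred))
    return np.arctan2(y, x) % (2 * np.pi)

# Simulate sparse spiking responses to a target at 90 degrees (pi/2),
# using a von Mises-like tuning curve and Poisson spike counts.
rng = np.random.default_rng(0)
target = np.pi / 2
tuning = np.exp(2.0 * np.cos(preferred - target))
spikes = rng.poisson(tuning * 0.5)
decoded = population_decode(spikes)
```

Because the decision is a single readout over one brief window of population activity rather than a long spike-rate average, this style of decoding is consistent with the short saccade decision latency the abstract emphasizes.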
Related papers
- Neural 3D decoding for human vision diagnosis [76.41771117405973]
We show how AI can go beyond the current state of the art by advancing from 2D visuals to visually plausible and functionally more comprehensive 3D visuals decoded from brain signals.
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject who was presented with a 2D image, and yields as output the corresponding 3D object visuals.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
- Achieving More Human Brain-Like Vision via Human EEG Representational Alignment [1.811217832697894]
We present 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG.
Our innovative image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers.
Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, paving the way for more brain-like artificial intelligence systems.
arXiv Detail & Related papers (2024-01-30T18:18:41Z)
- Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks.
Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios.
A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
- Modeling Human Eye Movements with Neural Networks in a Maze-Solving Task [2.092312847886424]
We build deep generative models of eye movements using a novel differentiable architecture for gaze fixations and gaze shifts.
We find that human eye movements are best predicted by a model that is optimized not to perform the task as efficiently as possible but instead to run an internal simulation of an object traversing the maze.
arXiv Detail & Related papers (2022-12-20T15:48:48Z)
- Simulating Human Gaze with Neural Visual Attention [44.65733084492857]
We propose the Neural Visual Attention (NeVA) algorithm to integrate guidance of any downstream visual task into attention modeling.
We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective.
arXiv Detail & Related papers (2022-11-22T09:02:09Z)
- BI-AVAN: Brain-Inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition process between attention-related/neglected objects to identify and locate the visual objects in a movie frame the human brain focuses on in an unsupervised manner.
arXiv Detail & Related papers (2022-10-27T22:20:36Z)
- Adapting Brain-Like Neural Networks for Modeling Cortical Visual Prostheses [68.96380145211093]
Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons.
Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge.
We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system.
arXiv Detail & Related papers (2022-09-27T17:33:19Z)
- Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises [3.8738982761490988]
Compared to human vision, computer vision based on convolutional neural networks (CNNs) is more vulnerable to adversarial noise.
This difference is likely attributable to how the eyes sample visual input and how the brain processes retinal samples through its dorsal and ventral visual pathways.
We design recurrent neural networks, including an input sampler that mimics the human retina, a dorsal network that guides where to look next, and a ventral network that represents the retinal samples.
Taking these modules together, the models learn to take multiple glances at an image, attend to a salient part at each glance, and accumulate the representation over time to recognize the image.
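This glance-and-accumulate loop can be sketched in a few lines; the following is a toy illustration under our own simplifying assumptions, not the authors' architecture. A retina-like sampler crops a patch at each fixation, a recurrent state accumulates the evidence across glances, and a simple brightness-based saliency rule stands in for the dorsal "where to look next" network.

```python
import numpy as np

def foveal_sample(image, cy, cx, size=5):
    """Toy retina: crop a small high-resolution patch around the fixation."""
    half = size // 2
    return image[cy - half:cy + half + 1, cx - half:cx + half + 1]

def glance_loop(image, n_glances=3, size=5):
    """Toy glance loop: accumulate patch evidence into a recurrent state."""
    h = np.zeros(size * size)                           # recurrent representation
    cy, cx = image.shape[0] // 2, image.shape[1] // 2   # start at the center
    half = size // 2
    for _ in range(n_glances):
        patch = foveal_sample(image, cy, cx, size)
        h = np.tanh(h + patch.ravel())                  # "ventral" accumulation
        # "dorsal" policy: fixate the brightest pixel whose patch fits the image
        interior = np.zeros_like(image)
        interior[half:-half, half:-half] = image[half:-half, half:-half]
        cy, cx = np.unravel_index(np.argmax(interior), image.shape)
    return h

rng = np.random.default_rng(1)
state = glance_loop(rng.random((16, 16)))
```

A trained model would replace the brightness heuristic with a learned attention policy and the tanh accumulation with a learned recurrent cell, but the control flow, sample, encode, accumulate, refixate, is the same.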
arXiv Detail & Related papers (2022-06-15T03:44:42Z)
- Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention [40.878963450471026]
We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner.
We show that the proposed method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths.
arXiv Detail & Related papers (2022-04-19T18:57:47Z)
- Searching the Search Space of Vision Transformer [98.96601221383209]
Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection.
We propose to use neural architecture search to automate this process, by searching not only the architecture but also the search space.
We provide design guidelines of general vision transformers with extensive analysis according to the space searching process.
arXiv Detail & Related papers (2021-11-29T17:26:07Z)
- MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images [60.56518548286836]
To generate realistic cloth deformations from novel input poses, watertight meshes or dense full-body scans are usually needed as inputs.
We propose an approach that can quickly generate realistic clothed human avatars, represented as controllable neural SDFs, given only monocular depth images.
arXiv Detail & Related papers (2021-06-22T17:30:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.