A Neuromorphic Proto-Object Based Dynamic Visual Saliency Model with an
FPGA Implementation
- URL: http://arxiv.org/abs/2002.11898v3
- Date: Sun, 12 Apr 2020 02:04:47 GMT
- Title: A Neuromorphic Proto-Object Based Dynamic Visual Saliency Model with an
FPGA Implementation
- Authors: Jamal Lottier Molin, Chetan Singh Thakur, Ralph Etienne-Cummings,
Ernst Niebur
- Abstract summary: We present a neuromorphic, bottom-up, dynamic visual saliency model based on the notion of proto-objects.
This model outperforms state-of-the-art dynamic visual saliency models in predicting human eye fixations on a commonly used video dataset.
We introduce a Field-Programmable Gate Array implementation of the model on an Opal Kelly 7350 Kintex-7 board.
- Score: 1.2387676601792899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to attend to salient regions of a visual scene is an innate and
necessary preprocessing step for both biological and engineered systems
performing high-level visual tasks (e.g. object detection, tracking, and
classification). Computational efficiency, in regard to processing bandwidth
and speed, is improved by only devoting computational resources to salient
regions of the visual stimuli. In this paper, we first present a neuromorphic,
bottom-up, dynamic visual saliency model based on the notion of proto-objects.
This is achieved by incorporating the temporal characteristics of the visual
stimulus into the model, similar to how early stages of the human visual
system extract temporal information. This neuromorphic model
outperforms state-of-the-art dynamic visual saliency models in predicting human
eye fixations on a commonly used video dataset with associated eye tracking
data. Secondly, for this model to have practical applications, it must be
capable of performing its computations in real-time under low-power,
small-size, and lightweight constraints. To address this, we introduce a
Field-Programmable Gate Array implementation of the model on an Opal Kelly 7350
Kintex-7 board. This novel hardware implementation allows for processing of up
to 23.35 frames per second running on a 100 MHz clock - a more than 26x
speedup over the software implementation.
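As a rough sanity check on these throughput figures, a minimal back-of-the-envelope sketch in Python (the software baseline rate is inferred from the reported speedup, not stated in the abstract):

    # Back-of-the-envelope throughput figures for the FPGA implementation.
    # Known quantities: 100 MHz clock, 23.35 fps, and a >26x speedup.
    CLOCK_HZ = 100e6   # Kintex-7 design clock (from the abstract)
    FPGA_FPS = 23.35   # hardware frames per second (from the abstract)
    SPEEDUP = 26.0     # lower bound on speedup (from the abstract)

    cycles_per_frame = CLOCK_HZ / FPGA_FPS
    software_fps_bound = FPGA_FPS / SPEEDUP  # implied software frame rate

    print(f"~{cycles_per_frame:,.0f} clock cycles per frame")
    print(f"implied software rate: < {software_fps_bound:.2f} fps")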
Related papers
- Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream [3.4526439922541705]
We evaluate scaling laws for modeling the primate visual ventral stream (VVS).
We observe that while behavioral alignment continues to scale with larger models, neural alignment saturates.
Increased scaling is especially beneficial for higher-level visual areas, where small models trained on few samples exhibit poor alignment.
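A minimal sketch of how such a scaling trend can be quantified, fitting a power law to (model size, alignment) pairs; the scores below are synthetic, not the paper's:

    # Fit alignment ~ a * N**b via linear regression in log-log space.
    import numpy as np

    params = np.array([1e6, 1e7, 1e8, 1e9])        # model parameter counts
    alignment = np.array([0.42, 0.51, 0.60, 0.71])  # hypothetical scores

    b, log_a = np.polyfit(np.log(params), np.log(alignment), 1)
    print(f"fitted exponent b = {b:.3f}, prefactor a = {np.exp(log_a):.3f}")

    # Saturation (as reported for neural alignment) would show up as the
    # fitted exponent b shrinking toward zero on the larger models.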
arXiv Detail & Related papers (2024-11-08T17:13:53Z)
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously used only for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
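A minimal sketch of the per-timestep pointmap representation described above; the names and shapes are illustrative assumptions, not the MonST3R API:

    # A pointmap maps each pixel of a frame to a 3D point plus a confidence.
    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class Pointmap:
        xyz: np.ndarray         # (H, W, 3) 3D point per pixel
        confidence: np.ndarray  # (H, W) per-pixel confidence

    def dummy_pointmap(h: int = 4, w: int = 4) -> Pointmap:
        # Placeholder geometry: a fronto-parallel plane at depth 1.
        ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        xyz = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float32)
        return Pointmap(xyz=xyz, confidence=np.ones((h, w), dtype=np.float32))

    # A dynamic scene is then just one pointmap per timestep.
    video_pointmaps = [dummy_pointmap() for _ in range(8)]
    print(len(video_pointmaps), video_pointmaps[0].xyz.shape)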
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System [0.716879432974126]
We introduce a deep convolutional model that closely approximates human visual information processing.
We aim to approximate the function for the lateral geniculate nucleus (LGN) area using a trained shallow convolutional model.
The pAE model achieves a final prediction performance of 99.26% and demonstrates a notable improvement of around 28% over human results in the temporal mode.
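A minimal sketch of a shallow convolutional autoencoder in the spirit of the described LGN model; layer sizes are assumptions, not the actual pAE architecture:

    import torch
    import torch.nn as nn

    class ShallowConvAE(nn.Module):
        def __init__(self, channels: int = 16):
            super().__init__()
            # Encoder: grayscale frame -> compact features (feedforward stream)
            self.encoder = nn.Sequential(
                nn.Conv2d(1, channels, kernel_size=5, stride=2, padding=2),
                nn.ReLU(),
            )
            # Decoder: reconstruct the input (feedback stream analogue)
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(channels, 1, kernel_size=5, stride=2,
                                   padding=2, output_padding=1),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.decoder(self.encoder(x))

    model = ShallowConvAE()
    frame = torch.rand(1, 1, 64, 64)  # dummy input frame
    print(model(frame).shape)          # torch.Size([1, 1, 64, 64])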
arXiv Detail & Related papers (2024-09-20T16:33:01Z)
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
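A minimal sketch of the underlying idea, a point cloud whose features come from a hash-encoded grid and are decoded together with a timestep; all names and sizes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class ToyHashGrid(nn.Module):
        """Toy stand-in for a hash-encoded feature grid: hash 3D cells to
        rows of a learnable feature table."""
        def __init__(self, table_size: int = 2**14, dim: int = 8, res: int = 64):
            super().__init__()
            self.table = nn.Parameter(torch.randn(table_size, dim) * 0.01)
            self.res, self.table_size = res, table_size

        def forward(self, xyz: torch.Tensor) -> torch.Tensor:  # (N, 3) in [0,1]
            cell = (xyz * self.res).long()
            primes = torch.tensor([1, 2654435761, 805459861])
            idx = (cell * primes).sum(-1) % self.table_size
            return self.table[idx]

    grid = ToyHashGrid()
    decoder = nn.Sequential(nn.Linear(8 + 1, 32), nn.ReLU(), nn.Linear(32, 4))

    points = torch.rand(1024, 3)      # point cloud positions
    t = torch.full((1024, 1), 0.25)   # normalized timestep (time conditioning)
    rgba = decoder(torch.cat([grid(points), t], dim=-1))
    print(rgba.shape)  # torch.Size([1024, 4]) -> color + opacity per point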
arXiv Detail & Related papers (2024-06-14T14:35:44Z)
- PerSival: Neural-network-based visualisation for pervasive continuum-mechanical simulations in musculoskeletal biomechanics [1.4272256806865107]
This paper presents a novel neural network architecture for pervasive visualisation of a 3D human upper limb musculoskeletal system model.
We use a sparse grid surrogate to capture the surface deformation of the m. biceps brachii in order to train a deep learning model used for real-time visualisation of the same muscle.
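A minimal sketch of such a surrogate, a small network mapping simulation inputs to vertex displacements; the input and mesh sizes are assumptions:

    import torch
    import torch.nn as nn

    N_VERTICES = 500  # toy surface mesh size

    surrogate = nn.Sequential(
        nn.Linear(2, 64),   # inputs: [activation, joint angle] (assumed)
        nn.Tanh(),
        nn.Linear(64, N_VERTICES * 3),  # per-vertex xyz displacement
    )

    state = torch.tensor([[0.7, 1.2]])  # activation 0.7, angle 1.2 rad
    displacements = surrogate(state).view(-1, N_VERTICES, 3)
    print(displacements.shape)  # torch.Size([1, 500, 3])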
arXiv Detail & Related papers (2023-12-07T00:07:35Z)
- Real-time volumetric rendering of dynamic humans [83.08068677139822]
We present a method for fast 3D reconstruction and real-time rendering of dynamic humans from monocular videos.
Our method can reconstruct a dynamic human in less than 3h using a single GPU, compared to recent state-of-the-art alternatives that take up to 72h.
A novel local ray marching rendering allows visualizing the neural human on a mobile VR device at 40 frames per second with minimal loss of visual quality.
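A minimal sketch of volumetric ray marching with early termination, the kind of inner loop such a renderer accelerates; the analytic density field is purely illustrative:

    import numpy as np

    def density(p: np.ndarray) -> float:
        return 4.0 if np.linalg.norm(p) < 0.5 else 0.0  # solid sphere

    def march(origin, direction, n_steps=64, step=0.05):
        transmittance, radiance = 1.0, 0.0
        p = np.asarray(origin, dtype=float)
        d = np.asarray(direction, dtype=float)
        for _ in range(n_steps):
            sigma = density(p)
            alpha = 1.0 - np.exp(-sigma * step)
            radiance += transmittance * alpha * 1.0  # white emitter
            transmittance *= 1.0 - alpha
            if transmittance < 1e-3:   # early termination: ray is opaque
                break
            p = p + step * d
        return radiance

    print(march(origin=[0.0, 0.0, -2.0], direction=[0.0, 0.0, 1.0]))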
arXiv Detail & Related papers (2023-03-21T14:41:25Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
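A minimal sketch of frequency-based decomposition of a frame into low- and high-frequency parts via an FFT mask; this illustrates the general idea only, not the paper's differentiable module:

    import numpy as np

    def frequency_split(frame: np.ndarray, radius: int = 8):
        f = np.fft.fftshift(np.fft.fft2(frame))
        h, w = frame.shape
        yy, xx = np.ogrid[:h, :w]
        low_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
        low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
        high = frame - low  # residual holds the fine, fast-varying detail
        return low, high

    frame = np.random.rand(64, 64)
    low, high = frequency_split(frame)
    print(np.allclose(low + high, frame))  # True: decomposition is exact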
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Activity Detection in Long Surgical Videos using Spatio-Temporal Models [1.2400116527089995]
In this paper, we investigate both state-of-the-art activity recognition models and temporal models.
We benchmark these models on a large-scale activity recognition dataset in the operating room with over 800 full-length surgical videos.
We show that even in the case of limited labeled data, we can outperform the existing work by benefiting from models pre-trained on other tasks.
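A minimal sketch of the transfer-learning recipe alluded to, swapping the head of a pre-trained video backbone and fine-tuning only the head; the backbone choice and class count are assumptions:

    import torch.nn as nn
    from torchvision.models.video import r3d_18

    NUM_ACTIVITIES = 10  # hypothetical number of surgical activity classes

    model = r3d_18(weights="KINETICS400_V1")  # pre-trained on another task
    model.fc = nn.Linear(model.fc.in_features, NUM_ACTIVITIES)

    # Freeze the backbone; train only the new head on limited labeled data.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("fc.")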
arXiv Detail & Related papers (2022-05-05T17:34:33Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
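A minimal sketch of a learned dynamics model inside a sampling-based MPC loop (random shooting, a simpler scheme than the paper's framework); the network is an untrained placeholder:

    import torch
    import torch.nn as nn

    STATE_DIM, ACT_DIM, HORIZON, N_CANDIDATES = 4, 2, 10, 256
    dynamics = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.Tanh(),
                             nn.Linear(64, STATE_DIM))  # predicts next state

    def plan(state: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        actions = torch.randn(N_CANDIDATES, HORIZON, ACT_DIM)
        s = state.expand(N_CANDIDATES, STATE_DIM).clone()
        cost = torch.zeros(N_CANDIDATES)
        with torch.no_grad():
            for t in range(HORIZON):
                s = dynamics(torch.cat([s, actions[:, t]], dim=-1))
                cost += ((s - goal) ** 2).sum(-1)  # positional tracking cost
        return actions[cost.argmin(), 0]  # first action of best rollout

    state, goal = torch.zeros(STATE_DIM), torch.ones(STATE_DIM)
    print(plan(state, goal))  # action to apply this control step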
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters and achieves high speed in training and inference.
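A minimal sketch of sparse spatial attention over skeleton joints, masking attention scores to the skeleton's adjacency; the toy skeleton and dimensions are assumptions, not the STAR architecture:

    import torch

    J, D = 5, 16              # joints, feature dim
    x = torch.randn(J, D)     # per-joint features for one frame

    # Adjacency of a toy chain skeleton (each joint, its neighbors, itself)
    adj = torch.eye(J, dtype=torch.bool)
    for i in range(J - 1):
        adj[i, i + 1] = adj[i + 1, i] = True

    q, k, v = x, x, x         # learned projections omitted for brevity
    scores = (q @ k.T) / D ** 0.5
    scores = scores.masked_fill(~adj, float("-inf"))  # sparsity mask
    out = torch.softmax(scores, dim=-1) @ v
    print(out.shape)  # torch.Size([5, 16])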
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Emergent Properties of Foveated Perceptual Systems [3.3504365823045044]
This work is inspired by the foveated human visual system, which has higher acuity at the center of gaze and texture-like encoding in the periphery.
We introduce models consisting of a first-stage fixed image transform followed by a second-stage learnable convolutional neural network.
We find that foveation with peripheral texture-based computations yields an efficient, distinct, and robust representational format of scene information.
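A minimal sketch of a foveated first-stage transform, sharp at the gaze point and increasingly blurred with eccentricity; the blur schedule is an illustrative assumption:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def foveate(img: np.ndarray, gaze: tuple, max_sigma: float = 6.0):
        h, w = img.shape
        yy, xx = np.ogrid[:h, :w]
        ecc = np.hypot(yy - gaze[0], xx - gaze[1])
        ecc = ecc / ecc.max()                     # eccentricity in [0, 1]
        blurred = gaussian_filter(img, sigma=max_sigma)
        return (1.0 - ecc) * img + ecc * blurred  # sharp fovea, soft periphery

    img = np.random.rand(128, 128)
    out = foveate(img, gaze=(64, 64))
    print(out.shape)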
arXiv Detail & Related papers (2020-06-14T19:34:44Z)