pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System
- URL: http://arxiv.org/abs/2409.13622v1
- Date: Fri, 20 Sep 2024 16:33:01 GMT
- Title: pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System
- Authors: Moslem Gorji, Amin Ranjbar, Mohammad Bagher Menhaj
- Abstract summary: We introduce a deep convolutional model that closely approximates human visual information processing.
We aim to approximate the function of the lateral geniculate nucleus (LGN) area using a trained shallow convolutional model.
The pAE model achieves a final prediction performance of 99.26% and demonstrates a notable improvement of around 28% over human results in the temporal mode.
- Score: 0.716879432974126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The visual cortex is a vital part of the brain, responsible for hierarchically identifying objects. Understanding the role of the lateral geniculate nucleus (LGN) as a region preceding the visual cortex is crucial when processing visual information in both bottom-up and top-down pathways. When visual stimuli reach the retina, they are transmitted to the LGN area for initial processing before being sent to the visual cortex for further processing. In this study, we introduce a deep convolutional model that closely approximates human visual information processing. We aim to approximate the function of the LGN area using a trained shallow convolutional model designed on a pruned autoencoder (pAE) architecture. The pAE model attempts to integrate feedforward and feedback streams from/to the V1 area into the problem. This modeling framework encompasses both temporal and non-temporal data feeding modes of the visual stimuli dataset, which contains natural images captured by a fixed camera in consecutive frames and features two categories: images with animals (in motion) and images without animals. Subsequently, we compare the results of our proposed deep-tuned model with wavelet filter bank methods employing Gabor and biorthogonal wavelet functions. Our experiments reveal that the proposed method based on the deep-tuned model not only achieves results highly similar to human benchmarks but also performs significantly better than other models. The pAE model achieves a final prediction performance of 99.26% and demonstrates a notable improvement of around 28% over human results in the temporal mode.
Related papers
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z) - Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in the semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z) - Top-down inference in an early visual cortex inspired hierarchical Variational Autoencoder [0.0]
We exploit advances in Variational Autoencoders to investigate the early visual cortex with sparse coding hierarchical VAEs trained on natural images.
We show that representations similar to those found in the primary and secondary visual cortices naturally emerge under mild inductive biases.
We show that a neuroscience-inspired choice of the recognition model is critical for two signatures of computations with generative models.
arXiv Detail & Related papers (2022-06-01T12:21:58Z) - Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks [61.60177890353585]
Deep convolutional neural networks (CNNs) have been shown to provide excellent models for their functional analogue in the brain, the ventral stream in visual cortex.
Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex.
We show that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
arXiv Detail & Related papers (2022-05-27T08:06:40Z) - Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement compared to the state of the art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z) - STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while using far fewer trainable parameters and achieving high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z) - A Psychophysically Oriented Saliency Map Prediction Model [4.884688557957589]
We propose a new psychophysical saliency prediction architecture, WECSF, inspired by the multi-channel model of visual cortex functioning in humans.
The proposed model is evaluated using several datasets, including the MIT1003, MIT300, Toronto, SID4VAM, and UCF Sports datasets.
Our model achieved highly stable and better performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos.
arXiv Detail & Related papers (2020-11-08T20:58:05Z) - A Deep Drift-Diffusion Model for Image Aesthetic Score Distribution Prediction [68.76594695163386]
We propose a Deep Drift-Diffusion model inspired by psychologists to predict aesthetic score distribution from images.
The DDD model can describe the psychological process of aesthetic perception instead of traditional modeling of the results of assessment.
Our novel DDD model is simple but efficient and outperforms state-of-the-art methods in aesthetic score distribution prediction.
arXiv Detail & Related papers (2020-10-15T11:01:46Z) - Self-Supervised Learning of a Biologically-Inspired Visual Texture Model [6.931125029302013]
We develop a model for representing visual texture in a low-dimensional feature space.
Inspired by the architecture of primate visual cortex, the model uses a first stage of oriented linear filters.
We show that the learned model exhibits stronger representational similarity to texture responses of neural populations recorded in primate V2 than pre-trained deep CNNs.
arXiv Detail & Related papers (2020-06-30T17:12:09Z) - Emergent Properties of Foveated Perceptual Systems [3.3504365823045044]
This work is inspired by the foveated human visual system, which has higher acuity at the center of gaze and texture-like encoding in the periphery.
We introduce models consisting of a first-stage fixed image transform followed by a second-stage learnable convolutional neural network.
We find that foveation with peripheral texture-based computations yields an efficient, distinct, and robust representational format of scene information.
arXiv Detail & Related papers (2020-06-14T19:34:44Z) - A Neuromorphic Proto-Object Based Dynamic Visual Saliency Model with an FPGA Implementation [1.2387676601792899]
We present a neuromorphic, bottom-up, dynamic visual saliency model based on the notion of proto-objects.
This model outperforms state-of-the-art dynamic visual saliency models in predicting human eye fixations on a commonly used video dataset.
We introduce a Field-Programmable Gate Array implementation of the model on an Opal Kelly 7350 Kintex-7 board.
arXiv Detail & Related papers (2020-02-27T03:31:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.