Emergent Properties of Foveated Perceptual Systems
- URL: http://arxiv.org/abs/2006.07991v3
- Date: Tue, 22 Jun 2021 21:21:08 GMT
- Title: Emergent Properties of Foveated Perceptual Systems
- Authors: Arturo Deza and Talia Konkle
- Abstract summary: This work is inspired by the foveated human visual system, which has higher acuity at the center of gaze and texture-like encoding in the periphery.
We introduce models consisting of a first-stage fixed image transform followed by a second-stage learnable convolutional neural network.
We find that foveation with peripheral texture-based computations yields an efficient, distinct, and robust representational format of scene information.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of this work is to characterize the representational impact that
foveation operations have for machine vision systems, inspired by the foveated
human visual system, which has higher acuity at the center of gaze and
texture-like encoding in the periphery. To do so, we introduce models
consisting of a first-stage \textit{fixed} image transform followed by a
second-stage \textit{learnable} convolutional neural network, and we vary the
first-stage component. The primary model has a foveated-textural input stage,
which we compare to a model with foveated-blurred input and a model with
spatially-uniform blurred input (both matched for perceptual compression), and
a final reference model with minimal input-based compression. We find that: 1)
the foveated-texture model shows similar scene classification accuracy as the
reference model despite its compressed input, with greater i.i.d.
generalization than the other models; 2) the foveated-texture model has greater
sensitivity to high-spatial frequency information and greater robustness to
occlusion w.r.t. the comparison models; 3) both foveated systems show a
stronger center image-bias relative to the spatially-uniform systems, even with
a weight sharing constraint. Critically, these results are preserved over
different classical CNN architectures throughout their learning dynamics.
Altogether, this suggests that foveation with peripheral texture-based
computations yields an efficient, distinct, and robust representational format
of scene information, and provides symbiotic computational insight into the
representational consequences that texture-based peripheral encoding may have
for processing in the human visual system, while also potentially inspiring the
next generation of computer vision models via spatially-adaptive computation.
Code + Data available here: https://github.com/ArturoDeza/EmergentProperties
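The two-stage design (a fixed, spatially-adaptive input transform feeding a learnable CNN) can be illustrated with a minimal sketch of the foveated-blurred comparison model: blur strength grows with eccentricity from the fixation point. This is an illustrative reconstruction, not the authors' code (their implementation, including the texture-based transform, is in the linked repository); the ring quantization and `sigma_per_pixel` are made-up parameters.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur2d(img, sigma):
    """Separable Gaussian blur via two 1-D convolutions (reflect padding)."""
    if sigma <= 0:
        return img.copy()
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    pad = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def foveate(img, fixation, sigma_per_pixel=0.05, n_rings=4):
    """Fixed first-stage transform: blur increases with distance from fixation.

    Eccentricity is quantized into rings; each ring is blurred with a sigma
    proportional to its outer radius, so the image stays sharp near the
    fixation point and becomes coarse in the periphery.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    bins = np.linspace(0.0, ecc.max() + 1e-6, n_rings + 1)
    out = np.zeros_like(img, dtype=float)
    for i in range(n_rings):
        mask = (ecc >= bins[i]) & (ecc < bins[i + 1])
        # Blurring the whole image per ring is wasteful but keeps the sketch clear.
        out[mask] = blur2d(img.astype(float), sigma_per_pixel * bins[i + 1])[mask]
    return out
```

The resulting image would then feed the second-stage learnable CNN. Note that the paper's primary model uses texture-based peripheral encoding (texture statistics rather than plain blur), which this blur-only sketch deliberately does not reproduce.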
Related papers
- pAE: An Efficient Autoencoder Architecture for Modeling the Lateral Geniculate Nucleus by Integrating Feedforward and Feedback Streams in Human Visual System [0.716879432974126]
We introduce a deep convolutional model that closely approximates human visual information processing.
We aim to approximate the function for the lateral geniculate nucleus (LGN) area using a trained shallow convolutional model.
The pAE model achieves a final prediction performance of 99.26% and demonstrates a notable improvement of around 28% over human results in the temporal mode.
arXiv Detail & Related papers (2024-09-20T16:33:01Z)
- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky1, a novel exploration of latent diffusion architecture.
The proposed model is trained separately to map text embeddings to image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Top-down inference in an early visual cortex inspired hierarchical Variational Autoencoder [0.0]
We exploit advances in Variational Autoencoders to investigate the early visual cortex with sparse coding hierarchical VAEs trained on natural images.
We show that representations similar to the one found in the primary and secondary visual cortices naturally emerge under mild inductive biases.
We show that a neuroscience-inspired choice of the recognition model is critical for two signatures of computations with generative models.
arXiv Detail & Related papers (2022-06-01T12:21:58Z)
- FoveaTer: Foveated Transformer for Image Classification [8.207403859762044]
We propose the Foveated Transformer (FoveaTer) model, which uses pooling regions and saccadic movements to perform object classification tasks.
We construct an ensemble of our proposed model and an unfoveated model, achieving accuracy 1.36% below that of the unfoveated model with 22% computational savings.
arXiv Detail & Related papers (2021-05-29T01:54:33Z)
- Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show effective features of ViTs are due to flexible receptive and dynamic fields possible via the self-attention mechanism.
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
- A Psychophysically Oriented Saliency Map Prediction Model [4.884688557957589]
We propose a new psychophysical saliency prediction architecture, WECSF, inspired by multi-channel model of visual cortex functioning in humans.
The proposed model is evaluated using several datasets, including the MIT1003, MIT300, Toronto, SID4VAM, and UCF Sports datasets.
Our model achieved stable and superior performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos.
arXiv Detail & Related papers (2020-11-08T20:58:05Z)
- Self-Supervised Learning of a Biologically-Inspired Visual Texture Model [6.931125029302013]
We develop a model for representing visual texture in a low-dimensional feature space.
Inspired by the architecture of primate visual cortex, the model uses a first stage of oriented linear filters.
We show that the learned model exhibits stronger representational similarity to texture responses of neural populations recorded in primate V2 than pre-trained deep CNNs.
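A "first stage of oriented linear filters" of this kind can be sketched with a small Gabor filter bank. This is a generic illustration, not the paper's model; the parameter values (`freq`, `sigma`, bank size) are made up for the example.

```python
import numpy as np

def gabor(theta, freq=0.25, sigma=2.0, size=9):
    """Zero-mean oriented Gabor filter: Gaussian envelope times a cosine grating."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xs * np.cos(theta) + ys * np.sin(theta)  # coordinate rotated by theta
    g = np.exp(-(xs**2 + ys**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)
    return g - g.mean()

def bank_responses(patch, n_orientations=4):
    """Rectified response of a patch to each filter in the oriented bank."""
    thetas = np.pi * np.arange(n_orientations) / n_orientations
    return [abs((patch * gabor(t)).sum()) for t in thetas]
```

A vertical grating drives the 0-degree filter strongly and the 90-degree filter barely at all, which is the orientation selectivity such a first stage provides before any learned texture statistics are computed on top.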
arXiv Detail & Related papers (2020-06-30T17:12:09Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
- Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation [99.64565200170897]
We propose a novel human video synthesis method by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space.
We show several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-01-14T18:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.