Foveation in the Era of Deep Learning
- URL: http://arxiv.org/abs/2312.01450v1
- Date: Sun, 3 Dec 2023 16:48:09 GMT
- Title: Foveation in the Era of Deep Learning
- Authors: George Killick, Paul Henderson, Paul Siebert and Gerardo Aragon-Camarasa
- Abstract summary: We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters and a given pixel or computation budget.
- Score: 6.602118206533142
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we tackle the challenge of actively attending to visual scenes
using a foveated sensor. We introduce an end-to-end differentiable foveated
active vision architecture that leverages a graph convolutional network to
process foveated images, and a simple yet effective formulation for foveated
image sampling. Our model learns to iteratively attend to regions of the image
relevant for classification. We conduct detailed experiments on a variety of
image datasets, comparing the performance of our method with previous
approaches to foveated vision while measuring how different choices, such as
the degree of foveation and the number of fixations the network performs,
affect object recognition performance. We find that our model
outperforms a state-of-the-art CNN and foveated vision architectures of
comparable parameters and a given pixel or computation budget.
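The abstract does not spell out its sampling formulation, so the following is only a generic illustration of what foveated sampling under a pixel budget can look like, not the authors' method. It assumes a log-polar-style layout: sample points sit on concentric rings whose radii grow geometrically, so they are dense near the fixation and sparse in the periphery. The function names, parameters, and the nearest-neighbour lookup are all hypothetical choices made for this sketch.

```python
import numpy as np

def foveated_sample_points(n_rings=8, n_per_ring=16,
                           fovea_radius=0.05, max_radius=1.0):
    """Return (n_rings * n_per_ring, 2) offsets around a fixation at the origin.

    Hypothetical layout (not from the paper): ring radii grow geometrically,
    so samples cluster near the fixation and thin out towards the periphery.
    """
    radii = np.geomspace(fovea_radius, max_radius, n_rings)
    angles = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    r, a = np.meshgrid(radii, angles, indexing="ij")
    return np.stack([r * np.cos(a), r * np.sin(a)], axis=-1).reshape(-1, 2)

def sample_image(image, fixation, points, scale=0.5):
    """Nearest-neighbour lookup of an image at foveated locations.

    `fixation` is (x, y) in normalized [-1, 1] coordinates and `scale`
    shrinks the sampling pattern; out-of-range locations are clamped
    to the image border.
    """
    h, w = image.shape[:2]
    xy = np.asarray(fixation, dtype=float) + scale * points
    cols = np.clip(np.rint((xy[:, 0] + 1.0) * 0.5 * (w - 1)), 0, w - 1).astype(int)
    rows = np.clip(np.rint((xy[:, 1] + 1.0) * 0.5 * (h - 1)), 0, h - 1).astype(int)
    return image[rows, cols]

# Usage: 128 samples stand in for a 64 x 64 = 4096-pixel image.
image = np.arange(64 * 64, dtype=float).reshape(64, 64)
points = foveated_sample_points()
samples = sample_image(image, fixation=(0.0, 0.0), points=points)
```

In this sketch the pixel budget is simply the number of sample points, and moving `fixation` between calls corresponds to attending to a different image region.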
Related papers
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Fiducial Focus Augmentation for Facial Landmark Detection [4.433764381081446]
We propose a novel image augmentation technique to enhance the model's understanding of facial structures.
We employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss.
Our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
arXiv Detail & Related papers (2024-02-23T01:34:00Z)
- Saliency-based Video Summarization for Face Anti-spoofing [4.730428911461921]
We present a video summarization method for face anti-spoofing detection that aims to enhance the performance of deep learning models by leveraging visual saliency.
In particular, saliency information is extracted from the differences between the Laplacian and Wiener filter outputs of the source images.
Weighting maps are then computed based on the saliency information, indicating the importance of each pixel in the image.
arXiv Detail & Related papers (2023-08-23T18:08:32Z)
- Learning to search for and detect objects in foveal images using deep learning [3.655021726150368]
This study employs a fixation prediction model that emulates human objective-guided attention when searching for a given class in an image.
The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene.
We present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks.
arXiv Detail & Related papers (2023-04-12T09:50:25Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding into the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
- Enhancing Photorealism Enhancement [83.88433283714461]
We present an approach to enhancing the realism of synthetic images using a convolutional network.
We analyze scene layout distributions in commonly used datasets and find that they differ in important ways.
We report substantial gains in stability and realism in comparison to recent image-to-image translation methods.
arXiv Detail & Related papers (2021-05-10T19:00:49Z)
- Multimodal Contrastive Training for Visual Representation Learning [45.94662252627284]
We develop an approach to learning visual representations that embraces multimodal data.
Our method exploits intrinsic data properties within each modality and semantic information from cross-modal correlation simultaneously.
By including multimodal training in a unified framework, our method can learn more powerful and generic visual features.
arXiv Detail & Related papers (2021-04-26T19:23:36Z)
- Optical Flow Estimation from a Single Motion-blurred Image [66.2061278123057]
Motion blur in an image may be of practical interest in fundamental computer vision problems.
We propose a novel framework to estimate optical flow from a single motion-blurred image in an end-to-end manner.
arXiv Detail & Related papers (2021-03-04T12:45:18Z)
- Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z)
- Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single image deblurring is really feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)