On estimating gaze by self-attention augmented convolutions
- URL: http://arxiv.org/abs/2008.11055v2
- Date: Tue, 3 Nov 2020 13:49:19 GMT
- Title: On estimating gaze by self-attention augmented convolutions
- Authors: Gabriel Lefundes, Luciano Oliveira
- Abstract summary: We propose a novel network architecture grounded in self-attention augmented convolutions to improve the quality of the learned features.
We dubbed our framework ARes-gaze, which employs our Attention-augmented ResNet (ARes-14) as twin convolutional backbones.
Results showed a 2.38% decrease in average angular error compared to state-of-the-art methods on the MPIIFaceGaze data set, and a second-place result on the EyeDiap data set.
- Score: 6.015556590955813
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimation of 3D gaze is highly relevant to multiple fields, including but not limited to interactive systems, specialized human-computer interfaces, and behavioral research. Although deep learning methods have recently boosted the accuracy of appearance-based gaze estimation, there is still room for improvement in the network architectures for this particular task. We therefore propose a novel network architecture grounded in self-attention augmented convolutions to improve the quality of the features learned during the training of a shallower residual network. The rationale is that the self-attention mechanism can help a shallower network outperform deeper architectures by learning dependencies between distant regions in full-face images. This mechanism can also create better and more spatially-aware feature representations derived from the face and eye images before gaze regression. We dubbed our framework ARes-gaze; it employs our Attention-augmented ResNet (ARes-14) as twin convolutional backbones. In our experiments, results showed a 2.38% decrease in average angular error compared to state-of-the-art methods on the MPIIFaceGaze data set, and a second-place result on the EyeDiap data set. Notably, our proposed framework was the only one to reach high accuracy on both data sets simultaneously.
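To ground the idea, the sketch below shows one way an attention-augmented convolution of the kind the abstract describes can be assembled in PyTorch: part of the output channels come from a standard convolution and part from multi-head self-attention over all spatial positions, so distant facial regions can interact. The module name, channel split, and head count are illustrative assumptions, not the paper's ARes-14 configuration, and the relative position encodings typically used with attention-augmented convolutions are omitted for brevity.

```python
import torch
import torch.nn as nn

class AugmentedConv2d(nn.Module):
    """Convolution whose output channels are partly produced by a standard
    convolution and partly by multi-head self-attention over all spatial
    positions (in the spirit of attention-augmented convolutions).
    Hypothetical sketch; channel split and head count are assumptions."""

    def __init__(self, in_ch, out_ch, heads=4, attn_ch=32, kernel_size=3):
        super().__init__()
        assert attn_ch % heads == 0
        self.conv = nn.Conv2d(in_ch, out_ch - attn_ch, kernel_size,
                              padding=kernel_size // 2)
        # A single 1x1 conv produces queries, keys, and values.
        self.qkv = nn.Conv2d(in_ch, 3 * attn_ch, 1)
        self.attn = nn.MultiheadAttention(attn_ch, heads, batch_first=True)
        self.proj = nn.Conv2d(attn_ch, attn_ch, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        conv_out = self.conv(x)
        # Flatten the spatial grid to a sequence so every position attends
        # to every other position (captures long-range dependencies).
        q, k, v = self.qkv(x).flatten(2).transpose(1, 2).chunk(3, dim=-1)
        attn_out, _ = self.attn(q, k, v)
        attn_out = attn_out.transpose(1, 2).reshape(b, -1, h, w)
        return torch.cat([conv_out, self.proj(attn_out)], dim=1)

# Example: a 64-channel feature map from a face crop.
x = torch.randn(2, 64, 28, 28)
layer = AugmentedConv2d(64, 128, heads=4, attn_ch=32)
print(layer(x).shape)  # torch.Size([2, 128, 28, 28])
```

Concatenating the two branches keeps local convolutional features while injecting global context, which is the property the abstract credits for letting a shallower residual network compete with deeper ones.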
Related papers
- Investigation of Architectures and Receptive Fields for Appearance-based Gaze Estimation [29.154335016375367]
We show that tuning a few simple parameters of a ResNet architecture yields a model that outperforms most existing state-of-the-art methods on the gaze estimation task.
We obtain state-of-the-art performance on three data sets, with gaze estimation errors of 3.64 degrees on ETH-XGaze, 4.50 degrees on MPIIFaceGaze, and 9.13 degrees on Gaze360.
arXiv Detail & Related papers (2023-08-18T14:41:51Z)
- Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network [80.19054069988559]
We find that self-supervised monocular depth estimation exhibits direction sensitivity and environmental dependency.
We propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth representation in two aspects.
Experiments show that our method achieves significant improvements on three widely used benchmarks.
arXiv Detail & Related papers (2023-08-10T14:32:18Z)
- LocalEyenet: Deep Attention framework for Localization of Eyes [0.609170287691728]
We propose a deep coarse-to-fine architecture, LocalEyenet, that localizes only the eye regions and can be trained end-to-end.
Our model shows good generalization ability in cross-dataset evaluation and in real-time localization of eyes.
arXiv Detail & Related papers (2023-03-13T06:35:45Z)
- Explicitly incorporating spatial information to recurrent networks for agriculture [4.583080280213959]
We propose novel approaches to improve the classification of deep convolutional neural networks.
We leverage available RGB-D images and robot odometry to perform inter-frame feature map spatial registration.
This information is then fused within recurrent deep learnt models, to improve their accuracy and robustness.
arXiv Detail & Related papers (2022-06-27T15:57:42Z)
- L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments [2.5234156040689237]
We propose a robust CNN-based model for predicting gaze in unconstrained settings.
We use two identical losses, one for each angle, to improve network learning and increase its generalization.
Our proposed model achieves state-of-the-art accuracies of 3.92 and 10.41 degrees on the MPIIGaze and Gaze360 datasets, respectively.
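The per-angle loss idea lends itself to a brief sketch; the toy backbone, input sizes, and MSE criterion below are assumptions for illustration, not L2CS-Net's actual design (the paper additionally bins each angle and combines classification and regression losses).

```python
import torch
import torch.nn as nn

# Minimal sketch of the "one loss per gaze angle" idea: a shared backbone
# feeds two separate heads, and yaw and pitch each get their own loss term.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
yaw_head = nn.Linear(256, 1)
pitch_head = nn.Linear(256, 1)
criterion = nn.MSELoss()  # assumed here; the paper's exact losses differ

faces = torch.randn(8, 3, 64, 64)   # toy face crops
target = torch.randn(8, 2)          # ground-truth (yaw, pitch) angles

feats = backbone(faces)
loss_yaw = criterion(yaw_head(feats).squeeze(1), target[:, 0])
loss_pitch = criterion(pitch_head(feats).squeeze(1), target[:, 1])
loss = loss_yaw + loss_pitch        # optimize both angles jointly
```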
arXiv Detail & Related papers (2022-03-07T12:35:39Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we prove that dynamically adapting the network architecture to each domain task, along with weight finetuning, benefits both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)
- Learning Robust Feature Representations for Scene Text Detection [0.0]
We present a network architecture derived from a loss that maximizes the conditional log-likelihood.
By extending the layer of latent variables to multiple layers, the network is able to learn features that are robust to scale.
In experiments, the proposed algorithm significantly outperforms state-of-the-art methods in terms of both recall and precision.
arXiv Detail & Related papers (2020-05-26T01:06:47Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
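The separable 3D convolution mentioned in the last entry factors a full k x k x k kernel into a spatial step and a spectral step; the sketch below illustrates the idea with assumed channel counts and input shape, not SSRNet's configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of a spatial/spectral separable 3D convolution for a
# hyperspectral cube laid out as (batch, channels, bands, height, width):
# a full k x k x k kernel is factored into a 1 x k x k spatial convolution
# followed by a k x 1 x 1 spectral convolution.
separable = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=(1, 3, 3), padding=(0, 1, 1)),   # spatial
    nn.Conv3d(16, 16, kernel_size=(3, 1, 1), padding=(1, 0, 0)),  # spectral
)

cube = torch.randn(2, 1, 31, 64, 64)  # 31 spectral bands, 64x64 pixels
print(separable(cube).shape)          # torch.Size([2, 16, 31, 64, 64])
```

At matched channel widths, the factored kernels carry far fewer weights than a full 3D kernel, which is where the memory and compute savings the entry mentions come from.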