LocalEyenet: Deep Attention framework for Localization of Eyes
- URL: http://arxiv.org/abs/2303.12728v1
- Date: Mon, 13 Mar 2023 06:35:45 GMT
- Title: LocalEyenet: Deep Attention framework for Localization of Eyes
- Authors: Somsukla Maiti and Akshansh Gupta
- Abstract summary: We have proposed LocalEyenet, an end-to-end trainable deep coarse-to-fine architecture for localizing only the eye regions.
Our model shows good generalization ability in cross-dataset evaluation and in real-time localization of eyes.
- Score: 0.609170287691728
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The development of human-machine interfaces has become a necessity for modern
machines to achieve greater autonomy and efficiency. Gaze-driven human
intervention is an effective and convenient option for creating an interface that
alleviates human error. Facial landmark detection is crucial for designing
a robust gaze detection system. Regression-based methods enable good
spatial localization of the landmarks corresponding to different parts of the
face, but there is still scope for improvement, which we address by
incorporating attention.
In this paper, we propose LocalEyenet, an end-to-end trainable deep
coarse-to-fine architecture for localizing only the eye regions. The model
architecture, built on a stacked-hourglass backbone, learns self-attention
over feature maps, which helps preserve both global and local spatial
dependencies in the face image. We incorporate deep layer aggregation in
each hourglass to minimize the loss of attention over the depth of the
architecture. Our model shows good generalization ability in cross-dataset
evaluation and in real-time localization of eyes.
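Since the paper does not include an implementation, the following is a minimal PyTorch sketch of the idea described above: self-attention applied to hourglass feature maps, with skip-based aggregation inside each stage. All module names, channel sizes, stage counts, and the landmark count are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of LocalEyenet's core ideas; shapes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Non-local-style self-attention over the spatial positions of a feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # residual weight, starts at zero

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)               # (b, hw, c/8)
        k = self.key(x).flatten(2)                                 # (b, c/8, hw)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1)  # (b, hw, hw)
        v = self.value(x).flatten(2)                               # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out  # local features plus weighted global context

class HourglassStage(nn.Module):
    """One coarse-to-fine stage: downsample, attend at the bottleneck, upsample."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.attn = SelfAttention2d(channels)
        self.up = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        self.skip = nn.Conv2d(channels, channels, 1)  # shallow-feature aggregation

    def forward(self, x):
        coarse = self.attn(F.relu(self.down(x)))
        # layer aggregation: fuse the decoded path with the skip branch
        return F.relu(self.up(coarse) + self.skip(x))

class LocalEyenetSketch(nn.Module):
    def __init__(self, n_stages: int = 2, channels: int = 64, n_landmarks: int = 12):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)
        self.stages = nn.ModuleList(HourglassStage(channels) for _ in range(n_stages))
        self.head = nn.Conv2d(channels, n_landmarks, 1)  # one heatmap per landmark

    def forward(self, x):
        x = F.relu(self.stem(x))
        for stage in self.stages:  # stacked hourglasses refine coarse-to-fine
            x = stage(x)
        return self.head(x)        # eye-landmark heatmaps

heatmaps = LocalEyenetSketch()(torch.randn(1, 3, 128, 128))  # -> (1, 12, 64, 64)
```

Because `gamma` is initialized to zero, each stage starts as a purely local convolutional path and admits global context gradually during training, a common way to stabilize attention-augmented backbones.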
Related papers
- Locality Alignment Improves Vision-Language Models [55.275235524659905]
Vision-language models (VLMs) have seen growing adoption in recent years, but many still struggle with basic spatial reasoning.
We propose a new efficient post-training stage for ViTs called locality alignment.
We show that locality-aligned backbones improve performance across a range of benchmarks.
arXiv Detail & Related papers (2024-10-14T21:01:01Z) - Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions involve using a data-driven approach with a simple architecture designed for real-time operation, a self-supervised data training method, and the capability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z) - Neural Point-based Volumetric Avatar: Surface-guided Neural Points for
Efficient and Photorealistic Volumetric Head Avatar [62.87222308616711]
We propose Neural Point-based Volumetric Avatar, a method that adopts the neural point representation and the neural volume rendering process.
Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map.
By design, our method is better equipped to handle topologically changing regions and thin structures while also ensuring accurate expression control when animating avatars.
arXiv Detail & Related papers (2023-07-11T03:40:10Z) - LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial
Expression Recognition [19.5702895176141]
Previous methods for dynamic facial expression recognition (DFER) in the wild are mainly based on Convolutional Neural Networks (CNNs), whose local operations ignore the long-range dependencies in videos.
Transformer-based methods for DFER achieve better performance but result in higher FLOPs and computational costs; we propose LOGO-Former to address this trade-off.
Experiments on two in-the-wild dynamic facial expression datasets (i.e., DFEW and FERV39K) indicate that our method provides an effective way to make use of the spatial and temporal dependencies for DFER.
arXiv Detail & Related papers (2023-05-05T07:53:13Z) - ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient
Self-Supervised Monocular Depth Estimation [6.923035780685481]
We propose an efficient local adaptive attention method for geometry-aware representation enhancement.
We leverage geometric cues from semantic information to learn local adaptive bounding boxes to guide unsupervised feature aggregation.
Our proposed method establishes a new state of the art in the self-supervised monocular depth estimation task.
arXiv Detail & Related papers (2022-12-12T06:38:35Z) - Real-time Local Feature with Global Visual Information Enhancement [6.640269424085467]
Current deep learning-based local feature algorithms typically use convolutional neural network (CNN) architectures with limited receptive fields.
The proposed method introduces a global enhancement module to fuse global visual clues in a light-weight network.
Experiments on public benchmarks demonstrate that the proposed method achieves considerable robustness against visual interference while running in real time.
arXiv Detail & Related papers (2022-11-20T13:44:20Z) - Centralized Feature Pyramid for Object Detection [53.501796194901964]
The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a wide range of applications.
In this paper, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation.
arXiv Detail & Related papers (2022-10-05T08:32:54Z) - Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout
for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods, while our method has much lower complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z) - Active Visual Localization in Partially Calibrated Environments [35.48595012305253]
Humans can robustly localize themselves without a map after getting lost, by following prominent visual cues or landmarks.
In this work, we aim to endow autonomous agents with the same ability. Such an ability is important in robotics applications, yet very challenging when an agent is exposed to partially calibrated environments.
We propose an indoor scene dataset ACR-6, which consists of both synthetic and real data and simulates challenging scenarios for active visual localization.
arXiv Detail & Related papers (2020-12-08T08:00:55Z) - PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image
Segmentation [87.50205728818601]
We propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z) - On estimating gaze by self-attention augmented convolutions [6.015556590955813]
We propose a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features.
We dubbed our framework ARes-gaze, which explores our Attention-augmented ResNet (ARes-14) as twin convolutional backbones.
Results showed a 2.38% decrease in average angular error compared to state-of-the-art methods on the MPIIFaceGaze dataset, and second-place results on the EyeDiap dataset (a generic sketch of attention-augmented convolution appears after this list).
arXiv Detail & Related papers (2020-08-25T14:29:05Z)
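As a companion to the last entry above, here is a minimal sketch of an attention-augmented convolution in the spirit of ARes-14: the layer concatenates a standard convolution with a spatial self-attention branch along the channel axis. Shapes and channel splits are assumptions for illustration, not the ARes-gaze implementation.

```python
# Hedged sketch of an attention-augmented convolution; not the paper's code.
import torch
import torch.nn as nn

class AugmentedConv(nn.Module):
    """Concatenates a standard convolution with a self-attention branch."""
    def __init__(self, in_ch: int, out_ch: int, attn_ch: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch - attn_ch, 3, padding=1)
        self.qkv = nn.Conv2d(in_ch, 3 * attn_ch, 1)  # joint query/key/value projection
        self.attn_ch = attn_ch

    def forward(self, x):
        b, _, h, w = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)  # each (b, attn_ch, hw)
        attn = torch.softmax(q.transpose(1, 2) @ k / (self.attn_ch ** 0.5), dim=-1)
        out = (v @ attn.transpose(1, 2)).view(b, self.attn_ch, h, w)
        return torch.cat([self.conv(x), out], dim=1)  # conv + attention channels

feats = AugmentedConv(3, 32)(torch.randn(1, 3, 32, 32))  # -> (1, 32, 32, 32)
```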
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.