CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild
- URL: http://arxiv.org/abs/2402.08316v1
- Date: Tue, 13 Feb 2024 09:20:26 GMT
- Title: CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild
- Authors: Andy Cătrună, Adrian Cosma, Emilian Rădoi
- Abstract summary: We propose CrossGaze, a strong baseline for gaze estimation.
Our model surpasses several state-of-the-art methods, achieving a mean angular error of 9.94 degrees.
Our proposed model serves as a strong foundation for future research and development in gaze estimation.
- Score: 4.089889918897877
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Gaze estimation, the task of predicting where an individual is looking, is a
critical task with direct applications in areas such as human-computer
interaction and virtual reality. Estimating gaze direction in unconstrained
environments is difficult due to the many factors that can obscure the face and
eye regions. In this work we propose CrossGaze, a strong baseline for gaze
estimation that leverages recent developments in computer vision architectures
and attention-based modules. Unlike previous approaches, our method does not
require a specialised architecture; instead, it integrates already established
models into our architecture and adapts them for the task of 3D gaze
estimation. This approach allows for seamless updates to the
architecture as any module can be replaced with more powerful feature
extractors. On the Gaze360 benchmark, our model surpasses several
state-of-the-art methods, achieving a mean angular error of 9.94 degrees. Our
proposed model serves as a strong foundation for future research and
development in gaze estimation, paving the way for practical and accurate gaze
prediction in real-world scenarios.
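The mean angular error reported above (9.94 degrees on Gaze360) is the average angle between predicted and ground-truth 3D gaze vectors. A minimal sketch of how this metric is commonly computed, assuming gaze is parameterised as yaw/pitch angles (the function names and the exact angle convention are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def yaw_pitch_to_vector(yaw, pitch):
    """Convert yaw/pitch angles (radians) into unit 3D gaze vectors.

    Convention assumed here: gaze (0, 0) points along the negative z-axis.
    """
    x = -np.cos(pitch) * np.sin(yaw)
    y = -np.sin(pitch)
    z = -np.cos(pitch) * np.cos(yaw)
    return np.stack([x, y, z], axis=-1)

def mean_angular_error(pred, true):
    """Mean angle in degrees between predicted and ground-truth vectors."""
    # Normalise to unit length so the dot product equals the cosine.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    true = true / np.linalg.norm(true, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * true, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())
```

Identical vectors yield 0 degrees and orthogonal vectors yield 90 degrees; the clipping guards against floating-point values just outside [-1, 1] before `arccos`.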
Related papers
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z) - A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv Detail & Related papers (2023-09-15T17:37:09Z) - Investigation of Architectures and Receptive Fields for Appearance-based Gaze Estimation [29.154335016375367]
We show that tuning a few simple parameters of a ResNet architecture can outperform most of the existing state-of-the-art methods for the gaze estimation task.
We obtain state-of-the-art performance on three datasets, with gaze estimation errors of 3.64 degrees on ETH-XGaze, 4.50 degrees on MPIIFaceGaze, and 9.13 degrees on Gaze360.
arXiv Detail & Related papers (2023-08-18T14:41:51Z) - GEO-Bench: Toward Foundation Models for Earth Monitoring [139.77907168809085]
We propose a benchmark comprised of six classification and six segmentation tasks.
This benchmark will be a driver of progress across a variety of Earth monitoring tasks.
arXiv Detail & Related papers (2023-06-06T16:16:05Z) - NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation [37.977032771941715]
We propose a novel Head-Eye redirection parametric model based on Neural Radiance Field.
Our model can decouple the face and eyes for separate neural rendering.
It can achieve the purpose of separately controlling the attributes of the face, identity, illumination, and eye gaze direction.
arXiv Detail & Related papers (2022-12-30T13:52:28Z) - 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method in the task of gaze generalization, in which we demonstrate improvement of up to 30% compared to state-of-the-art when no ground truth data are available.
arXiv Detail & Related papers (2022-12-06T14:15:17Z) - Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
arXiv Detail & Related papers (2022-10-13T23:57:40Z) - Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction [31.02081143697431]
Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and video-surveillance applications.
We propose a lightweight attention-based recurrent backbone that acts solely on past observed positions.
We employ a common goal module, based on a U-Net architecture, which additionally extracts semantic information to predict scene-compliant destinations.
arXiv Detail & Related papers (2022-04-25T11:12:37Z) - Learning-by-Novel-View-Synthesis for Full-Face Appearance-based 3D Gaze Estimation [8.929311633814411]
This work examines a novel approach for synthesizing gaze estimation training data based on monocular 3D face reconstruction.
Unlike prior works using multi-view reconstruction, photo-realistic CG models, or generative neural networks, our approach can manipulate and extend the head pose range of existing training data.
arXiv Detail & Related papers (2021-12-01T15:38:19Z) - Toward Foundation Models for Earth Monitoring: Proposal for a Climate Change Benchmark [95.19070157520633]
Recent progress in self-supervision shows that pre-training large neural networks on vast amounts of unsupervised data can lead to impressive increases in generalisation for downstream tasks.
Such models, recently coined as foundation models, have been transformational to the field of natural language processing.
We propose to develop a new benchmark comprised of a variety of downstream tasks related to climate change.
arXiv Detail & Related papers (2021-12-01T15:38:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.