CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild
- URL: http://arxiv.org/abs/2402.08316v1
- Date: Tue, 13 Feb 2024 09:20:26 GMT
- Title: CrossGaze: A Strong Method for 3D Gaze Estimation in the Wild
- Authors: Andy Cătrună, Adrian Cosma, Emilian Rădoi
- Abstract summary: We propose CrossGaze, a strong baseline for gaze estimation.
Our model surpasses several state-of-the-art methods, achieving a mean angular error of 9.94 degrees.
Our proposed model serves as a strong foundation for future research and development in gaze estimation.
- Score: 4.089889918897877
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract:   Gaze estimation, the task of predicting where an individual is
looking, has direct applications in areas such as human-computer interaction
and virtual reality. Estimating gaze direction in unconstrained environments is
difficult because many factors can obscure the face and eye regions. In this
work we propose CrossGaze, a strong baseline for gaze estimation that leverages
recent developments in computer vision architectures and attention-based
modules. Unlike previous approaches, our method does not require a specialised
architecture; instead, it integrates already established models and adapts them
for the task of 3D gaze estimation. This approach allows for seamless updates
to the architecture, as any module can be replaced with a more powerful feature
extractor. On the Gaze360 benchmark, our model surpasses several
state-of-the-art methods, achieving a mean angular error of 9.94 degrees. Our
proposed model serves as a strong foundation for future research and
development in gaze estimation, paving the way for practical and accurate gaze
prediction in real-world scenarios.
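The abstract reports a mean angular error but does not spell out how it is computed. A minimal sketch follows, assuming the common definition: the angle between the predicted and ground-truth 3D gaze direction vectors, averaged over all samples. The function names and the example vectors are hypothetical and are not taken from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumption: gaze is represented as a 3D direction vector; the vectors
    do not need to be unit length because they are normalised here.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0.0 or norm_t == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_sim))


def mean_angular_error_deg(preds, targets):
    """Average angular error in degrees over paired predictions and targets."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal length")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical predictions and ground truth, for illustration only.
    preds = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
    targets = [(0.0, 0.0, -1.0), (0.0, 1.0, 0.0)]
    print(mean_angular_error_deg(preds, targets))
```

Under this definition, a lower value means the predicted gaze directions lie closer to the annotated ones on average.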
 
      
Related papers
- Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
 Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies.
Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios.
Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
 arXiv  Detail & Related papers  (2025-07-15T17:59:59Z)
- MAGE: A Multi-task Architecture for Gaze Estimation with an Efficient Calibration Module [5.559268969773661]
 MAGE is a Multi-task Architecture for Gaze Estimation with an efficient calibration module.
Our basic model encodes both the directional and positional features from facial images.
Our method achieves state-of-the-art performance on the public MPIIFaceGaze, EYEDIAP, and our built IMRGaze datasets.
 arXiv  Detail & Related papers  (2025-05-22T08:36:58Z)
- AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
 Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment.
We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
 arXiv  Detail & Related papers  (2025-04-06T20:35:44Z)
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
 We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
 arXiv  Detail & Related papers  (2024-03-18T17:50:41Z)
- A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
 We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
 arXiv  Detail & Related papers  (2023-09-15T17:37:09Z)
- Investigation of Architectures and Receptive Fields for Appearance-based Gaze Estimation [29.154335016375367]
 We show that tuning a few simple parameters of a ResNet architecture can outperform most of the existing state-of-the-art methods for the gaze estimation task.
We obtain state-of-the-art performance on three datasets, with gaze estimation errors of 3.64 degrees on ETH-XGaze, 4.50 degrees on MPIIFaceGaze, and 9.13 degrees on Gaze360.
 arXiv  Detail & Related papers  (2023-08-18T14:41:51Z)
- GEO-Bench: Toward Foundation Models for Earth Monitoring [139.77907168809085]
 We propose a benchmark comprised of six classification and six segmentation tasks.
This benchmark will be a driver of progress across a variety of Earth monitoring tasks.
 arXiv  Detail & Related papers  (2023-06-06T16:16:05Z)
- NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation [37.977032771941715]
 We propose a novel Head-Eye redirection parametric model based on Neural Radiance Field.
Our model can decouple the face and eyes for separate neural rendering.
It can separately control the attributes of the face, identity, illumination, and eye gaze direction.
 arXiv  Detail & Related papers  (2022-12-30T13:52:28Z)
- 3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views [67.00931529296788]
 We propose to train general gaze estimation models which can be directly employed in novel environments without adaptation.
We create a large-scale dataset of diverse faces with gaze pseudo-annotations, which we extract based on the 3D geometry of the scene.
We test our method in the task of gaze generalization, in which we demonstrate improvement of up to 30% compared to state-of-the-art when no ground truth data are available.
 arXiv  Detail & Related papers  (2022-12-06T14:15:17Z)
- Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
 3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
 arXiv  Detail & Related papers  (2022-10-13T23:57:40Z)
- Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction [31.02081143697431]
 Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and video-surveillance applications.
We propose a lightweight attention-based recurrent backbone that acts solely on past observed positions.
We employ a common goal module, based on a U-Net architecture, which additionally extracts semantic information to predict scene-compliant destinations.
 arXiv  Detail & Related papers  (2022-04-25T11:12:37Z)
- Learning-by-Novel-View-Synthesis for Full-Face Appearance-based 3D Gaze Estimation [8.929311633814411]
 This work examines a novel approach for synthesizing gaze estimation training data based on monocular 3D face reconstruction.
Unlike prior works using multi-view reconstruction, photo-realistic CG models, or generative neural networks, our approach can manipulate and extend the head pose range of existing training data.
 arXiv  Detail & Related papers  (2022-01-20T00:29:45Z)
- Toward Foundation Models for Earth Monitoring: Proposal for a Climate Change Benchmark [95.19070157520633]
 Recent progress in self-supervision shows that pre-training large neural networks on vast amounts of unsupervised data can lead to impressive increases in generalisation for downstream tasks.
Such models, recently coined as foundation models, have been transformational to the field of natural language processing.
We propose to develop a new benchmark comprised of a variety of downstream tasks related to climate change.
 arXiv  Detail & Related papers  (2021-12-01T15:38:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     