Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
- URL: http://arxiv.org/abs/2311.12157v1
- Date: Mon, 20 Nov 2023 20:22:55 GMT
- Title: Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
- Authors: Nikola Popovic, Dimitrios Christodoulou, Danda Pani Paudel, Xi Wang,
Luc Van Gool
- Abstract summary: We propose to predict 3D eye gaze from weak supervision of eye semantic segmentation masks and direct supervision of a few 3D gaze vectors.
Our experiments in diverse settings illustrate the significant benefits of the proposed method, achieving about 5 degrees lower angular gaze error than the baseline.
- Score: 60.360919642038
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of predicting 3D eye gaze from eye images can be performed either by
(a) end-to-end learning of an image-to-gaze mapping or by (b) fitting a 3D eye
model onto images. The former requires 3D gaze labels, while the latter
requires eye semantics or landmarks to facilitate the model fitting. Although
obtaining eye semantics and landmarks is relatively easy, fitting an accurate
3D eye model to them remains very challenging due to the generally ill-posed
nature of the problem. On the other hand, obtaining large-scale 3D gaze data is
cumbersome due to the required hardware setups and computational demands. In
this work, we propose to predict 3D eye gaze from weak supervision of eye
semantic segmentation masks and direct supervision of a few 3D gaze vectors.
The proposed method combines the best of both worlds by leveraging large
amounts of weak annotations, which are easy to obtain, together with only a few
3D gaze vectors, which alleviate the difficulty of fitting 3D eye models to the
semantic segmentation of eye images. Thus, the eye gaze vectors used in the
model fitting are directly supervised using the few-shot gaze labels.
Additionally, we propose a transformer-based network architecture that serves
as a solid baseline for our improvements. Our experiments in diverse settings
illustrate the significant benefits of the proposed method, achieving about 5
degrees lower angular gaze error than the baseline when 3D annotations are
available for only 0.05% of the training images. The source code is available
at https://github.com/dimitris-christodoulou57/Model-aware_3D_Eye_Gaze.
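The abstract implies a training objective that combines a dense segmentation loss over all images with a direct angular loss on the few gaze-labeled ones. As a rough, hypothetical sketch only (the paper's actual losses, model outputs, and weighting may differ), such a combined objective could look as follows in PyTorch; all names here are illustrative assumptions, not the authors' code:

import torch
import torch.nn.functional as F

def combined_loss(seg_logits, seg_masks, gaze_pred, gaze_gt,
                  has_gaze_label, w_gaze=1.0):
    # Weak supervision: per-pixel cross-entropy against the eye semantic
    # segmentation masks, available for every training image.
    loss_seg = F.cross_entropy(seg_logits, seg_masks)

    # Few-shot supervision: angular error between predicted and ground-truth
    # 3D gaze vectors, applied only to the small labeled subset
    # (has_gaze_label is a boolean mask over the batch).
    if has_gaze_label.any():
        p = F.normalize(gaze_pred[has_gaze_label], dim=-1)
        g = F.normalize(gaze_gt[has_gaze_label], dim=-1)
        cos = (p * g).sum(dim=-1).clamp(-1.0, 1.0)
        loss_gaze = torch.acos(cos).mean()  # radians; convert to degrees at eval
    else:
        loss_gaze = seg_logits.new_zeros(())

    return loss_seg + w_gaze * loss_gaze

With 3D labels on only 0.05% of images, most batches contribute just the segmentation term; the rare gaze-labeled samples anchor the model fitting, which is the mechanism the abstract credits for the roughly 5-degree error reduction.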
Related papers
- Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling [9.440800948514449]
We propose a weakly-supervised 3D scene graph generation method via Visual-Linguistic Assisted Pseudo-labeling.
Our 3D-VLAP exploits the superior ability of current large-scale visual-linguistic models to align the semantics between texts and 2D images.
We design an edge self-attention based graph neural network to generate scene graphs of 3D point cloud scenes.
arXiv Detail & Related papers (2024-04-03T07:30:09Z)
- Weakly Supervised Monocular 3D Detection with a Single-View Image [58.57978772009438]
Monocular 3D detection aims for precise 3D object localization from a single-view image.
We propose SKD-WM3D, a weakly supervised monocular 3D detection framework.
We show that SKD-WM3D clearly surpasses the state of the art and is even on par with many fully supervised methods.
arXiv Detail & Related papers (2024-02-29T13:26:47Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model [39.64952340472541]
We propose a text-to-3D avatar generation method with controllable facial expressions.
Our main strategy is to construct the 3D avatar in Neural Radiance Fields (NeRF) optimized with a set of controlled viewpoint-aware images.
We demonstrate the empirical results and discuss the effectiveness of our method.
arXiv Detail & Related papers (2023-09-07T08:14:46Z)
- Accurate Gaze Estimation using an Active-gaze Morphable Model [9.192482716410511]
Rather than regressing gaze direction directly from images, we show that adding a 3D shape model can improve gaze estimation accuracy.
We equip this with a geometric vergence model of gaze to give an 'active-gaze 3DMM'.
Our method can learn with only the ground truth gaze target point and the camera parameters, without access to the ground truth gaze origin points.
arXiv Detail & Related papers (2023-01-30T18:51:14Z)
- Weakly Supervised Volumetric Image Segmentation with Deformed Templates [80.04326168716493]
We propose an approach that is truly weakly supervised in the sense that we only need to provide a sparse set of 3D points on the surface of target objects.
We show that it outperforms a more traditional approach to weak supervision in 3D at a reduced supervision cost.
arXiv Detail & Related papers (2021-06-07T22:09:34Z)
- Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering [101.56891506498755]
Differentiable rendering has paved the way to training neural networks to perform "inverse graphics" tasks.
We show that our approach significantly outperforms state-of-the-art inverse graphics networks trained on existing datasets.
arXiv Detail & Related papers (2020-10-18T22:29:07Z)