TEyeD: Over 20 million real-world eye images with Pupil, Eyelid, and
Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector,
and Eye Movement Types
- URL: http://arxiv.org/abs/2102.02115v3
- Date: Tue, 6 Jun 2023 08:47:49 GMT
- Title: TEyeD: Over 20 million real-world eye images with Pupil, Eyelid, and
Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector,
and Eye Movement Types
- Authors: Wolfgang Fuhl and Gjergji Kasneci and Enkelejda Kasneci
- Abstract summary: TEyeD is the world's largest unified public data set of eye images taken with head-mounted devices.
The data set includes 2D and 3D landmarks, semantic segmentation, 3D eyeball annotations, gaze vectors, and eye movement types for all images.
- Score: 18.53571873938032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present TEyeD, the world's largest unified public data set of eye images
taken with head-mounted devices. TEyeD was acquired with seven different
head-mounted eye trackers. Among them, two eye trackers were integrated into
virtual reality (VR) or augmented reality (AR) devices. The images in TEyeD
were obtained from various tasks, including car rides, simulator rides, outdoor
sports activities, and daily indoor activities. The data set includes 2D and 3D
landmarks, semantic segmentation, 3D eyeball annotations, gaze vectors, and
eye movement types for all images. Landmarks and semantic segmentation are
provided for the pupil, iris and eyelids. Video lengths vary from a few minutes
to several hours. With more than 20 million carefully annotated images, TEyeD
provides a unique, coherent resource and a valuable foundation for advancing
research in the field of computer vision, eye tracking and gaze estimation in
modern VR and AR applications.
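To make the annotation inventory above concrete, a per-frame record could be represented roughly as below. This is a minimal sketch under assumed field names, shapes, and units; the dataset's actual file formats should be taken from its own documentation.

```python
# Hypothetical per-frame annotation record for TEyeD-style labels.
# Field names, shapes, and units are illustrative assumptions, not the
# dataset's documented schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class EyeFrameAnnotation:
    landmarks_2d: np.ndarray    # (N, 2) pupil/iris/eyelid points, pixels
    landmarks_3d: np.ndarray    # (N, 3) corresponding 3D points
    segmentation: np.ndarray    # (H, W) class ids: background/pupil/iris/eyelid
    eyeball: np.ndarray         # 3D eyeball annotation, e.g. center + radius
    gaze_vector: np.ndarray     # (3,) gaze direction
    eye_movement: str           # e.g. "fixation" or "saccade"

    def gaze_unit(self) -> np.ndarray:
        """Return the gaze vector normalized to unit length."""
        return self.gaze_vector / np.linalg.norm(self.gaze_vector)
```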
Download: connect via FTP as user TEyeDUser with an empty password to
nephrit.cs.uni-tuebingen.de (ftp://nephrit.cs.uni-tuebingen.de).
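For reference, a minimal download sketch using Python's standard ftplib, with the host and user taken from the note above. The remote directory layout is not specified here, so list the server contents first; the example filename is hypothetical.

```python
# Minimal FTP download sketch for the TEyeD server (credentials from the
# note above). Directory layout and filenames are assumptions; list first.
from ftplib import FTP

ftp = FTP("nephrit.cs.uni-tuebingen.de")
ftp.login(user="TEyeDUser", passwd="")   # no password, per the download note
print(ftp.nlst())                        # inspect the top-level layout

# Hypothetical example: fetch one file once you know its real name.
# with open("some_video.mp4", "wb") as f:
#     ftp.retrbinary("RETR some_video.mp4", f.write)
ftp.quit()
```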
Related papers
- Gaze Beyond the Frame: Forecasting Egocentric 3D Visual Span [50.60002620855774]
We propose EgoSpanLift, a method that transforms egocentric visual span forecasting from 2D image planes to 3D scenes. We also curate a benchmark from raw egocentric data, creating a testbed with 364.6K samples for 3D visual span forecasting.
arXiv Detail & Related papers (2025-11-23T14:37:11Z)
- 3D Aware Region Prompted Vision Language Model [99.4106711584584]
SR-3D connects single-view 2D images and multi-view 3D data through a shared visual token space. SR-3D supports flexible region prompting, allowing users to annotate regions with bounding boxes, segmentation masks on any frame, or directly in 3D.
arXiv Detail & Related papers (2025-09-16T17:59:06Z)
- Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness [73.72335146374543]
We introduce reconstructive visual instruction tuning with 3D-awareness (Ross3D), which integrates 3D-aware visual supervision into the training procedure.
Ross3D achieves state-of-the-art performance across various 3D scene understanding benchmarks.
arXiv Detail & Related papers (2025-04-02T16:59:55Z)
- HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos [9.513100627302755]
The dataset offers over 833 minutes (3.7M+ images) of recordings that feature 19 subjects interacting with 33 diverse rigid objects.
The recordings include multiple synchronized data streams containing egocentric multi-view RGB/monochrome images, eye gaze signal, scene point clouds, and 3D poses of cameras, hands, and objects.
In our experiments, we demonstrate the effectiveness of multi-view egocentric data for three popular tasks: 3D hand tracking, model-based 6DoF object pose estimation, and 3D lifting of unknown in-hand objects.
arXiv Detail & Related papers (2024-11-28T14:09:42Z)
- MMHead: Towards Fine-grained Multi-modal 3D Facial Animation [68.04052669266174]
We construct a large-scale multi-modal 3D facial animation dataset, MMHead.
MMHead consists of 49 hours of 3D facial motion sequences, speech audios, and rich hierarchical text annotations.
Based on the MMHead dataset, we establish benchmarks for two new tasks: text-induced 3D talking head animation and text-to-3D facial motion generation.
arXiv Detail & Related papers (2024-10-10T09:37:01Z)
- Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D [95.14469865815768]
2D vision models can be used for semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets.
However, extending a single 2D vision operator like scene editing to 3D typically requires a highly creative method specialized to that task.
In this paper, we propose Lift3D, which trains to predict unseen views on feature spaces generated by a few visual models.
We even outperform state-of-the-art methods specialized for the task in question.
arXiv Detail & Related papers (2024-03-27T18:13:16Z)
- Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception [5.952224408665015]
Aria Digital Twin (ADT) is an egocentric dataset captured using Aria glasses.
ADT contains 200 sequences of real-world activities conducted by Aria wearers in two real indoor scenes.
arXiv Detail & Related papers (2023-06-10T06:46:32Z)
- RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars [157.82758221794452]
We present RenderMe-360, a comprehensive 4D human head dataset to drive advances in head avatar research.
It contains massive data assets, with 243+ million complete head frames, and over 800k video sequences from 500 different identities.
Based on the dataset, we build a comprehensive benchmark for head avatar research, with 16 state-of-the-art methods performed on five main tasks.
arXiv Detail & Related papers (2023-05-22T17:54:01Z)
- DRaCoN -- Differentiable Rasterization Conditioned Neural Radiance Fields for Articulated Avatars [92.37436369781692]
We present DRaCoN, a framework for learning full-body volumetric avatars.
It exploits the advantages of both the 2D and 3D neural rendering techniques.
Experiments on the challenging ZJU-MoCap and Human3.6M datasets indicate that DRaCoN outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T17:59:15Z)
- Pistol: Pupil Invisible Supportive Tool to extract Pupil, Iris, Eye Opening, Eye Movements, Pupil and Iris Gaze Vector, and 2D as well as 3D Gaze [12.314175125417098]
In offline mode, our software extracts multiple features from the eye, including the pupil and iris ellipse, eye aperture, pupil vector, iris vector, eye movement types derived from pupil and iris velocities, marker detection, marker distance, and 2D gaze estimation for the pupil center, iris center, pupil vector, and iris vector.
The gaze signal is computed per feature: in 2D separately for each eye, and in 3D for both eyes together.
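Eye movement typing from pupil and iris velocities, as mentioned above, is commonly done with velocity-threshold identification (I-VT). The sketch below is a generic I-VT classifier, not Pistol's actual implementation; the sampling rate and threshold are illustrative assumptions.

```python
# Generic velocity-threshold (I-VT) eye movement classification sketch.
# Not Pistol's implementation; fps and threshold are illustrative values.
import numpy as np

def classify_ivt(positions: np.ndarray, fps: float = 200.0,
                 saccade_thresh: float = 50.0) -> list[str]:
    """Label each frame 'fixation' or 'saccade' from point-to-point velocity.

    positions: (T, 2) gaze or pupil positions, assumed in degrees of
    visual angle; saccade_thresh is in degrees per second.
    """
    if len(positions) < 2:
        return ["fixation"] * len(positions)
    velocity = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    labels = ["saccade" if v > saccade_thresh else "fixation" for v in velocity]
    return labels + [labels[-1]]  # pad so output length matches T
```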
arXiv Detail & Related papers (2022-01-18T07:54:55Z)
- Recognizing Scenes from Novel Viewpoints [99.90914180489456]
Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects.
We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories.
arXiv Detail & Related papers (2021-12-02T18:59:40Z)
- KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D [67.50776195828242]
KITTI-360 is a suburban driving dataset which comprises richer input modalities, comprehensive semantic instance annotations and accurate localization.
For efficient annotation, we created a tool to label 3D scenes with bounding primitives, resulting in over 150k semantic and instance annotated images and 1B annotated 3D points.
We established benchmarks and baselines for several tasks relevant to mobile perception, encompassing problems from computer vision, graphics, and robotics on the same dataset.
arXiv Detail & Related papers (2021-09-28T00:41:29Z)
- Disentangling and Vectorization: A 3D Visual Perception Approach for Autonomous Driving Based on Surround-View Fisheye Cameras [3.485767750936058]
A Multidimensional Vector is proposed to capture the usable information generated across different dimensions and stages.
Experiments on real fisheye images demonstrate that our solution achieves state-of-the-art accuracy while running in real time in practice.
arXiv Detail & Related papers (2021-07-19T13:24:21Z)
- MagicEyes: A Large Scale Eye Gaze Estimation Dataset for Mixed Reality [8.025086113117291]
We present MagicEyes, the first large scale eye dataset collected using real MR devices with comprehensive ground truth labeling.
We evaluate several state-of-the-art methods on MagicEyes and also propose a new multi-task EyeNet model designed for detecting the cornea, glints and pupil along with eye segmentation in a single forward pass.
arXiv Detail & Related papers (2020-03-18T08:23:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.