Egocentric Gaze Estimation via Neck-Mounted Camera
- URL: http://arxiv.org/abs/2602.11669v1
- Date: Thu, 12 Feb 2026 07:41:27 GMT
- Title: Egocentric Gaze Estimation via Neck-Mounted Camera
- Authors: Haoyu Huang, Yoichi Sato
- Abstract summary: This paper introduces neck-mounted view gaze estimation, a new task that estimates user gaze from the neck-mounted camera perspective. We collect the first dataset for this task, consisting of approximately 4 hours of video collected from 8 participants during everyday activities. We propose two extensions: an auxiliary gaze out-of-bound classification task and a multi-view co-learning approach that jointly trains head-view and neck-view models.
- Score: 27.513961366278455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces neck-mounted view gaze estimation, a new task that estimates user gaze from the neck-mounted camera perspective. Prior work on egocentric gaze estimation, which predicts the device wearer's gaze location within the camera's field of view, mainly focuses on head-mounted cameras, while alternative viewpoints remain underexplored. To bridge this gap, we collect the first dataset for this task, consisting of approximately 4 hours of video collected from 8 participants during everyday activities. We evaluate a transformer-based gaze estimation model, GLC, on the new dataset and propose two extensions: an auxiliary gaze out-of-bound classification task and a multi-view co-learning approach that jointly trains head-view and neck-view models using a geometry-aware auxiliary loss. Experimental results show that incorporating gaze out-of-bound classification improves performance over standard fine-tuning, while the co-learning approach does not yield gains. We further analyze these results and discuss implications for neck-mounted gaze estimation.
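The abstract describes the out-of-bound extension in prose only. As a rough illustration, here is a minimal PyTorch sketch of how an auxiliary gaze out-of-bound classification head could be attached to a heatmap-based gaze model such as GLC; every name and the loss weighting here are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeWithOOBHead(nn.Module):
    """Hypothetical wrapper: a gaze-heatmap model plus an auxiliary
    'gaze out-of-bound' binary classifier (not the authors' exact code)."""
    def __init__(self, backbone: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.backbone = backbone              # any encoder producing (B, C, H, W) features
        self.heatmap_head = nn.Conv2d(feat_dim, 1, kernel_size=1)
        self.oob_head = nn.Sequential(        # predicts whether gaze falls outside the frame
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, frames):
        feats = self.backbone(frames)                 # (B, C, H, W)
        heatmap = self.heatmap_head(feats)            # (B, 1, H, W) gaze logits
        oob_logit = self.oob_head(feats).squeeze(-1)  # (B,) out-of-bound logits
        return heatmap, oob_logit

def joint_loss(heatmap, oob_logit, gt_heatmap, gt_oob, lambda_oob=0.5):
    # KL divergence between predicted and (normalized) ground-truth gaze
    # distributions, as is common for heatmap-based gaze estimation;
    # the lambda_oob weighting is an assumption.
    b = heatmap.size(0)
    log_p = F.log_softmax(heatmap.view(b, -1), dim=-1)
    q = gt_heatmap.view(b, -1)
    gaze_loss = F.kl_div(log_p, q, reduction="batchmean")
    oob_loss = F.binary_cross_entropy_with_logits(oob_logit, gt_oob.float())
    return gaze_loss + lambda_oob * oob_loss
```

Such a head is presumably useful here because, from a neck-mounted viewpoint, the wearer's gaze frequently leaves the camera's field of view, which the task description implies.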
Related papers
- What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation [18.155092199205907]
In this paper, we present three novel elements to advance in-vehicle gaze research.
First, we introduce IVGaze, a pioneering dataset capturing in-vehicle gaze.
Second, our research focuses on in-vehicle gaze estimation leveraging the IVGaze.
Third, we explore a novel strategy for gaze zone classification by extending the GazeDPTR.
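The entry names gaze zone classification but gives no mechanism. Below is a minimal sketch of one naive baseline: classifying a 3D gaze direction into in-vehicle zones by nearest prototype direction. The zone names and prototype vectors are invented for illustration; IVGaze extends GazeDPTR rather than using this rule.

```python
import numpy as np

# Invented in-vehicle gaze zones and prototype unit directions.
ZONES = ["windshield", "left_mirror", "right_mirror",
         "rearview_mirror", "dashboard", "center_console"]
PROTOS = np.array([
    [0.0, 0.0, 1.0],    # straight ahead
    [-0.8, 0.0, 0.6],   # far left
    [0.8, 0.0, 0.6],    # far right
    [0.3, 0.4, 0.85],   # up and slightly right
    [0.0, -0.6, 0.8],   # down
    [0.4, -0.6, 0.7],   # down-right
])
PROTOS /= np.linalg.norm(PROTOS, axis=1, keepdims=True)

def gaze_zone(gaze_dir: np.ndarray) -> str:
    # Pick the zone whose prototype direction is closest (max cosine).
    g = gaze_dir / np.linalg.norm(gaze_dir)
    return ZONES[int(np.argmax(PROTOS @ g))]

print(gaze_zone(np.array([0.75, 0.05, 0.66])))  # -> "right_mirror"
```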
arXiv Detail & Related papers (2024-03-23T01:22:15Z)
- UVAGaze: Unsupervised 1-to-2 Views Adaptation for Gaze Estimation [10.412375913640224]
We propose a novel 1-view-to-2-views (1-to-2 views) adaptation solution for gaze estimation.
Our method adapts a traditional single-view gaze estimator for flexibly placed dual cameras.
Experiments show that a single-view estimator, when adapted for dual views, can achieve much higher accuracy, especially in cross-dataset settings.
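The summary does not say how the unlabeled dual views supervise each other. One plausible signal, sketched below under the assumption of a known relative camera rotation, is that both views should agree on the 3D gaze direction in a common frame; this illustrates the 1-to-2-views idea, not UVAGaze's actual objective.

```python
import torch
import torch.nn.functional as F

def dual_view_consistency_loss(gaze_a, gaze_b, R_ab):
    """Hypothetical unsupervised signal for dual-camera adaptation:
    3D gaze unit vectors predicted in camera A and camera B should
    coincide after rotating A's prediction into B's frame.
    gaze_a, gaze_b: (B, 3) gaze vectors; R_ab: (3, 3) rotation A -> B."""
    gaze_a_in_b = F.normalize(gaze_a @ R_ab.T, dim=-1)  # A-frame gaze in B's frame
    gaze_b = F.normalize(gaze_b, dim=-1)
    # Minimize angular disagreement via negative cosine similarity.
    return (1.0 - (gaze_a_in_b * gaze_b).sum(dim=-1)).mean()
```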
arXiv Detail & Related papers (2023-12-25T08:13:28Z)
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision [101.36648828734646]
We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones.
The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.
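One common reading of "perspective supervision" is an auxiliary image-plane detection loss applied directly to the backbone features alongside the BEV loss. The sketch below assumes that reading; all module names and the weight are hypothetical, not BEVFormer v2 code.

```python
import torch.nn as nn

class PerspectiveSupervisedBEV(nn.Module):
    """Sketch of perspective supervision: an auxiliary per-image detection
    head supervises the backbone directly, in addition to the BEV head."""
    def __init__(self, backbone, bev_head, persp_head, persp_weight=0.5):
        super().__init__()
        self.backbone, self.bev_head, self.persp_head = backbone, bev_head, persp_head
        self.persp_weight = persp_weight  # assumed weighting

    def forward(self, images, bev_targets, persp_targets):
        feats = self.backbone(images)                        # per-image feature maps
        bev_loss = self.bev_head(feats, bev_targets)         # main bird's-eye-view loss
        persp_loss = self.persp_head(feats, persp_targets)   # auxiliary image-plane loss
        return bev_loss + self.persp_weight * persp_loss
```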
arXiv Detail & Related papers (2022-11-18T18:59:48Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
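For intuition about front-to-top projection, classical inverse perspective mapping (IPM) via a homography is sketched below. The paper learns the projection, so this only illustrates the underlying geometry; the point correspondences and file names are made up.

```python
import cv2
import numpy as np

# IPM: warp a front-facing road view to a top-down view via a homography
# fitted from four ground-plane point correspondences (invented here).
src = np.float32([[420, 500], [860, 500], [1180, 720], [100, 720]])  # road trapezoid (front view)
dst = np.float32([[300, 0], [500, 0], [500, 800], [300, 800]])       # rectangle (top view)
H = cv2.getPerspectiveTransform(src, dst)

front = cv2.imread("front_view.jpg")             # hypothetical input frame
top = cv2.warpPerspective(front, H, (800, 800))  # bird's-eye-view image
cv2.imwrite("top_view.jpg", top)
```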
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- Online Deep Clustering with Video Track Consistency [85.8868194550978]
We propose an unsupervised clustering-based approach to learn visual features from video object tracks.
We show that exploiting an unsupervised, class-agnostic, yet noisy track generator yields better accuracy than relying on costly and precise track annotations.
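A plausible formulation of track consistency, sketched below, assigns each track a single cluster pseudo-label and trains all of its crops toward it; this reading is an assumption drawn from the summary, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def track_consistency_loss(frame_feats, track_ids, centroids, temperature=0.1):
    """Sketch: every frame crop from the same track is pushed toward the one
    cluster the whole track votes for.
    frame_feats: (N, D) L2-normalized crop embeddings
    track_ids:   (N,) integer track id per crop
    centroids:   (K, D) L2-normalized cluster centroids"""
    logits = frame_feats @ centroids.T / temperature  # (N, K) similarities
    pseudo = torch.zeros(frame_feats.size(0), dtype=torch.long,
                         device=frame_feats.device)
    for t in track_ids.unique():                      # one pseudo-label per track
        mask = track_ids == t
        # Track-level vote: cluster closest to the track's mean embedding.
        mean = F.normalize(frame_feats[mask].mean(0, keepdim=True), dim=-1)
        pseudo[mask] = (mean @ centroids.T).argmax()
    return F.cross_entropy(logits, pseudo)
```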
arXiv Detail & Related papers (2022-06-07T08:11:00Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
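A common way to realize input-dependent receptive fields is to fuse parallel dilated convolutions with weights predicted from the image itself. The sketch below takes that reading; its structure is an assumption, not PANet's actual DRF block.

```python
import torch
import torch.nn as nn

class DynamicReceptiveField(nn.Module):
    """Sketch of a dynamic-receptive-field block: parallel dilated convs
    whose outputs are fused with weights predicted from the input, so the
    effective receptive field adapts per image."""
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.gate = nn.Sequential(            # image-conditioned branch weights
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, len(dilations)), nn.Softmax(dim=-1),
        )

    def forward(self, x):
        w = self.gate(x)                                      # (B, num_branches)
        outs = torch.stack([b(x) for b in self.branches], 1)  # (B, S, C, H, W)
        return (w[:, :, None, None, None] * outs).sum(1)      # weighted fusion
```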
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59).
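As a rough illustration of segmenting optical flow with a Transformer, the sketch below embeds flow patches, encodes them, and predicts a per-patch object-vs-background logit. The paper uses a slot-attention-style variant, so this generic encoder is illustrative only.

```python
import torch
import torch.nn as nn

class FlowSegmenter(nn.Module):
    """Sketch: embed optical-flow patches, run a small Transformer encoder,
    and classify each patch as primary object vs. background."""
    def __init__(self, patch=8, dim=128, depth=4, heads=4):
        super().__init__()
        self.embed = nn.Conv2d(2, dim, kernel_size=patch, stride=patch)  # flow has 2 channels
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 1)         # object-vs-background logit per patch

    def forward(self, flow):                  # flow: (B, 2, H, W)
        tokens = self.embed(flow)             # (B, D, H/p, W/p)
        b, d, h, w = tokens.shape
        tokens = tokens.flatten(2).transpose(1, 2)       # (B, h*w, D)
        tokens = self.encoder(tokens)
        mask = self.head(tokens).transpose(1, 2).reshape(b, 1, h, w)
        return mask                           # upsample + sigmoid for a full-resolution mask
```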
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
- LNSMM: Eye Gaze Estimation With Local Network Share Multiview Multitask [7.065909514483728]
We propose a novel methodology to estimate eye gaze points and eye gaze directions simultaneously.
Experiments show our method achieves state-of-the-art performance against current mainstream methods on two indicators: gaze points and gaze directions.
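Estimating gaze points and gaze directions simultaneously suggests a shared trunk with one head per task. The sketch below shows that generic multitask layout, not LNSMM's specific local-network-share architecture.

```python
import torch.nn as nn

class MultiTaskGaze(nn.Module):
    """Sketch of joint gaze-point / gaze-direction estimation: a shared
    feature trunk with one head per task, trained with a summed loss."""
    def __init__(self, trunk, feat_dim=512):
        super().__init__()
        self.trunk = trunk                        # shared eye/face encoder -> (B, feat_dim)
        self.point_head = nn.Linear(feat_dim, 2)  # 2D gaze point on the screen/frame
        self.dir_head = nn.Linear(feat_dim, 3)    # 3D gaze direction vector

    def forward(self, x):
        f = self.trunk(x)
        return self.point_head(f), self.dir_head(f)
```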
arXiv Detail & Related papers (2021-01-18T15:14:24Z)
- In the Eye of the Beholder: Gaze and Actions in First Person Video [30.54510882243602]
We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a head-worn camera.
Our dataset comes with videos, gaze tracking data, hand masks and action annotations.
We propose a novel deep model for joint gaze estimation and action recognition in First Person Vision.
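One natural coupling for joint gaze estimation and action recognition is to use the predicted gaze map as soft spatial attention when pooling features for the action head. The sketch below assumes that coupling; its details are not taken from the paper.

```python
import torch
import torch.nn as nn

class GazeActionModel(nn.Module):
    """Sketch of joint gaze estimation and action recognition: the predicted
    gaze heatmap acts as soft spatial attention for the action head."""
    def __init__(self, backbone, feat_dim=512, num_actions=100):  # class count is a placeholder
        super().__init__()
        self.backbone = backbone                    # video encoder -> (B, C, H, W)
        self.gaze_head = nn.Conv2d(feat_dim, 1, 1)  # per-location gaze logits
        self.action_head = nn.Linear(feat_dim, num_actions)

    def forward(self, clip):
        feats = self.backbone(clip)                        # (B, C, H, W)
        gaze_logits = self.gaze_head(feats)                # (B, 1, H, W)
        attn = torch.softmax(gaze_logits.flatten(2), -1)   # (B, 1, H*W)
        pooled = (feats.flatten(2) * attn).sum(-1)         # gaze-weighted pooling -> (B, C)
        return gaze_logits, self.action_head(pooled)
```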
arXiv Detail & Related papers (2020-05-31T22:06:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.