Learning Video-independent Eye Contact Segmentation from In-the-Wild
Videos
- URL: http://arxiv.org/abs/2210.02033v1
- Date: Wed, 5 Oct 2022 05:46:40 GMT
- Title: Learning Video-independent Eye Contact Segmentation from In-the-Wild
Videos
- Authors: Tianyi Wu and Yusuke Sugano
- Abstract summary: In this work, we address the task of one-way eye contact detection for videos in the wild.
Our goal is to build a unified model that can identify when a person is looking at their gaze targets in an arbitrary input video.
Due to the scarcity of labeled training data, we propose a gaze target discovery method to generate pseudo-labels for unlabeled videos.
- Score: 18.373736201140026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human eye contact is a form of non-verbal communication and can have a great
influence on social behavior. Since the location and size of the eye contact
targets vary across different videos, learning a generic video-independent eye
contact detector is still a challenging task. In this work, we address the task
of one-way eye contact detection for videos in the wild. Our goal is to build a
unified model that can identify when a person is looking at their gaze targets in
an arbitrary input video. Considering that this requires time-series relative
eye movement information, we propose to formulate the task as a temporal
segmentation problem. Due to the scarcity of labeled training data, we further propose
a gaze target discovery method to generate pseudo-labels for unlabeled videos,
which allows us to train a generic eye contact segmentation model in an
unsupervised way using in-the-wild videos. To evaluate our proposed approach,
we manually annotated a test dataset consisting of 52 videos of human
conversations. Experimental results show that our eye contact segmentation
model outperforms the previous video-dependent eye contact detector and can
achieve 71.88% framewise accuracy on our annotated test set. Our code and
evaluation dataset are available at
https://github.com/ut-vision/Video-Independent-ECS.
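A minimal sketch of the pseudo-labeling idea described above, assuming per-frame 3D gaze direction estimates from an off-the-shelf appearance-based gaze estimator are already available; the function names, the histogram-mode target discovery, and the angular threshold are illustrative assumptions only, not the authors' released implementation (see the repository above for that).

# Hypothetical sketch of gaze-target discovery for pseudo-labeling eye contact.
# Assumes gaze_dirs is an (N, 3) array of per-frame unit gaze vectors; all names
# and thresholds below are illustrative, not taken from the paper's code.
import numpy as np

def gaze_to_angles(gaze_dirs: np.ndarray):
    """Convert 3D unit gaze vectors to (yaw, pitch) angles in radians."""
    yaw = np.arctan2(gaze_dirs[:, 0], -gaze_dirs[:, 2])
    pitch = np.arcsin(np.clip(gaze_dirs[:, 1], -1.0, 1.0))
    return yaw, pitch

def discover_gaze_target(gaze_dirs: np.ndarray, n_bins: int = 36) -> np.ndarray:
    """Take the densest region of gaze angles as the assumed gaze target
    direction (a simple stand-in for gaze target discovery)."""
    yaw, pitch = gaze_to_angles(gaze_dirs)
    hist, yaw_edges, pitch_edges = np.histogram2d(yaw, pitch, bins=n_bins)
    iy, ip = np.unravel_index(np.argmax(hist), hist.shape)
    return np.array([0.5 * (yaw_edges[iy] + yaw_edges[iy + 1]),
                     0.5 * (pitch_edges[ip] + pitch_edges[ip + 1])])

def pseudo_label_eye_contact(gaze_dirs: np.ndarray,
                             thresh_rad: float = 0.1) -> np.ndarray:
    """Label a frame 1 (eye contact) if its gaze angle lies within thresh_rad
    of the discovered target, else 0; returns a framewise binary sequence."""
    target = discover_gaze_target(gaze_dirs)
    yaw, pitch = gaze_to_angles(gaze_dirs)
    dist = np.hypot(yaw - target[0], pitch - target[1])
    return (dist < thresh_rad).astype(np.int64)

def framewise_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fraction of frames where prediction matches the annotation."""
    return float((pred == gt).mean())

Framewise pseudo-labels produced this way could serve as training targets for a temporal segmentation model, and framewise accuracy as computed above is the metric behind the 71.88% figure reported in the abstract.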
Related papers
- Real-time estimation of overt attention from dynamic features of the face using deep-learning [0.0]
We train a deep learning model to predict a measure of attention based on overt eye movements.
We measure Inter-Subject Correlation of eye movements in ten-second intervals while students watch the same educational videos.
The solution is lightweight and can operate on the client side, which mitigates some of the privacy concerns associated with online attention monitoring.
arXiv Detail & Related papers (2024-09-19T20:49:39Z) - Human-Object Interaction Prediction in Videos through Gaze Following [9.61701724661823]
We design a framework to detect current HOIs and anticipate future HOIs in videos.
We propose to leverage human information since people often fixate on an object before interacting with it.
Our model is trained and validated on the VidHOI dataset, which contains videos capturing daily life.
arXiv Detail & Related papers (2023-06-06T11:36:14Z) - Do Pedestrians Pay Attention? Eye Contact Detection in the Wild [75.54077277681353]
In urban environments, humans rely on eye contact for fast and efficient communication with nearby people.
In this paper, we focus on eye contact detection in the wild, i.e., real-world scenarios for autonomous vehicles with no control over the environment or the distance of pedestrians.
We introduce a model that leverages semantic keypoints to detect eye contact and show that this high-level representation achieves state-of-the-art results on the publicly-available dataset JAAD.
To study domain adaptation, we create LOOK: a large-scale dataset for eye contact detection in the wild, which focuses on diverse and unconstrained scenarios for real-world generalization.
arXiv Detail & Related papers (2021-12-08T10:21:28Z) - Weakly Supervised Human-Object Interaction Detection in Video via
Contrastive Spatiotemporal Regions [81.88294320397826]
A system does not know what human-object interactions are present in a video or the actual location of the human and object.
We introduce a dataset comprising over 6.5k videos with human-object interaction that have been curated from sentence captions.
We demonstrate improved performance over weakly supervised baselines adapted to our annotations on our video dataset.
arXiv Detail & Related papers (2021-10-07T15:30:18Z) - MutualEyeContact: A conversation analysis tool with focus on eye contact [69.17395873398196]
MutualEyeContact can help scientists to understand the importance of (mutual) eye contact in social interactions.
We combine state-of-the-art eye tracking with face recognition based on machine learning and provide a tool for analysis and visualization of social interaction sessions.
arXiv Detail & Related papers (2021-07-09T15:05:53Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z) - Learning Person Re-identification Models from Videos with Weak
Supervision [53.53606308822736]
We introduce the problem of learning person re-identification models from videos with weak supervision.
We propose a multiple instance attention learning framework for person re-identification using such video-level labels.
The attention weights are obtained based on all person images instead of person tracklets in a video, making our learned model less affected by noisy annotations.
arXiv Detail & Related papers (2020-07-21T07:23:32Z) - Detecting Attended Visual Targets in Video [25.64146711657225]
We introduce a new annotated dataset, VideoAttentionTarget, containing complex and dynamic patterns of real-world gaze behavior.
Our experiments show that our model can effectively infer dynamic attention in videos.
We obtain the first results for automatically classifying clinically-relevant gaze behavior without wearable cameras or eye trackers.
arXiv Detail & Related papers (2020-03-05T09:29:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.