Unsupervised Gaze-aware Contrastive Learning with Subject-specific
Condition
- URL: http://arxiv.org/abs/2309.04506v1
- Date: Fri, 8 Sep 2023 09:45:19 GMT
- Title: Unsupervised Gaze-aware Contrastive Learning with Subject-specific
Condition
- Authors: Lingyu Du, Xucong Zhang, Guohao Lan
- Abstract summary: ConGaze is a contrastive learning-based framework that learns generic gaze-aware representations across subjects in an unsupervised way.
We introduce gaze-specific data augmentation to preserve gaze-semantic features and maintain gaze consistency.
We also devise a novel subject-conditional projection module that encourages a shared feature extractor to learn gaze-aware and generic representations.
- Score: 6.547550920819356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Appearance-based gaze estimation has shown great promise in many applications
by using a single general-purpose camera as the input device. However, its
success depends heavily on the availability of large-scale, well-annotated
gaze datasets, which are sparse and expensive to collect. To alleviate this
challenge, we propose ConGaze, a contrastive learning-based framework that
leverages unlabeled facial images to learn generic gaze-aware representations
across subjects in an unsupervised way. Specifically, we introduce
gaze-specific data augmentation to preserve gaze-semantic features and
maintain gaze consistency, which are proven to be crucial for effective
contrastive gaze representation learning. Moreover, we devise a novel
subject-conditional projection module that encourages a shared feature
extractor to learn gaze-aware and generic representations. Our experiments on
three public gaze estimation datasets show that ConGaze outperforms existing
unsupervised learning solutions by 6.7% to 22.5% and achieves 15.1% to 24.6%
improvement over its supervised learning-based counterpart in cross-dataset
evaluations.
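The abstract names two ingredients, gaze-preserving augmentation and a subject-conditional projection module on top of a shared encoder, that can be illustrated with a short sketch. The PyTorch snippet below is one possible reading under stated assumptions: `GazeAwareAugment` (photometric-only views), `SubjectConditionalProjector` (a projection MLP conditioned on a learned subject embedding), the NT-Xent loss, the ResNet-18 backbone, and the subject count are all illustrative choices, not details confirmed by the paper.

```python
# Hypothetical sketch of a ConGaze-style training step in PyTorch.
# GazeAwareAugment, SubjectConditionalProjector, the ResNet-18 backbone and
# num_subjects=50 are assumptions for illustration, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T

class GazeAwareAugment:
    """Photometric-only augmentation: no flips or rotations, so the two
    views of a face keep the same gaze direction (gaze consistency)."""
    def __init__(self):
        self.t = T.Compose([
            T.ColorJitter(0.4, 0.4, 0.4, 0.1),
            T.RandomGrayscale(p=0.2),
            T.GaussianBlur(kernel_size=9),
        ])

    def __call__(self, x):            # x: (B, 3, H, W) face crops
        return self.t(x)

class SubjectConditionalProjector(nn.Module):
    """Projection head conditioned on a learned subject embedding, so that
    subject-specific appearance is absorbed here rather than forcing the
    shared encoder to encode it."""
    def __init__(self, feat_dim=512, proj_dim=128, num_subjects=50):
        super().__init__()
        self.subject_emb = nn.Embedding(num_subjects, feat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim * 2, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, feats, subject_ids):
        cond = self.subject_emb(subject_ids)
        return self.mlp(torch.cat([feats, cond], dim=1))

def nt_xent(z1, z2, temperature=0.5):
    """Standard NT-Xent (SimCLR) contrastive loss over two sets of projections."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2B, D)
    sim = (z @ z.t()) / temperature                             # (2B, 2B)
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                          float("-inf"))                        # drop self-pairs
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# One unsupervised step: two gaze-preserving views, shared encoder,
# subject-conditional projection, contrastive loss. No gaze labels are used;
# only the subject identity of each unlabeled image is assumed to be known.
backbone = models.resnet18(weights=None)
backbone.fc = nn.Identity()                 # expose 512-d pooled features
projector = SubjectConditionalProjector()
augment = GazeAwareAugment()

images = torch.rand(8, 3, 224, 224)         # unlabeled face crops
subject_ids = torch.randint(0, 50, (8,))    # per-image subject identity

v1, v2 = augment(images), augment(images)
z1 = projector(backbone(v1), subject_ids)
z2 = projector(backbone(v2), subject_ids)
loss = nt_xent(z1, z2)
loss.backward()
```

In this reading, a downstream gaze regressor would later be trained on a small labeled set on top of the learned encoder; that fine-tuning step is omitted from the sketch.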
Related papers
- UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training [12.680014448486242]
We propose UniGaze, leveraging large-scale in-the-wild facial datasets for gaze estimation through self-supervised pre-training.
Our experiments reveal that self-supervised approaches designed for semantic tasks fail when applied to gaze estimation.
We demonstrate that UniGaze significantly improves generalization across multiple data domains while minimizing reliance on costly labeled data.
arXiv Detail & Related papers (2025-02-04T13:24:23Z) - VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning [59.68917139718813]
We show that a strong off-the-shelf frozen pretrained visual encoder can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning.
By conditioning on frozen clip-level embeddings from observed steps to predict the actions of unseen steps, our prediction model is able to learn robust representations for forecasting.
arXiv Detail & Related papers (2024-10-04T14:52:09Z) - Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images
with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions by fully exploiting the aforementioned free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z) - Semi-supervised Contrastive Regression for Estimation of Eye Gaze [0.609170287691728]
This paper develops a semi-supervised contrastive learning framework for estimation of gaze direction.
With a small labeled gaze dataset, the framework is able to find a generalized solution even for unseen face images.
Our contrastive regression framework shows good performance in comparison with several state-of-the-art contrastive learning techniques used for gaze estimation.
arXiv Detail & Related papers (2023-08-05T04:11:38Z) - Contrastive Representation Learning for Gaze Estimation [8.121462458089143]
We propose a contrastive representation learning framework for gaze estimation, named Gaze Contrastive Learning (GazeCLR)
Our results show that GazeCLR improves cross-domain gaze estimation performance and yields up to a 17.2% relative improvement.
The GazeCLR framework is competitive with state-of-the-art representation learning methods for few-shot evaluation.
arXiv Detail & Related papers (2022-10-24T17:01:18Z) - LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic
Latent Code Manipulation [0.0]
We propose a gaze-aware analytic manipulation method, based on a data-driven approach that exploits the disentanglement characteristics of generative adversarial network (GAN) inversion.
Using a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, of which the gaze estimator is sufficiently aware.
arXiv Detail & Related papers (2022-09-21T08:05:53Z) - FreeGaze: Resource-efficient Gaze Estimation via Frequency Domain
Contrastive Learning [1.240096657086732]
FreeGaze is a resource-efficient framework for unsupervised gaze representation learning.
We show that FreeGaze can achieve gaze estimation accuracy comparable to existing supervised learning-based approaches.
arXiv Detail & Related papers (2022-09-14T14:51:52Z) - Efficient Self-supervised Vision Transformers for Representation
Learning [86.57557009109411]
We show that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity.
We propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies.
Our results show that, by combining the two techniques, EsViT achieves 81.3% top-1 accuracy on the ImageNet linear probe evaluation.
arXiv Detail & Related papers (2021-06-17T19:57:33Z) - Weakly-Supervised Physically Unconstrained Gaze Estimation [80.66438763587904]
We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
arXiv Detail & Related papers (2021-05-20T14:58:52Z) - PureGaze: Purifying Gaze Feature for Generalizable Gaze Estimation [12.076469954457007]
We tackle the domain generalization problem in cross-domain gaze estimation for unknown target domains.
To be specific, we realize the domain generalization by gaze feature purification.
We design a plug-and-play self-adversarial framework for the gaze feature purification.
arXiv Detail & Related papers (2021-03-24T13:22:00Z) - Heterogeneous Contrastive Learning: Encoding Spatial Information for
Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z) - Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences
for Urban Scene Segmentation [57.68890534164427]
In this work, we ask whether semi-supervised learning on unlabeled video sequences and extra images can improve performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models on both human-annotated and pseudo-labeled data (a minimal sketch of this recipe follows this list).
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
arXiv Detail & Related papers (2020-05-20T18:00:05Z)
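The iterative pseudo-labeling recipe mentioned in the Naive-Student entry above is simple enough to sketch. The snippet below is a generic, hypothetical illustration under stated assumptions (a segmentation-style model returning per-pixel class logits and a caller-supplied `train_fn`); it is not the authors' actual Cityscapes pipeline.

```python
# Hypothetical sketch of iterative pseudo-labeling (Naive-Student style).
# The model interface and train_fn are placeholders, not the paper's code.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

@torch.no_grad()
def pseudo_label(teacher, unlabeled_loader, device="cpu"):
    """Run the current (teacher) model on unlabeled frames and keep its
    argmax predictions as pseudo ground truth."""
    teacher.eval()
    images, labels = [], []
    for x in unlabeled_loader:                     # x: (B, 3, H, W)
        logits = teacher(x.to(device))             # (B, num_classes, H, W)
        labels.append(logits.argmax(dim=1).cpu())  # (B, H, W) pseudo-labels
        images.append(x.cpu())
    return TensorDataset(torch.cat(images), torch.cat(labels))

def naive_student_round(teacher, student, labeled_ds, unlabeled_loader, train_fn):
    """One iteration: pseudo-label the unlabeled data, then train a fresh
    student on the union of human-annotated and pseudo-labeled data.
    The returned student becomes the teacher for the next round."""
    pseudo_ds = pseudo_label(teacher, unlabeled_loader)
    mixed = DataLoader(ConcatDataset([labeled_ds, pseudo_ds]),
                       batch_size=8, shuffle=True)
    train_fn(student, mixed)                       # ordinary supervised training
    return student
```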
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.