Towards Self-Supervised Gaze Estimation
- URL: http://arxiv.org/abs/2203.10974v1
- Date: Mon, 21 Mar 2022 13:35:16 GMT
- Title: Towards Self-Supervised Gaze Estimation
- Authors: Arya Farkhondeh, Cristina Palmero, Simone Scardapane, Sergio Escalera
- Abstract summary: We propose SwAT, an equivariant version of the online clustering-based self-supervised approach SwAV, to learn more informative representations for gaze estimation.
We achieve up to 57% and 25% improvements in cross-dataset and within-dataset evaluation tasks on existing benchmarks.
- Score: 32.91601919228028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent joint embedding-based self-supervised methods have surpassed standard
supervised approaches on various image recognition tasks such as image
classification. These self-supervised methods aim at maximizing agreement
between features extracted from two differently transformed views of the same
image, which results in learning an invariant representation with respect to
appearance and geometric image transformations. However, the effectiveness of
these approaches remains unclear in the context of gaze estimation, a
structured regression task that requires equivariance under geometric
transformations (e.g., rotations, horizontal flip). In this work, we propose
SwAT, an equivariant version of the online clustering-based self-supervised
approach SwAV, to learn more informative representations for gaze estimation.
We identify the most effective image transformations for self-supervised
pretraining and demonstrate that SwAT, with ResNet-50 and supported with
uncurated unlabeled face images, outperforms state-of-the-art gaze estimation
methods and supervised baselines in various experiments. In particular, we
achieve up to 57% and 25% improvements in cross-dataset and within-dataset
evaluation tasks on existing benchmarks (ETH-XGaze, Gaze360, and MPIIFaceGaze).
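To make the invariance-versus-equivariance distinction concrete, here is a minimal PyTorch sketch (not the SwAT implementation; the toy backbone, image size, and loss are illustrative assumptions): under a horizontal flip, a (pitch, yaw) gaze label keeps its pitch but negates its yaw, so an equivariant consistency loss compares the prediction on the flipped image with the flipped prediction.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in backbone; the paper uses ResNet-50.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))

def flip_gaze(gaze: torch.Tensor) -> torch.Tensor:
    """Gaze-space counterpart of a horizontal image flip:
    pitch (index 0) is unchanged, yaw (index 1) changes sign."""
    pitch, yaw = gaze[..., 0], gaze[..., 1]
    return torch.stack([pitch, -yaw], dim=-1)

x = torch.randn(8, 3, 64, 64)        # batch of face crops
x_flipped = torch.flip(x, dims=[3])  # horizontal flip along width

pred = model(x)                      # (pitch, yaw) from the original view
pred_flipped = model(x_flipped)      # (pitch, yaw) from the flipped view

# Equivariant consistency: the prediction on the flipped image should match
# the flipped prediction. A plain invariance loss would instead pull the two
# raw predictions together, which is wrong for yaw.
equiv_loss = F.mse_loss(pred_flipped, flip_gaze(pred))
```
This is why invariance-oriented pretraining can discard exactly the geometric signal a structured regression task like gaze estimation needs.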
Related papers
- Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms [27.882122236282054]
We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundation model, DINOv2.
We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions.
Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs.
arXiv Detail & Related papers (2024-09-25T11:55:27Z) - Merging Multiple Datasets for Improved Appearance-Based Gaze Estimation [10.682719521609743]
The Two-stage Transformer-based Gaze-feature Fusion (TTGF) method uses transformers to merge information from each eye and the face separately, and then merges across the two eyes.
The proposed Gaze Adaptation Module (GAM) handles annotation inconsistency by applying a per-dataset adaptation module to correct gaze estimates from a single shared estimator.
arXiv Detail & Related papers (2024-09-02T02:51:40Z) - Contrastive Representation Learning for Gaze Estimation [8.121462458089143]
We propose a contrastive representation learning framework for gaze estimation, named Gaze Contrastive Learning (GazeCLR).
Our results show that GazeCLR improves the performance of cross-domain gaze estimation, yielding up to a 17.2% relative improvement.
The GazeCLR framework is competitive with state-of-the-art representation learning methods for few-shot evaluation.
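As a rough illustration of the kind of contrastive objective such frameworks build on, here is a generic NT-Xent loss in PyTorch; the temperature and projection dimensions are assumptions, not GazeCLR's exact configuration.
```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) projections of two views of the same N samples."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D)
    sim = z @ z.t() / tau                               # scaled cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float('-inf'))                   # a sample cannot match itself
    # The positive for index i is its other view at index i+n (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(16, 128), torch.randn(16, 128))
```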
arXiv Detail & Related papers (2022-10-24T17:01:18Z) - Consistency Regularization for Deep Face Anti-Spoofing [69.70647782777051]
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems.
We conjecture that encouraging feature consistency across different views may be a promising way to boost FAS models.
To this end, we enhance FAS models with both Embedding-level and Prediction-level Consistency Regularization (EPCR).
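A minimal sketch of what such two-level consistency could look like, assuming a toy encoder and head with MSE losses (the paper's actual architecture and loss terms may differ):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # toy backbone
head = nn.Linear(64, 2)                                            # live/spoof logits

view1 = torch.randn(8, 3, 32, 32)   # two augmented views of the same faces
view2 = torch.randn(8, 3, 32, 32)
e1, e2 = encoder(view1), encoder(view2)

emb_loss = F.mse_loss(e1, e2)                          # embedding-level consistency
p1 = F.softmax(head(e1), dim=1)
p2 = F.softmax(head(e2), dim=1)
pred_loss = F.mse_loss(p1, p2)                         # prediction-level consistency
loss = emb_loss + pred_loss
```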
arXiv Detail & Related papers (2021-11-24T08:03:48Z) - Distribution Estimation to Automate Transformation Policies for
Self-Supervision [61.55875498848597]
In recent visual self-supervision works, a surrogate classification objective, called a pretext task, is established by assigning labels to transformed or augmented input images.
It is observed that image transformations already present in the dataset might be less effective in learning such self-supervised representations.
We propose a framework based on a generative adversarial network to automatically find transformations that are not present in the input dataset.
arXiv Detail & Related papers (2021-11-24T04:40:00Z) - Multi-view Contrastive Coding of Remote Sensing Images at Pixel-level [5.64497799927668]
A pixel-wise contrastive approach based on an unlabeled multi-view setting is proposed.
A pseudo-Siamese ResUnet is trained to learn a representation that aligns features from shifted positive pairs.
Results demonstrate both improvements in efficiency and accuracy over the state-of-the-art multi-view contrastive methods.
arXiv Detail & Related papers (2021-05-18T13:28:46Z) - A Hierarchical Transformation-Discriminating Generative Model for Few
Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z) - Semantic Change Detection with Asymmetric Siamese Networks [71.28665116793138]
Given two aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries.
This problem is vital in many earth vision related tasks, such as precise urban planning and natural resource management.
We present an asymmetric siamese network (ASN) to locate and identify semantic changes through feature pairs obtained from modules of widely different structures.
arXiv Detail & Related papers (2020-10-12T13:26:30Z) - Transformation Consistency Regularization- A Semi-Supervised Paradigm
for Image-to-Image Translation [18.870983535180457]
We propose Transformation Consistency Regularization, which delves into a more challenging setting of image-to-image translation.
We evaluate the efficacy of our algorithm on three different applications: image colorization, denoising and super-resolution.
Our method is significantly data-efficient, requiring only around 10-20% of labeled samples to achieve image reconstructions similar to those of its fully supervised counterpart.
arXiv Detail & Related papers (2020-07-15T17:41:35Z) - Unsupervised Learning of Visual Features by Contrasting Cluster
Assignments [57.33699905852397]
We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring the computation of pairwise comparisons.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data.
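The swapped-prediction objective can be sketched compactly. The sketch below follows the published SwAV recipe in spirit (Sinkhorn-Knopp codes, swapped cross-entropy) but omits the queue, multi-crop, and other details; the feature and prototype dimensions are illustrative.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn(scores: torch.Tensor, eps: float = 0.05, iters: int = 3) -> torch.Tensor:
    """Turn prototype scores (N, K) into soft cluster assignments whose
    prototypes are (approximately) equally used, via Sinkhorn-Knopp."""
    q = torch.exp(scores / eps).t()   # (K, N)
    q /= q.sum()
    K, N = q.shape
    for _ in range(iters):
        q /= q.sum(dim=1, keepdim=True); q /= K   # normalize rows (prototypes)
        q /= q.sum(dim=0, keepdim=True); q /= N   # normalize columns (samples)
    return (q * N).t()                # (N, K), each row sums to 1

prototypes = F.normalize(torch.randn(128, 10), dim=0)  # D x K prototype matrix
z1 = F.normalize(torch.randn(32, 128), dim=1)          # projections of view 1
z2 = F.normalize(torch.randn(32, 128), dim=1)          # projections of view 2

s1, s2 = z1 @ prototypes, z2 @ prototypes              # prototype scores
q1, q2 = sinkhorn(s1), sinkhorn(s2)                    # codes used as targets

temp = 0.1
# Swapped prediction: predict the code of each view from the other view.
loss = -(q1 * F.log_softmax(s2 / temp, dim=1)).sum(dim=1).mean() \
       -(q2 * F.log_softmax(s1 / temp, dim=1)).sum(dim=1).mean()
```
Because the targets are cluster codes rather than individual instances, no pairwise feature comparisons are needed, which is what lets the method scale to large and small batches alike.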
arXiv Detail & Related papers (2020-06-17T14:00:42Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap between fully and weakly supervised semantic segmentation.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
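A minimal sketch of such equivariant consistency on CAMs, assuming a toy one-layer CAM head and a rescaling transform (illustrative, not SEAM's exact mechanism):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

cam_net = nn.Conv2d(3, 21, kernel_size=1)   # toy CAM head: 21 class maps

x = torch.randn(4, 3, 64, 64)
x_small = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)

cam_full = cam_net(x)         # CAMs from the original image
cam_small = cam_net(x_small)  # CAMs from the rescaled image

# Apply the same rescaling to the original CAMs and enforce consistency:
# CAM(transform(x)) should match transform(CAM(x)).
cam_full_down = F.interpolate(cam_full, size=cam_small.shape[-2:],
                              mode='bilinear', align_corners=False)
equiv_loss = F.mse_loss(cam_small, cam_full_down)
```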
arXiv Detail & Related papers (2020-04-09T14:57:57Z)