Towards Self-Supervised Gaze Estimation
- URL: http://arxiv.org/abs/2203.10974v1
- Date: Mon, 21 Mar 2022 13:35:16 GMT
- Title: Towards Self-Supervised Gaze Estimation
- Authors: Arya Farkhondeh, Cristina Palmero, Simone Scardapane, Sergio Escalera
- Abstract summary: We propose SwAT, an equivariant version of the online clustering-based self-supervised approach SwAV, to learn more informative representations for gaze estimation.
We achieve up to 57% and 25% improvements in cross-dataset and within-dataset evaluation tasks on existing benchmarks.
- Score: 32.91601919228028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent joint embedding-based self-supervised methods have surpassed standard
supervised approaches on various image recognition tasks such as image
classification. These self-supervised methods aim at maximizing agreement
between features extracted from two differently transformed views of the same
image, which results in learning an invariant representation with respect to
appearance and geometric image transformations. However, the effectiveness of
these approaches remains unclear in the context of gaze estimation, a
structured regression task that requires equivariance under geometric
transformations (e.g., rotations, horizontal flip). In this work, we propose
SwAT, an equivariant version of the online clustering-based self-supervised
approach SwAV, to learn more informative representations for gaze estimation.
We identify the most effective image transformations for self-supervised
pretraining and demonstrate that SwAT, with ResNet-50 and supported with
uncurated unlabeled face images, outperforms state-of-the-art gaze estimation
methods and supervised baselines in various experiments. In particular, we
achieve up to 57% and 25% improvements in cross-dataset and within-dataset
evaluation tasks on existing benchmarks (ETH-XGaze, Gaze360, and MPIIFaceGaze).
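The distinction between invariance and equivariance matters here because geometric transformations change the gaze label itself: horizontally flipping a face image mirrors the gaze direction. A minimal illustrative sketch (not the paper's code; the pitch/yaw label representation and function name are assumptions for illustration):

```python
import numpy as np

def hflip_gaze(pitch_yaw):
    """Under a horizontal image flip, yaw is negated while pitch is
    unchanged, so the label (and a useful representation) must transform
    along with the image rather than stay fixed."""
    pitch, yaw = pitch_yaw
    return np.array([pitch, -yaw])

g = np.array([0.10, 0.35])            # (pitch, yaw) in radians
g_flipped = hflip_gaze(g)             # -> [0.10, -0.35]
assert np.allclose(hflip_gaze(g_flipped), g)  # flipping twice is the identity
```

An invariance objective would pull the features of the original and flipped views together even though their gaze labels differ, which is why an equivariant objective is needed for this structured regression task.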
Related papers
- Contrastive Representation Learning for Gaze Estimation [8.121462458089143]
We propose a contrastive representation learning framework for gaze estimation, named Gaze Contrastive Learning (GazeCLR).
Our results show that GazeCLR improves the performance of cross-domain gaze estimation and yields up to a 17.2% relative improvement.
The GazeCLR framework is competitive with state-of-the-art representation learning methods for few-shot evaluation.
arXiv Detail & Related papers (2022-10-24T17:01:18Z) - Augmentation Invariance and Adaptive Sampling in Semantic Segmentation
of Agricultural Aerial Images [16.101248613062292]
We investigate the problem of semantic segmentation for agricultural aerial imagery.
The existing methods used for this task are designed without considering two characteristics of the aerial data.
We propose a solution based on two ideas: (i) we use a set of suitable augmentations together with a consistency loss to guide the model to learn semantic representations that are invariant to the photometric and geometric shifts typical of the top-down perspective.
With an extensive set of experiments conducted on the Agriculture-Vision dataset, we demonstrate that our proposed strategies improve the performance of the current state-of-the-art method.
arXiv Detail & Related papers (2022-04-17T10:19:07Z) - Consistency Regularization for Deep Face Anti-Spoofing [69.70647782777051]
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems.
We conjecture that encouraging feature consistency of different views may be a promising way to boost FAS models.
We propose both Embedding-level and Prediction-level Consistency Regularization (EPCR) for FAS.
arXiv Detail & Related papers (2021-11-24T08:03:48Z) - Distribution Estimation to Automate Transformation Policies for
Self-Supervision [61.55875498848597]
In recent visual self-supervision works, an imitated classification objective, called pretext task, is established by assigning labels to transformed or augmented input images.
It is observed that image transformations already present in the dataset might be less effective in learning such self-supervised representations.
We propose a framework based on generative adversarial network to automatically find the transformations which are not present in the input dataset.
arXiv Detail & Related papers (2021-11-24T04:40:00Z) - Multi-view Contrastive Coding of Remote Sensing Images at Pixel-level [5.64497799927668]
A pixel-wise contrastive approach based on an unlabeled multi-view setting is proposed to overcome this limitation.
A pseudo-Siamese ResUnet is trained to learn a representation that aims to align features from the shifted positive pairs.
Results demonstrate both improvements in efficiency and accuracy over the state-of-the-art multi-view contrastive methods.
arXiv Detail & Related papers (2021-05-18T13:28:46Z) - A Hierarchical Transformation-Discriminating Generative Model for Few
Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z) - Transformer Guided Geometry Model for Flow-Based Unsupervised Visual
Odometry [38.20137500372927]
We propose a method consisting of two camera pose estimators that deal with the information from pairwise images.
For image sequences, a Transformer-like structure is adopted to build a geometry model over a local temporal window.
A Flow-to-Flow Pose Estimator (F2FPE) is proposed to exploit the relationship between pairwise images.
arXiv Detail & Related papers (2020-12-08T19:39:26Z) - Semantic Change Detection with Asymmetric Siamese Networks [71.28665116793138]
Given two aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries.
This problem is vital in many earth vision related tasks, such as precise urban planning and natural resource management.
We present an asymmetric siamese network (ASN) to locate and identify semantic changes through feature pairs obtained from modules of widely different structures.
arXiv Detail & Related papers (2020-10-12T13:26:30Z) - Transformation Consistency Regularization- A Semi-Supervised Paradigm
for Image-to-Image Translation [18.870983535180457]
We propose Transformation Consistency Regularization, which delves into a more challenging setting of image-to-image translation.
We evaluate the efficacy of our algorithm on three different applications: image colorization, denoising and super-resolution.
Our method is significantly data efficient, requiring only around 10-20% of labeled samples to achieve image reconstructions similar to its fully-supervised counterpart.
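Transformation consistency can be stated compactly: the model's output on a transformed input should match the transformed output, i.e. f(T(x)) ≈ T(f(x)). A toy numpy sketch under assumed names (the real method regularizes image-to-image networks with a richer transformation set; the model here is a stand-in chosen to commute with the flip):

```python
import numpy as np

def hflip(img):
    """Horizontal flip: reverse the columns of a 2-D image."""
    return img[:, ::-1]

def model(img):
    """Toy image-to-image model: identity plus a vertical shift.
    It commutes with horizontal flips, so consistency holds exactly."""
    return 0.5 * (img + np.roll(img, 1, axis=0))

x = np.arange(16.0).reshape(4, 4)

# transformation consistency residual: f(T(x)) - T(f(x))
residual = model(hflip(x)) - hflip(model(x))
loss = np.mean(residual ** 2)   # zero here; penalized when nonzero in training
```

In semi-supervised training, this penalty is applied to unlabeled images, which is what lets the method reach fully-supervised quality with a fraction of the labels.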
arXiv Detail & Related papers (2020-07-15T17:41:35Z) - Unsupervised Learning of Visual Features by Contrasting Cluster
Assignments [57.33699905852397]
We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data.
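In rough terms, SwAV assigns each view's features to a set of learned prototypes to obtain a "code", and trains each view to predict the other view's code (the swapped prediction). A simplified numpy sketch; the real method computes codes with the Sinkhorn-Knopp equipartition procedure and learns the prototypes, while here random features and a plain softmax stand in:

```python
import numpy as np

def softmax(x, t=1.0):
    e = np.exp(x / t - np.max(x / t, axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(2, 4, 8))   # two augmented views: batch of 4, dim 8
C = rng.normal(size=(8, 3))           # 3 prototype vectors (learned in practice)

# sharpened predictions (low temperature) and soft "codes" (softmax stand-in
# for Sinkhorn-Knopp assignments)
p1, p2 = softmax(z1 @ C, t=0.1), softmax(z2 @ C, t=0.1)
q1, q2 = softmax(z1 @ C), softmax(z2 @ C)

# swapped prediction: view 1 predicts view 2's code, and vice versa
loss = -np.mean(np.sum(q2 * np.log(p1), axis=1) +
                np.sum(q1 * np.log(p2), axis=1))
```

Because the loss compares each view only against prototype assignments, no pairwise feature comparisons across the batch are needed, which is what allows small-batch training.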
arXiv Detail & Related papers (2020-06-17T14:00:42Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.