Toward Super-Resolution for Appearance-Based Gaze Estimation
- URL: http://arxiv.org/abs/2303.10151v1
- Date: Fri, 17 Mar 2023 17:40:32 GMT
- Title: Toward Super-Resolution for Appearance-Based Gaze Estimation
- Authors: Galen O'Shea, Majid Komeili
- Abstract summary: Super-resolution has been shown to improve image quality from a visual perspective.
We propose a two-step framework based on the SwinIR super-resolution model.
Self-supervised learning aims to learn from unlabelled data to reduce the amount of required labeled data for downstream tasks.
- Score: 4.594159253008448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaze tracking is a valuable tool with a broad range of applications in
various fields, including medicine, psychology, virtual reality, marketing, and
safety. Therefore, it is essential to have gaze tracking software that is
cost-efficient and high-performing. Accurately predicting gaze remains a
difficult task, particularly in real-world situations where images are affected
by motion blur, video compression, and noise. Super-resolution has been shown
to improve image quality from a visual perspective. This work examines the
usefulness of super-resolution for improving appearance-based gaze tracking. We
show that not all SR models preserve the gaze direction. We propose a two-step
framework based on the SwinIR super-resolution model. The proposed method
consistently outperforms the state-of-the-art, particularly in scenarios
involving low-resolution or degraded images. Furthermore, we examine the use of
super-resolution through the lens of self-supervised learning for gaze
prediction. Self-supervised learning aims to learn from unlabelled data to
reduce the amount of required labeled data for downstream tasks. We propose a
novel architecture called SuperVision that fuses an SR backbone network with a
ResNet18 (with skip connections). The proposed SuperVision method uses 5x
less labeled data yet outperforms the state-of-the-art GazeTR method, which
uses 100% of the training data, by 15%.
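The fusion idea behind SuperVision can be illustrated with a minimal, hypothetical sketch. Everything below is an illustrative stand-in: the pixel-repeat "SR backbone", the additive skip connection, the pooling "gaze branch", and the head weights are assumptions for demonstration, not the paper's actual SwinIR/ResNet18 components.

```python
import numpy as np

def sr_features(img):
    # Stand-in for an SR backbone: 2x upsampling by pixel repetition.
    # (A real SwinIR backbone would produce learned high-resolution features.)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def gaze_branch(feat):
    # Stand-in for a ResNet18-style encoder: global average pooling
    # followed by a fixed linear head predicting (yaw, pitch).
    pooled = feat.mean()
    head = np.array([0.5, -0.25])   # hypothetical head weights
    return head * pooled

def supervision_forward(lr_img, skip_weight=0.5):
    # Fuse SR-backbone features with a skip connection from the upsampled
    # input, mirroring the idea of wiring an SR backbone into a ResNet18
    # with skip connections, then regress gaze from the fused features.
    sr = sr_features(lr_img)
    skip = lr_img.repeat(2, axis=0).repeat(2, axis=1)
    fused = sr + skip_weight * skip
    return gaze_branch(fused)

lr = np.ones((4, 4))                 # toy 4x4 grayscale "eye patch"
pred = supervision_forward(lr)       # toy (yaw, pitch)-style output
```

The point of the sketch is only the data flow: low-resolution input, SR feature extraction, skip fusion, gaze regression.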
Related papers
- Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR).
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z)
- A General Method to Incorporate Spatial Information into Loss Functions for GAN-based Super-resolution Models [25.69505971220203]
Generative Adversarial Networks (GANs) have shown great performance on super-resolution problems.
GANs often introduce side effects into the outputs, such as unexpected artifacts and noises.
We propose a general method that can be effectively used in most GAN-based super-resolution (SR) models by introducing essential spatial information into the training process.
arXiv Detail & Related papers (2024-03-15T17:29:16Z)
- Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
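The fixed low-resolution attention idea can be sketched as follows. This is a hypothetical NumPy illustration of single-head attention with identity projections; LRFormer's real implementation uses learned query/key/value projections and multi-head attention.

```python
import numpy as np

def avg_pool(x, out_hw):
    # Downsample a (H, W, C) feature map to a fixed (h, w) grid by block averaging.
    H, W, C = x.shape
    h, w = out_hw
    return x.reshape(h, H // h, w, W // w, C).mean(axis=(1, 3))

def low_res_self_attention(x, pool_hw=(4, 4)):
    # Compute attention against a fixed low-resolution set of keys/values,
    # regardless of the input resolution: cost is O(HW * hw) rather than
    # O((HW)^2) for full self-attention.
    H, W, C = x.shape
    q = x.reshape(H * W, C)                     # queries keep full resolution
    kv = avg_pool(x, pool_hw).reshape(-1, C)    # keys/values at fixed low res
    scores = q @ kv.T / np.sqrt(C)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over low-res tokens
    out = attn @ kv                             # (H*W, C) attended features
    return out.reshape(H, W, C)

x = np.random.default_rng(0).standard_normal((16, 16, 8))
y = low_res_self_attention(x)                   # same shape as the input
```

Note that the low-resolution grid size is fixed (here 4x4), so the attention cost grows only linearly with the input resolution.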
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce the out-of-distribution gap for RL-based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation.
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- L2CS-Net: Fine-Grained Gaze Estimation in Unconstrained Environments [2.5234156040689237]
We propose a robust CNN-based model for predicting gaze in unconstrained settings.
We use two identical losses, one for each angle, to improve network learning and increase its generalization.
Our proposed model achieves state-of-the-art accuracy of 3.92° and 10.41° on the MPIIGaze and Gaze360 datasets, respectively.
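The per-angle loss idea can be sketched minimally as below. Plain mean-squared error is used here as a stand-in; it is an assumption for illustration, since the abstract only states that one identical loss is applied to each gaze angle.

```python
import numpy as np

def angle_loss(pred, target):
    # One loss term per gaze angle; MSE is a hypothetical stand-in here.
    return np.mean((pred - target) ** 2)

def gaze_loss(pred_yaw, pred_pitch, yaw, pitch):
    # Two identical losses, one for each angle, summed into one objective,
    # so each angle contributes its own gradient signal during training.
    return angle_loss(pred_yaw, yaw) + angle_loss(pred_pitch, pitch)

# Toy batch of 3 predictions vs. ground truth (degrees).
loss = gaze_loss(np.array([10.0, 0.0, -5.0]), np.array([2.0, 1.0, 0.0]),
                 np.array([12.0, 1.0, -4.0]), np.array([2.0, 0.0, 1.0]))
```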
arXiv Detail & Related papers (2022-03-07T12:35:39Z)
- High Quality Segmentation for Ultra High-resolution Images [72.97958314291648]
We propose the Continuous Refinement Model for the ultra high-resolution segmentation refinement task.
Our proposed method is fast and effective on image segmentation refinement.
arXiv Detail & Related papers (2021-11-29T11:53:06Z)
- Exploiting Raw Images for Real-Scene Super-Resolution [105.18021110372133]
We study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
We propose a method to generate more realistic training data by mimicking the imaging process of digital cameras.
We also develop a two-branch convolutional neural network to exploit the radiance information originally recorded in raw images.
arXiv Detail & Related papers (2021-02-02T16:10:15Z)
- Self-Learning Transformations for Improving Gaze and Head Redirection [49.61091281780071]
We propose a novel generative model for images of faces, that is capable of producing high-quality images under fine-grained control over eye gaze and head orientation angles.
This requires disentangling many appearance-related factors, including not only gaze and head orientation but also lighting, hue, etc.
We show that explicitly disentangling task-irrelevant factors results in more accurate modelling of gaze and head orientation.
arXiv Detail & Related papers (2020-10-23T11:18:37Z)
- Unsupervised Foveal Vision Neural Networks with Top-Down Attention [0.3058685580689604]
We propose the fusion of bottom-up saliency and top-down attention employing only unsupervised learning techniques.
We test the performance of the proposed Gamma saliency technique on the Toronto and CAT2000 databases.
We also develop a top-down attention mechanism based on the Gamma saliency applied to the top layer of CNNs to improve scene understanding in multi-object images or images with strong background clutter.
arXiv Detail & Related papers (2020-10-18T20:55:49Z)
- Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy [21.89072742618842]
We provide a comprehensive analysis of the existing augmentation methods applied to the super-resolution task.
We propose CutBlur that cuts a low-resolution patch and pastes it to the corresponding high-resolution image region and vice versa.
Our method consistently and significantly improves the performance across various scenarios.
arXiv Detail & Related papers (2020-04-01T13:49:38Z)
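The CutBlur operation described above can be sketched in a few lines; the image sizes and patch coordinates here are illustrative, and the low-resolution image is assumed to have been upsampled to HR size beforehand, as the method requires.

```python
import numpy as np

def cutblur(hr, lr_up, y, x, h, w, to_hr=True):
    # Swap an aligned patch between the HR image and the upsampled LR image:
    # paste LR content into HR (to_hr=True) or HR content into LR (to_hr=False),
    # so the model sees mixed resolutions within a single training image.
    out = hr.copy() if to_hr else lr_up.copy()
    src = lr_up if to_hr else hr
    out[y:y + h, x:x + w] = src[y:y + h, x:x + w]
    return out

hr = np.ones((8, 8))        # toy high-resolution image
lr_up = np.zeros((8, 8))    # toy bicubic-upsampled low-resolution image
mixed = cutblur(hr, lr_up, y=2, x=2, h=4, w=4)  # LR patch pasted into HR
```

Because the two images are spatially aligned, the augmentation changes only local resolution, not content, which is what makes it safe for SR training.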
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.