Gaze Estimation using Transformer
- URL: http://arxiv.org/abs/2105.14424v1
- Date: Sun, 30 May 2021 04:06:29 GMT
- Title: Gaze Estimation using Transformer
- Authors: Yihua Cheng and Feng Lu
- Abstract summary: We consider two forms of vision transformer: pure transformers and hybrid transformers.
We first follow the popular ViT and employ a pure transformer to estimate gaze from images.
We also preserve the convolutional layers and integrate CNNs with transformers.
- Score: 14.26674946195107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has proven the effectiveness of transformers in many computer
vision tasks. However, the performance of transformers in gaze estimation is
still unexplored. In this paper, we employ transformers and assess their
effectiveness for gaze estimation. We consider two forms of vision transformer:
pure transformers and hybrid transformers. We first follow the popular ViT and
employ a pure transformer to estimate gaze from images. We also preserve the
convolutional layers and integrate CNNs with transformers, where the
transformer serves as a component that complements the CNN. We compare the
performance of the two transformers in gaze estimation. The hybrid transformer
significantly outperforms the pure transformer on all evaluation datasets with
fewer parameters. We further conduct experiments to assess the effectiveness of
the hybrid transformer and explore the advantage of the self-attention
mechanism. Experiments show the hybrid transformer can achieve
state-of-the-art performance on all benchmarks with pre-training. To facilitate
further research, we release code and models at
https://github.com/yihuacheng/GazeTR.
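The hybrid design described above lends itself to a short sketch. The following is a minimal, illustrative PyTorch model of the general idea, not the released GazeTR architecture: a small CNN stem extracts a feature map, the flattened features become transformer tokens, and a learnable gaze token is regressed to a 2D direction. All layer sizes, the stem, and the (pitch, yaw) head are assumptions made for exposition.

```python
import torch
import torch.nn as nn

class HybridGazeNet(nn.Module):
    """Toy CNN + transformer gaze regressor (illustrative, not GazeTR)."""

    def __init__(self, d_model=32, nhead=4, num_layers=4):
        super().__init__()
        # Convolutional stem: supplies the local inductive bias that a pure
        # transformer would otherwise have to learn from data.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Learnable token whose final state is decoded into a gaze vector.
        self.gaze_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        # Positional embeddings are omitted to keep the sketch short; a real
        # model would add them before the encoder.
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 2)  # (pitch, yaw) gaze angles

    def forward(self, x):
        feat = self.stem(x)                        # (B, C, H/4, W/4)
        tokens = feat.flatten(2).transpose(1, 2)   # (B, HW/16, C)
        token = self.gaze_token.expand(x.size(0), -1, -1)
        out = self.encoder(torch.cat([token, tokens], dim=1))
        return self.head(out[:, 0])                # read out the gaze token

# Toy forward pass on a batch of two 112x112 face crops -> torch.Size([2, 2])
print(HybridGazeNet()(torch.randn(2, 3, 112, 112)).shape)
```

The division of labor this sketch encodes is the reason the abstract gives for the hybrid's advantage: convolutions capture local structure cheaply, while self-attention relates distant image regions.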
Related papers
- On Convolutional Vision Transformers for Yield Prediction [0.0]
The Convolutional vision Transformer (CvT) is tested to evaluate whether vision Transformers, which currently achieve state-of-the-art results in many other vision tasks, transfer to yield prediction.
It performs worse than widely tested methods such as XGBoost and CNNs, but shows that Transformers have the potential to improve yield prediction.
arXiv Detail & Related papers (2024-02-08T10:50:12Z)
- Transformers For Recognition In Overhead Imagery: A Reality Check [0.0]
We compare the impact of adding transformer structures into state-of-the-art segmentation models for overhead imagery.
Our results suggest that transformers provide consistent, but modest, performance improvements.
arXiv Detail & Related papers (2022-10-23T02:17:31Z)
- Boosting vision transformers for image retrieval [11.441395750267052]
Vision transformers have achieved remarkable progress in vision tasks such as image classification and detection.
However, in instance-level image retrieval, transformers have not yet shown good performance compared to convolutional networks.
We propose a number of improvements that make transformers outperform the state of the art for the first time.
arXiv Detail & Related papers (2022-10-21T12:17:12Z)
- On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition [18.557920268145818]
Video vision transformers have been shown to be competitive with convolution-based methods (CNNs) broadly across multiple vision tasks.
Our work empirically explores the low data regime for video classification and discovers that, surprisingly, transformers perform extremely well in the low-labeled video setting.
We even show that, using only the labeled data, transformers significantly outperform complex semi-supervised CNN methods that additionally leverage large-scale unlabeled data.
arXiv Detail & Related papers (2022-09-15T17:12:30Z)
- HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling [126.89573619301953]
We propose a new design of hierarchical vision transformers named HiViT (short for Hierarchical ViT).
HiViT enjoys both high efficiency and good performance in MIM.
When running MAE on ImageNet-1K, HiViT-B reports a +0.6% accuracy gain over ViT-B and a 1.9$\times$ speed-up over Swin-B.
arXiv Detail & Related papers (2022-05-30T09:34:44Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation.
We propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales and have shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
- A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism; a minimal sketch of self-attention appears after this list.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
- Adaptive Transformers in RL [6.292138336765965]
Recent developments in Transformers have opened new areas of research in partially observable reinforcement learning tasks.
Results from late 2019 showed that Transformers are able to outperform LSTMs on both memory-intensive and reactive tasks.
arXiv Detail & Related papers (2020-04-08T01:03:10Z)
- Transformer on a Diet [81.09119185568296]
Transformer has been widely used thanks to its ability to capture sequence information in an efficient way.
Recent developments, such as BERT and GPT-2, deliver only heavy architectures with a focus on effectiveness.
We explore three carefully designed light Transformer architectures to determine whether a Transformer with less computation can produce competitive results.
arXiv Detail & Related papers (2020-02-14T18:41:58Z)
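As referenced from the survey entry above, here is a minimal sketch of the single-head scaled dot-product self-attention that these transformer papers build on. The function name and shapes are illustrative assumptions; practical implementations add multiple heads, masking, and dropout.

```python
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, seq, d); w_q/w_k/w_v: (d, d) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every token attends to every token; weights come from query-key
    # similarity, scaled by sqrt(d) to keep the softmax well-conditioned.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

d = 16
x = torch.randn(2, 10, d)  # two sequences of ten d-dimensional tokens
out = self_attention(x, *(torch.randn(d, d) for _ in range(3)))
print(out.shape)  # torch.Size([2, 10, 16])
```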
This list is automatically generated from the titles and abstracts of the papers on this site.