How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image
Domain? An Empirical Study Involving Art Classification
- URL: http://arxiv.org/abs/2208.04693v1
- Date: Tue, 9 Aug 2022 12:05:18 GMT
- Title: How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image
Domain? An Empirical Study Involving Art Classification
- Authors: Vincent Tonkes and Matthia Sabatelli
- Abstract summary: Vision Transformers (VTs) are becoming a valuable alternative to Convolutional Neural Networks (CNNs).
We study whether VTs that are pre-trained on the popular ImageNet dataset learn representations that are transferable to the non-natural image domain.
Our results show that VTs exhibit strong generalization properties and that these networks are more powerful feature extractors than CNNs.
- Score: 0.7614628596146599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision Transformers (VTs) are becoming a valuable alternative to
Convolutional Neural Networks (CNNs) when it comes to problems involving
high-dimensional and spatially organized inputs such as images. However, their
Transfer Learning (TL) properties are not yet well studied, and it is not fully
known whether these neural architectures can transfer across different domains
as well as CNNs. In this paper we study whether VTs that are pre-trained on the
popular ImageNet dataset learn representations that are transferable to the
non-natural image domain. To do so we consider three well-studied art
classification problems and use them as a surrogate for studying the TL
potential of four popular VTs. Their performance is extensively compared
against that of four common CNNs across several TL experiments. Our results
show that VTs exhibit strong generalization properties and that these networks
are more powerful feature extractors than CNNs.
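The transfer-learning setup the abstract describes can be illustrated with a short sketch: an ImageNet-pretrained backbone is frozen and used as a fixed feature extractor, and only a new classification head is trained on the downstream art-classification task. The code below is a minimal illustration of that idea, not the authors' implementation; the torchvision backbones (vit_b_16, resnet50), the number of classes, and the optimizer settings are assumptions standing in for the four VTs and four CNNs compared in the paper.

```python
# Minimal sketch of ImageNet-pretrained feature-extraction transfer learning.
# Backbones, class count, and optimizer are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical number of art-classification classes


def build_frozen_extractor(name: str, num_classes: int) -> nn.Module:
    """Load an ImageNet-pretrained backbone, freeze it, and attach a new head."""
    if name == "vit_b_16":
        model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
        in_features = model.heads.head.in_features
        model.heads.head = nn.Linear(in_features, num_classes)
        new_head = model.heads.head
    elif name == "resnet50":
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        in_features = model.fc.in_features
        model.fc = nn.Linear(in_features, num_classes)
        new_head = model.fc
    else:
        raise ValueError(f"unknown backbone: {name}")

    # Freeze everything except the freshly initialized classification head,
    # so the pretrained network acts purely as a feature extractor.
    for p in model.parameters():
        p.requires_grad = False
    for p in new_head.parameters():
        p.requires_grad = True
    return model


vit = build_frozen_extractor("vit_b_16", num_classes)
cnn = build_frozen_extractor("resnet50", num_classes)

# Only the head parameters are updated during training.
optimizer = torch.optim.Adam(
    [p for p in vit.parameters() if p.requires_grad], lr=1e-3
)
```

In a fine-tuning variant of the same comparison, the freezing step would simply be dropped so that all backbone parameters are updated as well.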
Related papers
- Super Consistency of Neural Network Landscapes and Learning Rate Transfer [72.54450821671624]
We study the landscape through the lens of the loss Hessian.
We find that certain spectral properties under $\mu$P are largely independent of the size of the network.
We show that in the Neural Tangent Kernel (NTK) and other scaling regimes, the sharpness exhibits very different dynamics at different scales.
arXiv Detail & Related papers (2024-02-27T12:28:01Z) - ViTs are Everywhere: A Comprehensive Study Showcasing Vision
Transformers in Different Domain [0.0]
Vision Transformers (ViTs) are becoming more popular and dominant solutions for many vision problems.
ViTs can overcome several possible difficulties with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2023-10-09T12:31:30Z) - Training Vision Transformers with Only 2040 Images [35.86457465241119]
Vision Transformers (ViTs) are emerging as an alternative to convolutional neural networks (CNNs) for visual recognition.
We give theoretical analyses showing that our method is superior to other methods in that it can capture both feature alignment and instance similarities.
We achieve state-of-the-art results when training from scratch on 7 small datasets under various ViT backbones.
arXiv Detail & Related papers (2022-01-26T03:22:08Z) - A Comprehensive Study of Vision Transformers on Dense Prediction Tasks [10.013443811899466]
Convolutional Neural Networks (CNNs) have been the standard choice in vision tasks.
Recent studies have shown that Vision Transformers (VTs) achieve comparable performance in challenging tasks such as object detection and semantic segmentation.
This poses several questions about their generalizability, robustness, reliability, and texture bias when used to extract features for complex tasks.
arXiv Detail & Related papers (2022-01-21T13:18:16Z) - Do Vision Transformers See Like Convolutional Neural Networks? [45.69780772718875]
Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks.
Are they acting like convolutional networks, or learning entirely different visual representations?
We find striking differences between the two architectures, such as ViT having more uniform representations across all layers.
arXiv Detail & Related papers (2021-08-19T17:27:03Z) - TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation [54.61786380919243]
Unsupervised domain adaptation (UDA) aims to transfer the knowledge learnt from a labeled source domain to an unlabeled target domain.
Previous work is mainly built upon convolutional neural networks (CNNs) to learn domain-invariant representations.
With the recent exponential increase in applying Vision Transformer (ViT) to vision tasks, the capability of ViT in adapting cross-domain knowledge remains unexplored in the literature.
arXiv Detail & Related papers (2021-08-12T22:37:43Z) - ConvNets vs. Transformers: Whose Visual Representations are More
Transferable? [49.62201738334348]
We investigate the transfer learning ability of ConvNets and vision transformers in 15 single-task and multi-task performance evaluations.
We observe consistent advantages of Transformer-based backbones on 13 downstream tasks.
arXiv Detail & Related papers (2021-08-11T16:20:38Z) - Efficient Training of Visual Transformers with Small-Size Datasets [64.60765211331697]
Visual Transformers (VTs) are emerging as an architectural paradigm alternative to Convolutional networks (CNNs).
We show that, despite having a comparable accuracy when trained on ImageNet, their performance on smaller datasets can be largely different.
We propose a self-supervised task which can extract additional information from images with only a negligible computational overhead.
arXiv Detail & Related papers (2021-06-07T16:14:06Z) - Intriguing Properties of Vision Transformers [114.28522466830374]
Vision transformers (ViT) have demonstrated impressive performance across various machine vision problems.
We systematically study this question via an extensive set of experiments and comparisons with a high-performing convolutional neural network (CNN).
We show that the effective features of ViTs are due to flexible and dynamic receptive fields made possible by the self-attention mechanism (a minimal sketch of this mechanism follows the list below).
arXiv Detail & Related papers (2021-05-21T17:59:18Z)
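The "flexible and dynamic receptive fields" mentioned in the last entry can be made concrete with a minimal single-head self-attention sketch. This is a generic illustration, not code from any of the papers above: every patch token computes input-dependent weights over all other tokens, so its effective receptive field is global and changes per image.

```python
# Minimal single-head self-attention over patch tokens (illustrative only).
import torch

def self_attention(tokens: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """tokens: (batch, num_patches, dim); w_*: (dim, dim) projection matrices."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    attn = scores.softmax(dim=-1)      # each patch weights *all* patches, per input
    return attn @ v                    # output mixes information globally

dim, n = 64, 196                       # hypothetical embedding size, 14x14 patches
x = torch.randn(1, n, dim)
w_q, w_k, w_v = (torch.randn(dim, dim) * dim ** -0.5 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (1, 196, 64)
```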