Fine-tuning Vision Transformers for the Prediction of State Variables in
Ising Models
- URL: http://arxiv.org/abs/2109.13925v1
- Date: Tue, 28 Sep 2021 00:23:31 GMT
- Title: Fine-tuning Vision Transformers for the Prediction of State Variables in
Ising Models
- Authors: Onur Kara and Arijit Sehanobish and Hector H Corzo
- Abstract summary: Transformers are state-of-the-art deep learning models that are composed of stacked attention and point-wise, fully connected layers.
In this work, a Vision Transformer (ViT) is applied to predict the state variables of 2-dimensional Ising model simulations.
- Score: 2.9005223064604078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are state-of-the-art deep learning models that are composed of
stacked attention and point-wise, fully connected layers designed for handling
sequential data. Transformers are not only ubiquitous throughout Natural
Language Processing (NLP), but, recently, they have inspired a new wave of
Computer Vision (CV) applications research. In this work, a Vision Transformer
(ViT) is applied to predict the state variables of 2-dimensional Ising model
simulations. Our experiments show that the ViT outperforms state-of-the-art
Convolutional Neural Networks (CNNs) when using a small number of microstate
images from the Ising model corresponding to various boundary conditions and
temperatures. This work opens the possibility of applying ViT to other
simulations, and raises interesting research directions on how attention maps
can learn about the underlying physics governing different phenomena.
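To make the fine-tuning setup concrete, here is a minimal sketch (not the authors' code) of adapting a pretrained ViT to regress a state variable such as temperature from Ising microstate images. It assumes a torchvision ViT-B/16 backbone, a single-output regression head, and a synthetic placeholder batch of spin lattices; the dataset, target choice, and hyperparameters are illustrative assumptions only.

```python
# Sketch: fine-tune a pretrained ViT to regress the temperature of 2D Ising
# microstates. The microstates and temperatures below are random placeholders.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights


def microstates_to_images(spins: torch.Tensor) -> torch.Tensor:
    """Map a batch of L x L spin lattices (+1/-1) to 3-channel 224x224 ViT inputs."""
    x = (spins.float() + 1.0) / 2.0                      # {-1,+1} -> {0,1}
    x = x.unsqueeze(1).repeat(1, 3, 1, 1)                # grayscale -> RGB
    return nn.functional.interpolate(x, size=224, mode="nearest")


# Pretrained ImageNet backbone (downloads weights on first use); replace the
# classification head with a single regression output for the state variable.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, 1)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Placeholder batch: 8 random 64x64 "microstates" with fake temperatures.
spins = torch.randint(0, 2, (8, 64, 64)) * 2 - 1
temps = torch.rand(8, 1) * 4.0

model.train()
optimizer.zero_grad()
pred = model(microstates_to_images(spins))
loss = loss_fn(pred, temps)
loss.backward()
optimizer.step()
print(f"toy training step, MSE loss = {loss.item():.4f}")
```

Whether to fine-tune the full backbone or freeze it and train only the new head is a design choice the sketch leaves open; the paper's point that few microstate images suffice suggests the pretrained features carry most of the load.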
Related papers
- ViTs are Everywhere: A Comprehensive Study Showcasing Vision
Transformers in Different Domain [0.0]
Vision Transformers (ViTs) are becoming increasingly popular and dominant solutions for many vision problems.
ViTs can overcome several possible difficulties with convolutional neural networks (CNNs).
arXiv Detail & Related papers (2023-10-09T12:31:30Z)
- PriViT: Vision Transformers for Fast Private Inference [55.36478271911595]
Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications.
ViTs are ill-suited for private inference using secure multi-party protocols, due to the large number of non-polynomial operations.
We propose PriViT, an algorithm to selectively "Taylorize" nonlinearities in ViTs while maintaining their prediction accuracy; a minimal polynomial-GELU sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-10-06T21:45:05Z)
- Surface Analysis with Vision Transformers [7.4330073456005685]
Recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs.
Motivated by the success of attention-modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem and propose a patching mechanism for surface meshes.
arXiv Detail & Related papers (2022-05-31T14:41:01Z)
- GTrans: Spatiotemporal Autoregressive Transformer with Graph Embeddings for Nowcasting Extreme Events [5.672898304129217]
This paper proposes a spatiotemporal model, namely GTrans, that transforms data features into graph embeddings and predicts temporal dynamics with a transformer model.
Our experiments demonstrate that GTrans can model spatial and temporal dynamics and nowcast extreme events for the evaluated datasets.
arXiv Detail & Related papers (2022-01-18T03:26:24Z)
- Can Vision Transformers Perform Convolution? [78.42076260340869]
We prove that a single ViT layer with image patches as the input can perform any convolution operation constructively.
We provide a lower bound on the number of heads for Vision Transformers to express CNNs.
arXiv Detail & Related papers (2021-11-02T03:30:17Z)
- ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Vision Transformers for Dense Prediction [77.34726150561087]
We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
Our experiments show that this architecture yields substantial improvements on dense prediction tasks.
arXiv Detail & Related papers (2021-03-24T18:01:17Z)
- Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z)
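The PriViT entry above describes replacing ViT nonlinearities with Taylor polynomial approximations so that private inference avoids expensive non-polynomial operations. The sketch below illustrates only the general idea, assuming a degree-2 Taylor expansion of GELU about zero and a torchvision ViT-B/16; PriViT's actual accuracy-aware selection of which nonlinearities to Taylorize (and its handling of softmax) is not reproduced here.

```python
# Illustrative only: swap every nn.GELU in a ViT for a degree-2 Taylor
# polynomial, GELU(x) = x * Phi(x) ~ 0.5*x + x**2 / sqrt(2*pi) near x = 0.
# PriViT's selective, accuracy-preserving replacement strategy is NOT implemented.
import math

import torch
import torch.nn as nn
from torchvision.models import vit_b_16


class TaylorGELU(nn.Module):
    """Degree-2 Taylor expansion of GELU about x = 0 (polynomial, MPC-friendly)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 0.5 * x + x.pow(2) / math.sqrt(2.0 * math.pi)


def taylorize_gelu(module: nn.Module) -> int:
    """Recursively replace nn.GELU children with TaylorGELU; return the count."""
    replaced = 0
    for name, child in module.named_children():
        if isinstance(child, nn.GELU):
            setattr(module, name, TaylorGELU())
            replaced += 1
        else:
            replaced += taylorize_gelu(child)
    return replaced


model = vit_b_16(weights=None)  # random weights; enough to show the mechanics
print(f"replaced {taylorize_gelu(model)} GELU modules")
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))  # forward pass still runs
print(out.shape)  # torch.Size([1, 1000])
```

In practice the model would be fine-tuned after such a swap to recover accuracy; deciding which nonlinearities to Taylorize while keeping prediction accuracy is the part PriViT automates.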