Tensor-to-Image: Image-to-Image Translation with Vision Transformers
- URL: http://arxiv.org/abs/2110.08037v1
- Date: Wed, 6 Oct 2021 17:57:45 GMT
- Title: Tensor-to-Image: Image-to-Image Translation with Vision Transformers
- Authors: Yiğit Gündüç
- Abstract summary: In this paper, we utilize a vision transformer-based custom-designed model, tensor-to-image, for image-to-image translation.
With the help of self-attention, our model was able to generalize to different problems without a single modification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have gained huge attention since they were first introduced and have a wide range of applications. Transformers are starting to take over all areas of deep learning, and the Vision Transformer paper also showed that they can be used for computer vision tasks. In this paper, we utilize a vision transformer-based custom-designed model, tensor-to-image, for image-to-image translation. With the help of self-attention, our model was able to generalize to different problems without a single modification.
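As a rough illustration of the general idea described in the abstract (a ViT-style encoder operating on patch tokens and reconstructing an output image), the sketch below shows a minimal image-to-image transformer. The patch size, layer sizes, and module names are illustrative assumptions and do not reproduce the authors' actual tensor-to-image architecture.

```python
# Minimal sketch of a ViT-style image-to-image model (illustrative only;
# hyperparameters and names are assumptions, not the paper's design).
import torch
import torch.nn as nn

class TinyViT2Image(nn.Module):
    def __init__(self, img_size=256, patch=16, dim=256, depth=6, heads=8,
                 in_ch=3, out_ch=3):
        super().__init__()
        self.patch, self.out_ch = patch, out_ch
        self.grid = img_size // patch
        n_patches = self.grid ** 2
        # Patch embedding: split the image into non-overlapping patches and
        # project each patch to a token of dimension `dim`.
        self.to_tokens = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Project each token back to a patch of output pixels.
        self.to_pixels = nn.Linear(dim, out_ch * patch * patch)

    def forward(self, x):
        b = x.size(0)
        tokens = self.to_tokens(x)                  # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)    # global self-attention
        patches = self.to_pixels(tokens)            # (B, N, out_ch*p*p)
        # Reassemble the per-token patches into an output image.
        patches = patches.view(b, self.grid, self.grid,
                               self.out_ch, self.patch, self.patch)
        img = patches.permute(0, 3, 1, 4, 2, 5).reshape(
            b, self.out_ch, self.grid * self.patch, self.grid * self.patch)
        return torch.sigmoid(img)

# Usage: translate a batch of 256x256 RGB inputs into RGB outputs.
model = TinyViT2Image()
out = model(torch.randn(2, 3, 256, 256))  # -> (2, 3, 256, 256)
```

Because every token attends to every other token, the same model can be trained on different translation problems (e.g. segmentation-to-photo or colorization) without architectural changes, which is the property the abstract attributes to self-attention.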
Related papers
- Adventures of Trustworthy Vision-Language Models: A Survey [54.76511683427566]
This paper conducts a thorough examination of vision-language transformers, employing three fundamental principles of responsible AI: Bias, Robustness, and Interpretability.
The primary objective is to examine the intricacies and complexities of using transformers in practice, with the overarching goal of advancing our understanding of how to enhance their reliability and accountability.
arXiv Detail & Related papers (2023-12-07T11:31:20Z)
- Introduction to Transformers: an NLP Perspective [59.0241868728732]
We introduce basic concepts of Transformers and present key techniques that form the recent advances of these models.
This includes a description of the standard Transformer architecture, a series of model refinements, and common applications.
arXiv Detail & Related papers (2023-11-29T13:51:04Z)
- Inspecting Explainability of Transformer Models with Additional Statistical Information [27.04589064942369]
Chefer et al. effectively visualize Transformers on vision and multi-modal tasks by combining attention layers to show the importance of each image patch.
However, when applied to other Transformer variants such as the Swin Transformer, this method cannot focus on the predicted object.
Our method, which considers the statistics of tokens in layer-normalization layers, shows a great ability to explain the Swin Transformer and ViT.
arXiv Detail & Related papers (2023-11-19T17:22:50Z)
- Holistically Explainable Vision Transformers [136.27303006772294]
We propose B-cos transformers, which inherently provide holistic explanations for their decisions.
Specifically, we formulate each model component - such as the multi-layer perceptrons, attention layers, and the tokenisation module - to be dynamic linear.
We apply our proposed design to Vision Transformers (ViTs) and show that the resulting models, dubbed Bcos-ViTs, are highly interpretable and perform competitively with baseline ViTs.
arXiv Detail & Related papers (2023-01-20T16:45:34Z)
- Learning Explicit Object-Centric Representations with Vision Transformers [81.38804205212425]
We build on the self-supervision task of masked autoencoding and explore its effectiveness for learning object-centric representations with transformers.
We show that the model efficiently learns to decompose simple scenes as measured by segmentation metrics on several multi-object benchmarks.
arXiv Detail & Related papers (2022-10-25T16:39:49Z)
- Vision Transformers: State of the Art and Research Challenges [26.462994554165697]
This paper presents a comprehensive overview of the literature on different architecture designs and training tricks for vision transformers.
Our goal is to provide a systematic review along with open research opportunities.
arXiv Detail & Related papers (2022-07-07T02:01:56Z)
- Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyperparameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z)
- A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
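The survey above describes the Transformer as a network built mainly on the self-attention mechanism, which also underlies the tensor-to-image model. For reference, the following is a minimal sketch of single-head scaled dot-product self-attention; the shapes and weight names are generic assumptions, not tied to any specific paper.

```python
# Minimal sketch of scaled dot-product self-attention (single head, no masking).
import torch

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, tokens, dim); w_q, w_k, w_v: (dim, dim) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # token-to-token similarities
    weights = scores.softmax(dim=-1)                        # attention distribution per token
    return weights @ v                                      # weighted mix of value vectors

x = torch.randn(2, 16, 64)                  # e.g. 16 image-patch tokens of dimension 64
w = [torch.randn(64, 64) for _ in range(3)]
out = self_attention(x, *w)                 # -> (2, 16, 64)
```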
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.