Machine Learning for Brain Disorders: Transformers and Visual
Transformers
- URL: http://arxiv.org/abs/2303.12068v1
- Date: Tue, 21 Mar 2023 17:57:33 GMT
- Title: Machine Learning for Brain Disorders: Transformers and Visual
Transformers
- Authors: Robin Courant, Maika Edberg, Nicolas Dufour and Vicky Kalogeiton
- Abstract summary: Transformers were initially introduced for natural language processing (NLP) tasks, but fast they were adopted by most deep learning fields, including computer vision.
We introduce the Attention mechanism (Section 1), and then the Basic Transformer Block including the Vision Transformer.
Finally, we introduce Visual Transformers applied to tasks other than image classification, such as detection, segmentation, generation and training without labels.
- Score: 4.186575888568896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers were initially introduced for natural language processing (NLP)
tasks, but fast they were adopted by most deep learning fields, including
computer vision. They measure the relationships between pairs of input tokens
(words in the case of text strings, parts of images for visual Transformers),
termed attention. The cost is exponential with the number of tokens. For image
classification, the most common Transformer Architecture uses only the
Transformer Encoder in order to transform the various input tokens. However,
there are also numerous other applications in which the decoder part of the
traditional Transformer Architecture is also used. Here, we first introduce the
Attention mechanism (Section 1), and then the Basic Transformer Block including
the Vision Transformer (Section 2). Next, we discuss some improvements of
visual Transformers to account for small datasets or less computation(Section
3). Finally, we introduce Visual Transformers applied to tasks other than image
classification, such as detection, segmentation, generation and training
without labels (Section 4) and other domains, such as video or multimodality
using text or audio data (Section 5).
Related papers
- An Introduction to Transformers [23.915718146956355]
transformer is a neural network component that can be used to learn useful sequences or sets of data-points.
In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
arXiv Detail & Related papers (2023-04-20T14:54:19Z) - Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud
Understanding [62.502694656615496]
We present Progressive Point Patch Embedding and present a new point cloud Transformer model namely PViT.
PViT shares the same backbone as Transformer but is shown to be less hungry for data, enabling Transformer to achieve performance comparable to the state-of-the-art.
We formulate a simple yet effective pipeline dubbed "Pix4Point" that allows harnessing Transformers pretrained in the image domain to enhance downstream point cloud understanding.
arXiv Detail & Related papers (2022-08-25T17:59:29Z) - SepTr: Separable Transformer for Audio Spectrogram Processing [74.41172054754928]
We propose a new vision transformer architecture called Separable Transformer (SepTr)
SepTr employs two transformer blocks in a sequential manner, the first attending to tokens within the same frequency bin, and the second attending to tokens within the same time interval.
We conduct experiments on three benchmark data sets, showing that our architecture outperforms conventional vision transformers and other state-of-the-art methods.
arXiv Detail & Related papers (2022-03-17T19:48:43Z) - Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy
for Image Recognition without Convolutions [1.1032962642000486]
This work is based on Vision Transformer, combined with the pyramid architecture, using Split-merge-transform to propose the group encoder and name the network architecture Aggregated Pyramid Vision Transformer (APVT)
We perform image classification tasks on the CIFAR-10 dataset and object detection tasks on the COCO 2017 dataset.
arXiv Detail & Related papers (2022-03-02T09:14:28Z) - Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z) - Fully Transformer Networks for Semantic ImageSegmentation [26.037770622551882]
We explore a novel framework for semantic image segmentation, which is encoder-decoder based Fully Transformer Networks (FTN)
We propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features, while reducing the computation complexity of the standard visual transformer(ViT)
Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation.
arXiv Detail & Related papers (2021-06-08T05:15:28Z) - Transformer-Based Deep Image Matching for Generalizable Person
Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
arXiv Detail & Related papers (2021-05-30T05:38:33Z) - Transformers in Vision: A Survey [101.07348618962111]
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequence.
Transformers require minimal inductive biases for their design and are naturally suited as set-functions.
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
arXiv Detail & Related papers (2021-01-04T18:57:24Z) - A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.