Surface Vision Transformers: Attention-Based Modelling applied to
Cortical Analysis
- URL: http://arxiv.org/abs/2203.16414v1
- Date: Wed, 30 Mar 2022 15:56:11 GMT
- Title: Surface Vision Transformers: Attention-Based Modelling applied to
Cortical Analysis
- Authors: Simon Dahan, Abdulah Fawaz, Logan Z. J. Williams, Chunhui Yang,
Timothy S. Coalson, Matthew F. Glasser, A. David Edwards, Daniel Rueckert,
Emma C. Robinson
- Abstract summary: We introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold.
A vision transformer model encodes the sequence of patches via successive multi-head self-attention layers.
Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data.
- Score: 8.20832544370228
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The extension of convolutional neural networks (CNNs) to non-Euclidean
geometries has led to multiple frameworks for studying manifolds. Many of those
methods have shown design limitations resulting in poor modelling of long-range
associations, as the generalisation of convolutions to irregular surfaces is
non-trivial. Motivated by the success of attention-modelling in computer
vision, we translate convolution-free vision transformer approaches to surface
data, to introduce a domain-agnostic architecture to study any surface data
projected onto a spherical manifold. Here, surface patching is achieved by
representing spherical data as a sequence of triangular patches, extracted from
a subdivided icosphere. A transformer model encodes the sequence of patches via
successive multi-head self-attention layers while preserving the sequence
resolution. We validate the performance of the proposed Surface Vision
Transformer (SiT) on the task of phenotype regression from cortical surface
metrics derived from the Developing Human Connectome Project (dHCP).
Experiments show that the SiT generally outperforms surface CNNs, while
performing comparably on registered and unregistered data. Analysis of
transformer attention maps offers strong potential to characterise subtle
cognitive developmental patterns.
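The patching and encoding pipeline described in the abstract can be summarised in a short sketch. The sketch below is illustrative only: the module names, the patch configuration (320 triangular patches of 153 vertices with 4 surface metrics, corresponding to an ico-2 patching of an ico-6 sphere), and all hyper-parameters are assumptions chosen for the example, and a standard PyTorch transformer encoder stands in for the authors' implementation.

```python
# Minimal sketch of the Surface Vision Transformer (SiT) idea: flatten
# triangular icosphere patches into tokens, encode them with multi-head
# self-attention while preserving the sequence resolution, and regress a
# scalar phenotype. Illustrative assumption, not the reference code.
import torch
import torch.nn as nn


class SurfacePatchEmbedding(nn.Module):
    """Flatten triangular surface patches and project them to token embeddings.

    Assumes the spherical cortical data has already been resampled onto a
    subdivided icosphere and grouped into triangular patches, each holding
    `vertices_per_patch` vertices with `num_channels` surface metrics
    (e.g. myelin, curvature, cortical thickness, sulcal depth).
    """

    def __init__(self, num_channels=4, vertices_per_patch=153, embed_dim=192):
        super().__init__()
        self.proj = nn.Linear(num_channels * vertices_per_patch, embed_dim)

    def forward(self, patches):
        # patches: (batch, num_patches, vertices_per_patch, num_channels)
        b, n, v, c = patches.shape
        return self.proj(patches.reshape(b, n, v * c))  # (batch, num_patches, embed_dim)


class SiT(nn.Module):
    """Transformer encoder over the sequence of surface patches.

    Successive multi-head self-attention layers keep the sequence length
    (the patch resolution) fixed; a small head pools the tokens to predict
    a scalar phenotype such as postmenstrual age at scan.
    """

    def __init__(self, num_patches=320, num_channels=4, vertices_per_patch=153,
                 embed_dim=192, depth=12, num_heads=3):
        super().__init__()
        self.embed = SurfacePatchEmbedding(num_channels, vertices_per_patch, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, 1)  # scalar phenotype regression

    def forward(self, patches):
        tokens = self.embed(patches) + self.pos_embed
        tokens = self.encoder(tokens)          # sequence resolution preserved
        return self.head(tokens.mean(dim=1))   # global average pooling


if __name__ == "__main__":
    # Dummy batch: 2 subjects, 320 triangular patches, 153 vertices, 4 metrics.
    x = torch.randn(2, 320, 153, 4)
    print(SiT()(x).shape)  # torch.Size([2, 1])
```

Under these assumptions the model is fully translation-agnostic on the sphere's surface: any spatial structure beyond the patch ordering is learned through the positional embeddings and self-attention rather than hard-coded convolutions.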
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from transformers well-trained on massive image collections.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance for unsupervised domain adaptation (UDA) in point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Learning Modulated Transformation in GANs [69.95217723100413]
We equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed the modulated transformation module (MTM).
MTM predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations.
It is noteworthy that towards human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
arXiv Detail & Related papers (2023-08-29T17:51:22Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that accurately computing geodesics on the learned manifold can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning.
Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z)
- Surface Analysis with Vision Transformers [7.4330073456005685]
Recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs.
Motivated by the success of attention-modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem and propose a patching mechanism for surface meshes.
arXiv Detail & Related papers (2022-05-31T14:41:01Z)
- Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces [9.425082767553935]
Recent state-of-the-art performance of Vision Transformers (ViT) on computer vision tasks demonstrates that ViT could replace the local feature learning operations of convolutional neural networks.
We extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem.
We validate our method on a range of different biomedical surface domains and tasks.
arXiv Detail & Related papers (2022-04-07T12:45:54Z)
- Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that the resulting TinvNN strictly guarantees transformation invariance and is general and flexible enough to be combined with existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between given meshes using the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer with an efficient ability to perceive global geometric inconsistencies in 3D-structured data.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of the Vision Transformer by analogy with the proven, practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model and design task-related heads to handle different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)