Related papers: Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis

URL: http://arxiv.org/abs/2203.16414v1
Date: Wed, 30 Mar 2022 15:56:11 GMT
Title: Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis
Authors: Simon Dahan, Abdulah Fawaz, Logan Z. J. Williams, Chunhui Yang, Timothy S. Coalson, Matthew F. Glasser, A. David Edwards, Daniel Rueckert, Emma C. Robinson
Abstract summary: We introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold. A vision transformer model encodes the sequence of patches via successive multi-head self-attention layers. Experiments show that the Surface Vision Transformer (SiT) generally outperforms surface CNNs, while performing comparably on registered and unregistered data.
Score: 8.20832544370228
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The extension of convolutional neural networks (CNNs) to non-Euclidean geometries has led to multiple frameworks for studying manifolds. Many of those methods have shown design limitations resulting in poor modelling of long-range associations, as the generalisation of convolutions to irregular surfaces is non-trivial. Motivated by the success of attention-modelling in computer vision, we translate convolution-free vision transformer approaches to surface data, to introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold. Here, surface patching is achieved by representing spherical data as a sequence of triangular patches, extracted from a subdivided icosphere. A transformer model encodes the sequence of patches via successive multi-head self-attention layers while preserving the sequence resolution. We validate the performance of the proposed Surface Vision Transformer (SiT) on the task of phenotype regression from cortical surface metrics derived from the Developing Human Connectome Project (dHCP). Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data. Analysis of transformer attention maps offers strong potential to characterise subtle cognitive developmental patterns.
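
Illustrative sketch (not the authors' implementation): the abstract describes two steps, splitting spherical data into a sequence of triangular patches taken from a subdivided icosphere, and encoding that sequence with multi-head self-attention while keeping the sequence length fixed. The NumPy code below walks through both steps under stated assumptions. The subdivision levels, channel count, embedding size, and all function names are hypothetical choices made for this example. The patch index table is filled with random vertex indices as a stand-in, since a real one would come from the icosphere mesh itself, and all weights are random rather than trained.

```python
import numpy as np

def icosphere_counts(level):
    """Vertex and face counts of an icosahedron subdivided `level` times."""
    return 10 * 4**level + 2, 20 * 4**level

def vertices_per_patch(data_level, patch_level):
    """Vertices covered by one patch-level triangle at data-level resolution."""
    edge = 2 ** (data_level - patch_level) + 1  # vertices along one triangle edge
    return edge * (edge + 1) // 2

def patchify(features, patch_indices):
    """(V, C) per-vertex features -> (N, P*C) sequence of flattened patches."""
    return features[patch_indices].reshape(patch_indices.shape[0], -1)

def multi_head_self_attention(x, wq, wk, wv, wo, n_heads):
    """Scaled dot-product self-attention over an (N, D) token sequence."""
    n, d = x.shape
    dh = d // n_heads

    def split(t):  # (N, D) -> (H, N, D/H)
        return t.reshape(n, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)   # (H, N, N)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)          # softmax over keys
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d) # merge heads
    return out @ wo, attn

# Hypothetical settings chosen only to keep the example small.
rng = np.random.default_rng(0)
data_level, patch_level, channels, dim, heads = 4, 1, 4, 32, 4

n_vertices, _ = icosphere_counts(data_level)     # vertices carrying the data
_, n_patches = icosphere_counts(patch_level)     # one patch per coarse face
patch_size = vertices_per_patch(data_level, patch_level)

features = rng.normal(size=(n_vertices, channels))

# ASSUMPTION: stand-in index table. A real table lists, for each coarse
# triangle, the fine-resolution vertices that lie inside it.
patch_indices = rng.integers(0, n_vertices, size=(n_patches, patch_size))

# Linear patch embedding followed by one self-attention layer (random weights).
w_embed = rng.normal(size=(patch_size * channels, dim)) / np.sqrt(patch_size * channels)
wq, wk, wv, wo = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(4))

tokens = patchify(features, patch_indices) @ w_embed
encoded, attn = multi_head_self_attention(tokens, wq, wk, wv, wo, heads)

print(tokens.shape, encoded.shape, attn.shape)
```

The encoded sequence has the same shape as the input token sequence, which is what "preserving the sequence resolution" refers to here. The per-head attention array has one row per query patch and one column per key patch, the kind of quantity that attention-map analysis inspects.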

Related papers

Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation [6.550827841703163]
We introduce the attention-free Vision Mamba to spherical surfaces. Our method achieves surface patching by representing spherical data as a sequence of triangular patches. The proposed Surface Vision Mamba is evaluated on multiple neurodevelopmental phenotype regression tasks.
arXiv Detail & Related papers (2025-01-24T17:57:06Z)
Geometry Distributions [51.4061133324376]
We propose a novel geometric data representation that models geometry as distributions. Our approach uses diffusion models with a novel network architecture to learn surface point distributions. We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity.
arXiv Detail & Related papers (2024-11-25T04:06:48Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Relational Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables. The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning. We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z)
Surface Analysis with Vision Transformers [7.4330073456005685]
Recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs. Motivated by the success of attention-modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem and propose a patching mechanism for surface meshes.
arXiv Detail & Related papers (2022-05-31T14:41:01Z)
Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces [9.425082767553935]
Recent state-of-the-art performances of Vision Transformers (ViT) in computer vision tasks demonstrate that ViT could replace local feature learning operations of convolutional neural networks. We extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem. We validate our method on a range of different biomedical surface domains and tasks.
arXiv Detail & Related papers (2022-04-07T12:45:54Z)
Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance. Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling. We prove that TinvNN can strictly guarantee transformation invariance, being general and flexible enough to be combined with the existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z)
Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism. We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies. We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA). We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly. Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.