Surface Analysis with Vision Transformers
- URL: http://arxiv.org/abs/2205.15836v1
- Date: Tue, 31 May 2022 14:41:01 GMT
- Title: Surface Analysis with Vision Transformers
- Authors: Simon Dahan, Logan Z. J. Williams, Abdulah Fawaz, Daniel Rueckert,
Emma C. Robinson
- Abstract summary: Recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs.
Motivated by the success of attention-modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem and propose a patching mechanism for surface meshes.
- Score: 7.4330073456005685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The extension of convolutional neural networks (CNNs) to non-Euclidean
geometries has led to multiple frameworks for studying manifolds. Many of those
methods have shown design limitations resulting in poor modelling of long-range
associations, as the generalisation of convolutions to irregular surfaces is
non-trivial. Recent state-of-the-art performance of Vision Transformers (ViTs)
demonstrates that a general-purpose architecture, which implements
self-attention, could replace the local feature learning operations of CNNs.
Motivated by the success of attention-modelling in computer vision, we extend
ViTs to surfaces by reformulating the task of surface learning as a
sequence-to-sequence problem and propose a patching mechanism for surface
meshes. We validate the performance of the proposed Surface Vision Transformer
(SiT) on two brain age prediction tasks in the developing Human Connectome
Project (dHCP) dataset and investigate the impact of pre-training on model
performance. Experiments show that the SiT outperforms many surface CNNs, while
indicating some evidence of general transformation invariance. Code available
at https://github.com/metrics-lab/surface-vision-transformers
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry [1.2289361708127877]
We propose a causal visual-inertial fusion transformer (VIFT) for pose estimation in deep visual-inertial odometry.
The proposed method is end-to-end trainable and requires only a monocular camera and IMU during inference.
arXiv Detail & Related papers (2024-09-13T12:21:25Z) - MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator implies that the latent space shows an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z) - The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning.
Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z) - Surface Vision Transformers: Flexible Attention-Based Modelling of
Biomedical Surfaces [9.425082767553935]
Recent state-of-the-art performances of Vision Transformers (ViT) in computer vision tasks demonstrate that ViT could replace local feature learning operations of convolutional neural networks.
We extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem.
We validate our method on a range of different biomedical surface domains and tasks.
arXiv Detail & Related papers (2022-04-07T12:45:54Z) - Surface Vision Transformers: Attention-Based Modelling applied to
Cortical Analysis [8.20832544370228]
We introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold.
A vision transformer model encodes the sequence of patches via successive multi-head self-attention layers.
Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data.
arXiv Detail & Related papers (2022-03-30T15:56:11Z) - A Comprehensive Study of Vision Transformers on Dense Prediction Tasks [10.013443811899466]
Convolutional Neural Networks (CNNs) have been the standard choice in vision tasks.
Recent studies have shown that Vision Transformers (VTs) achieve comparable performance in challenging tasks such as object detection and semantic segmentation.
This poses several questions about their generalizability, robustness, reliability, and texture bias when used to extract features for complex tasks.
arXiv Detail & Related papers (2022-01-21T13:18:16Z) - Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA)
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from the Vision-friendly Transformer'
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images [64.53227129573293]
We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views.
We design neural networks capable of generating high-quality parametric 3D surfaces which are consistent between views.
Our method is supervised and trained on a public dataset of shapes from common object categories.
arXiv Detail & Related papers (2020-08-18T06:33:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.