Surface Vision Transformers: Flexible Attention-Based Modelling of
Biomedical Surfaces
- URL: http://arxiv.org/abs/2204.03408v1
- Date: Thu, 7 Apr 2022 12:45:54 GMT
- Title: Surface Vision Transformers: Flexible Attention-Based Modelling of
Biomedical Surfaces
- Authors: Simon Dahan, Hao Xu, Logan Z. J. Williams, Abdulah Fawaz, Chunhui
Yang, Timothy S. Coalson, Michelle C. Williams, David E. Newby, A. David
Edwards, Matthew F. Glasser, Alistair A. Young, Daniel Rueckert, Emma C.
Robinson
- Abstract summary: Recent state-of-the-art performance of Vision Transformers (ViT) in computer vision tasks demonstrates that ViTs could replace the local feature learning operations of convolutional neural networks.
We extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem.
We validate our method on a range of different biomedical surface domains and tasks.
- Score: 9.425082767553935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent state-of-the-art performance of Vision Transformers (ViT) in computer
vision tasks demonstrates that a general-purpose architecture, which implements
long-range self-attention, could replace the local feature learning operations
of convolutional neural networks. In this paper, we extend ViTs to surfaces by
reformulating the task of surface learning as a sequence-to-sequence learning
problem and proposing patching mechanisms for general surface meshes. Sequences
of patches are then processed by a transformer encoder and used for
classification or regression. We validate our method on a range of different
biomedical surface domains and tasks: brain age prediction in the developing
Human Connectome Project (dHCP), fluid intelligence prediction in the Human
Connectome Project (HCP), and coronary artery calcium score classification
using surfaces from the Scottish Computed Tomography of the Heart (SCOT-HEART)
dataset, and investigate the impact of pretraining and data augmentation on
model performance. Results suggest that Surface Vision Transformers (SiT)
demonstrate consistent improvement over geometric deep learning methods for
brain age and fluid intelligence prediction, and achieve performance on calcium
score classification comparable to standard metrics used in clinical practice.
Furthermore, analysis of transformer attention maps offers clear and
individualised predictions of the features driving each task. Code is available
on Github: https://github.com/metrics-lab/surface-vision-transformers
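To make the sequence-to-sequence reformulation concrete, below is a minimal PyTorch sketch of an SiT-style pipeline: per-vertex features on a sphericalised mesh are grouped into fixed patches, flattened into tokens, and processed by a standard transformer encoder whose class token drives a regression or classification head. The patch counts, dimensions and the `patch_indices` lookup are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SurfaceViT(nn.Module):
    """Minimal ViT-style encoder over pre-computed surface patches (illustrative only)."""

    def __init__(self, num_patches, patch_dim, embed_dim=192, depth=6, heads=3, num_outputs=1):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, embed_dim)            # flatten-and-project each patch
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))   # learned class token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, heads, dim_feedforward=4 * embed_dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(embed_dim, num_outputs)                 # regression (e.g. age) or class logits

    def forward(self, patches):                    # patches: (B, num_patches, patch_dim)
        tokens = self.patch_embed(patches)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                  # predict from the class token


def mesh_to_patches(vertex_features, patch_indices):
    """Group per-vertex features (V, C) into a (num_patches, patch_dim) token sequence.

    `patch_indices` is an assumed (num_patches, vertices_per_patch) lookup, e.g. taken
    from a regular icosahedral subdivision of the sphere; here it is only a placeholder.
    """
    patches = vertex_features[patch_indices]       # (num_patches, vertices_per_patch, C)
    return patches.flatten(start_dim=1)            # (num_patches, vertices_per_patch * C)


# Toy example: 320 patches of 153 vertices, 4 features per vertex -> one regression output.
feats = torch.randn(40962, 4)                      # per-vertex cortical metrics on a sphere
idx = torch.randint(0, 40962, (320, 153))          # placeholder patch lookup table
sequence = mesh_to_patches(feats, idx).unsqueeze(0)            # (1, 320, 612)
prediction = SurfaceViT(num_patches=320, patch_dim=612)(sequence)
```

In practice the patch lookup would be fixed by the chosen mesh tessellation so that every patch covers the same vertices across subjects; the random indices above are only a stand-in.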
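The attention-map analysis mentioned in the abstract can be illustrated in the same hypothetical setting: averaging a layer's attention weights over heads and reading off how strongly the class token attends to each patch gives a per-patch saliency that can be painted back onto the surface. This snippet uses a standalone `nn.MultiheadAttention` module (the stock `nn.TransformerEncoder` does not expose its weights directly); the authors' visualisation pipeline may differ.

```python
import torch
import torch.nn as nn

def cls_attention_map(tokens, attn):
    """Average attention over heads and return how much the class token (index 0)
    attends to each patch token. `tokens` is (B, 1 + num_patches, embed_dim)."""
    _, weights = attn(tokens, tokens, tokens,
                      need_weights=True, average_attn_weights=True)   # (B, L, L)
    return weights[:, 0, 1:]                       # (B, num_patches) per-patch saliency


# Toy example with the dimensions used above: 320 patches, 192-dim embeddings.
attn = nn.MultiheadAttention(embed_dim=192, num_heads=3, batch_first=True)
tokens = torch.randn(1, 321, 192)
saliency = cls_attention_map(tokens, attn)         # can be mapped back onto the mesh
```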
Related papers
- Benchmark on Drug Target Interaction Modeling from a Structure Perspective [48.60648369785105]
Drug-target interaction prediction is crucial to drug discovery and design.
Recent methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets.
We conduct a comprehensive survey and benchmark for drug-target interaction modeling from a structure perspective, integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms.
arXiv Detail & Related papers (2024-07-04T16:56:59Z)
- A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders [10.097983222759884]
This work proposes the surface Masked AutoEncoder (sMAE) and its video extension, the video surface Masked AutoEncoder (vsMAE).
These models are trained to reconstruct cortical feature maps from masked versions of the input, learning strong latent representations of cortical development and structure-function (a minimal sketch of this masked-reconstruction idea appears after this list).
Results show that (v)sMAE pre-trained models improve phenotyping prediction performance on multiple tasks by $\geq 26\%$ and offer faster convergence relative to models trained from scratch.
arXiv Detail & Related papers (2023-08-10T10:01:56Z)
- Masked Pre-Training of Transformers for Histology Image Analysis [4.710921988115685]
In digital pathology, whole slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction.
Visual transformer models have emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches.
We propose a pretext task for training the transformer model without labeled data to address this problem.
Our model, MaskHIT, uses the transformer output to reconstruct masked patches and learn representative histological features based on their positions and visual features.
arXiv Detail & Related papers (2023-04-14T23:56:49Z)
- The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning.
Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z)
- Surface Analysis with Vision Transformers [7.4330073456005685]
Recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs.
Motivated by the success of attention-modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem and propose a patching mechanism for surface meshes.
arXiv Detail & Related papers (2022-05-31T14:41:01Z)
- Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis [8.20832544370228]
We introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold.
A vision transformer model encodes the sequence of patches via successive multi-head self-attention layers.
Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data.
arXiv Detail & Related papers (2022-03-30T15:56:11Z)
- Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets.
We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts.
Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
- Surface Warping Incorporating Machine Learning Assisted Domain Likelihood Estimation: A New Paradigm in Mine Geology Modelling and Automation [68.8204255655161]
A Bayesian warping technique has been proposed to reshape modeled surfaces based on geochemical and spatial constraints imposed by newly acquired blasthole data.
This paper focuses on incorporating machine learning in this warping framework to make the likelihood generalizable.
Its foundation is laid by a Bayesian computation in which the geological domain likelihood given the chemistry, p(g|c), plays a similar role to p(y(c)|g).
arXiv Detail & Related papers (2021-02-15T10:37:52Z)
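As a companion to the Surface Masked AutoEncoder entry above, the following is a minimal, hypothetical PyTorch sketch of masked-reconstruction pretraining on a patch sequence: a random subset of patch tokens is replaced by a learned mask token, and the model is trained to reconstruct the original patch features at exactly those positions. Mask ratio, dimensions and module names are illustrative assumptions, not the sMAE implementation.

```python
import torch
import torch.nn as nn

class MaskedPatchAutoencoder(nn.Module):
    """Illustrative masked-reconstruction pretraining over a sequence of surface patches."""

    def __init__(self, patch_dim, embed_dim=192, depth=4, heads=3, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, embed_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))  # learned placeholder token
        layer = nn.TransformerEncoderLayer(embed_dim, heads, dim_feedforward=4 * embed_dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.decoder = nn.Linear(embed_dim, patch_dim)                # reconstruct raw patch features

    def forward(self, patches):                    # patches: (B, N, patch_dim)
        B, N, _ = patches.shape
        hidden = torch.rand(B, N, device=patches.device) < self.mask_ratio   # True = masked patch
        tokens = self.embed(patches)
        tokens = torch.where(hidden.unsqueeze(-1),
                             self.mask_token.expand(B, N, tokens.size(-1)),
                             tokens)
        recon = self.decoder(self.encoder(tokens))
        # The reconstruction error is measured only on the patches hidden from the model.
        return nn.functional.mse_loss(recon[hidden], patches[hidden])


# Example: pretraining loss for a toy batch of two surfaces, 320 patches of dimension 612.
model = MaskedPatchAutoencoder(patch_dim=612)
loss = model(torch.randn(2, 320, 612))
loss.backward()
```

The MaskHIT entry above describes the same pretext idea applied to histology patches rather than cortical feature maps.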