Surface Vision Transformers: Flexible Attention-Based Modelling of
Biomedical Surfaces
- URL: http://arxiv.org/abs/2204.03408v1
- Date: Thu, 7 Apr 2022 12:45:54 GMT
- Title: Surface Vision Transformers: Flexible Attention-Based Modelling of
Biomedical Surfaces
- Authors: Simon Dahan, Hao Xu, Logan Z. J. Williams, Abdulah Fawaz, Chunhui
Yang, Timothy S. Coalson, Michelle C. Williams, David E. Newby, A. David
Edwards, Matthew F. Glasser, Alistair A. Young, Daniel Rueckert, Emma C.
Robinson
- Abstract summary: Recent state-of-the-art performance of Vision Transformers (ViT) in computer vision tasks demonstrates that ViTs could replace the local feature learning operations of convolutional neural networks.
We extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem.
We validate our method on a range of different biomedical surface domains and tasks.
- Score: 9.425082767553935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent state-of-the-art performance of Vision Transformers (ViT) in computer
vision tasks demonstrates that a general-purpose architecture, which implements
long-range self-attention, could replace the local feature learning operations
of convolutional neural networks. In this paper, we extend ViTs to surfaces by
reformulating the task of surface learning as a sequence-to-sequence learning
problem and proposing patching mechanisms for general surface meshes. Sequences
of patches are then processed by a transformer encoder and used for
classification or regression. We validate our method on a range of different
biomedical surface domains and tasks: brain age prediction in the developing
Human Connectome Project (dHCP), fluid intelligence prediction in the Human
Connectome Project (HCP), and coronary artery calcium score classification
using surfaces from the Scottish Computed Tomography of the Heart (SCOT-HEART)
dataset, and investigate the impact of pretraining and data augmentation on
model performance. Results suggest that Surface Vision Transformers (SiT)
demonstrate consistent improvement over geometric deep learning methods for
brain age and fluid intelligence prediction, and achieve performance on calcium
score classification comparable to standard metrics used in clinical practice.
Furthermore, analysis of transformer attention maps offers clear and
individualised predictions of the features driving each task. Code is available
on Github: https://github.com/metrics-lab/surface-vision-transformers
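To make the sequence-to-sequence reformulation concrete, below is a minimal PyTorch sketch of an SiT-style pipeline: per-vertex features on a sphericalised mesh are grouped into fixed patches, flattened into tokens, and processed by a standard transformer encoder whose class token drives a regression or classification head. The patch counts, dimensions and the `patch_indices` lookup are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SurfaceViT(nn.Module):
    """Minimal ViT-style encoder over pre-computed surface patches (illustrative only)."""

    def __init__(self, num_patches, patch_dim, embed_dim=192, depth=6, heads=3, num_outputs=1):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, embed_dim)            # flatten-and-project each patch
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))   # learned class token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, heads, dim_feedforward=4 * embed_dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(embed_dim, num_outputs)                 # regression (e.g. age) or class logits

    def forward(self, patches):                    # patches: (B, num_patches, patch_dim)
        tokens = self.patch_embed(patches)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        x = torch.cat([cls, tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                  # predict from the class token


def mesh_to_patches(vertex_features, patch_indices):
    """Group per-vertex features (V, C) into a (num_patches, patch_dim) token sequence.

    `patch_indices` is an assumed (num_patches, vertices_per_patch) lookup, e.g. taken
    from a regular icosahedral subdivision of the sphere; here it is only a placeholder.
    """
    patches = vertex_features[patch_indices]       # (num_patches, vertices_per_patch, C)
    return patches.flatten(start_dim=1)            # (num_patches, vertices_per_patch * C)


# Toy example: 320 patches of 153 vertices, 4 features per vertex -> one regression output.
feats = torch.randn(40962, 4)                      # per-vertex cortical metrics on a sphere
idx = torch.randint(0, 40962, (320, 153))          # placeholder patch lookup table
sequence = mesh_to_patches(feats, idx).unsqueeze(0)            # (1, 320, 612)
prediction = SurfaceViT(num_patches=320, patch_dim=612)(sequence)
```

In practice the patch lookup would be fixed by the chosen mesh tessellation so that every patch covers the same vertices across subjects; the random indices above are only a stand-in.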
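The attention-map analysis mentioned in the abstract can be illustrated in the same hypothetical setting: averaging a layer's attention weights over heads and reading off how strongly the class token attends to each patch gives a per-patch saliency that can be painted back onto the surface. This snippet uses a standalone `nn.MultiheadAttention` module (the stock `nn.TransformerEncoder` does not expose its weights directly); the authors' visualisation pipeline may differ.

```python
import torch
import torch.nn as nn

def cls_attention_map(tokens, attn):
    """Average attention over heads and return how much the class token (index 0)
    attends to each patch token. `tokens` is (B, 1 + num_patches, embed_dim)."""
    _, weights = attn(tokens, tokens, tokens,
                      need_weights=True, average_attn_weights=True)   # (B, L, L)
    return weights[:, 0, 1:]                       # (B, num_patches) per-patch saliency


# Toy example with the dimensions used above: 320 patches, 192-dim embeddings.
attn = nn.MultiheadAttention(embed_dim=192, num_heads=3, batch_first=True)
tokens = torch.randn(1, 321, 192)
saliency = cls_attention_map(tokens, attn)         # can be mapped back onto the mesh
```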
Related papers
- Benchmark on Drug Target Interaction Modeling from a Structure Perspective [48.60648369785105]
Drug-target interaction prediction is crucial to drug discovery and design.
Recent methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets.
We conduct a comprehensive survey and benchmark for drug-target interaction modeling from a structure perspective, integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms.
arXiv Detail & Related papers (2024-07-04T16:56:59Z)
- A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images.
We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders [10.097983222759884]
This work proposes the surface Masked AutoEncoder (sMAE) and its video extension, the video surface Masked AutoEncoder (vsMAE).
These models are trained to reconstruct cortical feature maps from masked versions of the input, learning strong latent representations of cortical development and structure-function (a minimal sketch of this masked-reconstruction idea appears after this list).
Results show that (v)sMAE pre-trained models improve phenotyping prediction performance on multiple tasks by $\geq 26\%$ and offer faster convergence relative to models trained from scratch.
arXiv Detail & Related papers (2023-08-10T10:01:56Z)
- Masked Pre-Training of Transformers for Histology Image Analysis [4.710921988115685]
In digital pathology, whole slide images (WSIs) are widely used for applications such as cancer diagnosis and prognosis prediction.
Visual transformer models have emerged as a promising method for encoding large regions of WSIs while preserving spatial relationships among patches.
We propose a pretext task for training the transformer model without labeled data to address this problem.
Our model, MaskHIT, uses the transformer output to reconstruct masked patches and learn representative histological features based on their positions and visual features.
arXiv Detail & Related papers (2023-04-14T23:56:49Z)
- The Multiscale Surface Vision Transformer [10.833580445244094]
We introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning.
Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks.
arXiv Detail & Related papers (2023-03-21T15:00:17Z)
- Surface Analysis with Vision Transformers [7.4330073456005685]
Recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs.
Motivated by the success of attention-modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem and propose a patching mechanism for surface meshes.
arXiv Detail & Related papers (2022-05-31T14:41:01Z)
- Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis [8.20832544370228]
We introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold.
A vision transformer model encodes the sequence of patches via successive multi-head self-attention layers.
Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data.
arXiv Detail & Related papers (2022-03-30T15:56:11Z)
- Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets.
We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts.
Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
- PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer [55.936527926778695]
Recent deep learning approaches focus on mining subtle rPPG clues using convolutional neural networks with limited spatio-temporal receptive fields.
In this paper, we propose PhysFormer, an end-to-end video-transformer-based architecture.
arXiv Detail & Related papers (2021-11-23T18:57:11Z)
- Surface Warping Incorporating Machine Learning Assisted Domain Likelihood Estimation: A New Paradigm in Mine Geology Modelling and Automation [68.8204255655161]
A Bayesian warping technique has been proposed to reshape modeled surfaces based on geochemical and spatial constraints imposed by newly acquired blasthole data.
This paper focuses on incorporating machine learning in this warping framework to make the likelihood generalizable.
Its foundation is laid by a Bayesian computation in which the geological domain likelihood given the chemistry, p(g|c), plays a similar role to p(y(c)|g).
arXiv Detail & Related papers (2021-02-15T10:37:52Z)
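As a companion to the Surface Masked AutoEncoder entry above, the following is a minimal, hypothetical PyTorch sketch of masked-reconstruction pretraining on a patch sequence: a random subset of patch tokens is replaced by a learned mask token, and the model is trained to reconstruct the original patch features at exactly those positions. Mask ratio, dimensions and module names are illustrative assumptions, not the sMAE implementation.

```python
import torch
import torch.nn as nn

class MaskedPatchAutoencoder(nn.Module):
    """Illustrative masked-reconstruction pretraining over a sequence of surface patches."""

    def __init__(self, patch_dim, embed_dim=192, depth=4, heads=3, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, embed_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))  # learned placeholder token
        layer = nn.TransformerEncoderLayer(embed_dim, heads, dim_feedforward=4 * embed_dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.decoder = nn.Linear(embed_dim, patch_dim)                # reconstruct raw patch features

    def forward(self, patches):                    # patches: (B, N, patch_dim)
        B, N, _ = patches.shape
        hidden = torch.rand(B, N, device=patches.device) < self.mask_ratio   # True = masked patch
        tokens = self.embed(patches)
        tokens = torch.where(hidden.unsqueeze(-1),
                             self.mask_token.expand(B, N, tokens.size(-1)),
                             tokens)
        recon = self.decoder(self.encoder(tokens))
        # The reconstruction error is measured only on the patches hidden from the model.
        return nn.functional.mse_loss(recon[hidden], patches[hidden])


# Example: pretraining loss for a toy batch of two surfaces, 320 patches of dimension 612.
model = MaskedPatchAutoencoder(patch_dim=612)
loss = model(torch.randn(2, 320, 612))
loss.backward()
```

The MaskHIT entry above describes the same pretext idea applied to histology patches rather than cortical feature maps.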