Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
- URL: http://arxiv.org/abs/2406.01314v1
- Date: Mon, 3 Jun 2024 13:27:08 GMT
- Title: Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
- Authors: Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav Müller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn
- Abstract summary: The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision.
A critical limitation of this model is its quadratic computational and memory complexity relative to the sequence length.
This is especially crucial in medical imaging where high-resolution images can reach gigapixel scale.
- Score: 1.6275928583134276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision. However, a critical limitation of this model is its quadratic computational and memory complexity relative to the sequence length, which constrains its application to longer sequences. This is especially crucial in medical imaging, where high-resolution images can reach gigapixel scale. Efforts to address this issue have predominantly focused on complex techniques, such as decomposing the softmax operation integral to the Transformer's architecture. This paper addresses the quadratic computational complexity of Transformer models and introduces a remarkably simple and effective method that circumvents this issue by eliminating the softmax function from the attention mechanism and adopting a sequence normalization technique for the key, query, and value tokens. Coupled with a reordering of the matrix multiplications, this approach reduces the memory and compute complexity to a linear scale. We evaluate this approach across various medical imaging datasets comprising fundoscopic, dermoscopic, radiologic, and histologic imaging data. Our findings highlight that these models exhibit performance comparable to traditional Transformer models while efficiently handling longer sequences.
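The mechanism described in the abstract lends itself to a short sketch. Below is a minimal NumPy illustration of softmax-free attention with reordered matrix multiplications; the specific sequence normalization (standardizing each feature across the token dimension) and the 1/n scaling are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def seq_norm(x, eps=1e-6):
    # Assumed sequence normalization: standardize each feature
    # across the sequence (token) dimension, axis 0.
    return (x - x.mean(axis=0, keepdims=True)) / (x.std(axis=0, keepdims=True) + eps)

def softmax_free_attention(q, k, v):
    # q, k, v: (n, d). Standard attention computes softmax(q @ k.T) @ v,
    # which materializes an (n, n) matrix and costs O(n^2 * d). Without
    # the softmax, the product can be reassociated as q @ (k.T @ v),
    # whose cost is O(n * d^2): linear in the sequence length n.
    q, k, v = seq_norm(q), seq_norm(k), seq_norm(v)
    kv = k.T @ v                  # (d, d), independent of n
    return (q @ kv) / k.shape[0]  # (n, d); the 1/n scaling is an assumption

# Example: 16,384 patch tokens (a high-resolution image), 64-dim head.
rng = np.random.default_rng(0)
n, d = 16_384, 64
out = softmax_free_attention(rng.normal(size=(n, d)),
                             rng.normal(size=(n, d)),
                             rng.normal(size=(n, d)))
print(out.shape)  # (16384, 64)
```

The key point is that the (n, n) attention matrix is never formed, which is what makes very long sequences, such as gigapixel-scale token counts, tractable.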
Related papers
- Scalable Visual State Space Model with Fractal Scanning [16.077348474371547]
State Space Models (SSMs) have emerged as efficient alternatives to Transformer models.
We propose using fractal scanning curves for patch serialization.
We validate our method in image classification, detection, and segmentation tasks.
arXiv Detail & Related papers (2024-05-23T12:12:11Z)
- Optimization of array encoding for ultrasound imaging [2.357055571094446]
We use machine learning (ML) to construct scanning sequences, parameterized by time delays and apodization weights, that produce high-quality B-mode images.
We demonstrate these results experimentally on both wire targets and a tissue-mimicking phantom.
arXiv Detail & Related papers (2024-03-01T05:19:59Z)
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model preserves both local and global context information and captures long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z)
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention-based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to handling the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
- One Model to Synthesize Them All: Multi-contrast Multi-scale Transformer for Missing Data Imputation [3.9207133968068684]
We formulate missing data imputation as a sequence-to-sequence learning problem.
We propose a multi-contrast multi-scale Transformer (MMT) which can take any subset of input contrasts and synthesize those that are missing.
MMT is inherently interpretable as it allows us to understand the importance of each input contrast in different regions.
arXiv Detail & Related papers (2022-04-28T18:49:27Z)
- MAT: Mask-Aware Transformer for Large Hole Image Inpainting [79.67039090195527]
We present a novel model for large hole inpainting, which unifies the merits of transformers and convolutions.
Experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets.
arXiv Detail & Related papers (2022-03-29T06:36:17Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention.
Various attempts at approximating self-attention with linear complexity have been made in natural language processing.
We identify that their limitations are rooted in retaining the softmax self-attention during approximation.
For the first time, a softmax-free transformer (SOFT) is proposed.
arXiv Detail & Related papers (2021-10-22T17:57:29Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens and allows efficient processing of high-resolution images (a minimal sketch follows this list).
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
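For comparison with the token-wise attention sketched earlier, the cross-covariance attention (XCA) described in the XCiT entry above also avoids the (n, n) attention matrix, by attending over feature channels instead of tokens. The sketch below is an assumed NumPy reading of that abstract; the ℓ2-normalization axis, the softmax axis, and the temperature tau are guesses rather than XCiT's reference implementation.

```python
import numpy as np

def l2_norm(x, axis, eps=1e-6):
    # Scale so each slice along `axis` has unit Euclidean norm.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def cross_covariance_attention(q, k, v, tau=1.0):
    # q, k, v: (n, d). The attention map k_hat.T @ q_hat is (d, d),
    # so the cost is O(n * d^2): linear in the number of tokens n.
    q_hat = l2_norm(q, axis=0)  # unit norm per feature channel (assumed axis)
    k_hat = l2_norm(k, axis=0)
    a = k_hat.T @ q_hat / tau                # (d, d) cross-covariance
    a = np.exp(a - a.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)    # softmax over channels
    return v @ a                             # (n, d)

n, d = 4_096, 32
rng = np.random.default_rng(1)
print(cross_covariance_attention(rng.normal(size=(n, d)),
                                 rng.normal(size=(n, d)),
                                 rng.normal(size=(n, d))).shape)  # (4096, 32)
```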