Transformer-based end-to-end classification of variable-length
volumetric data
- URL: http://arxiv.org/abs/2307.06666v2
- Date: Fri, 21 Jul 2023 12:15:16 GMT
- Title: Transformer-based end-to-end classification of variable-length
volumetric data
- Authors: Marzieh Oghbaie, Teresa Araujo, Taha Emre, Ursula Schmidt-Erfurth,
Hrvoje Bogunovic
- Abstract summary: We propose an end-to-end Transformer-based framework that efficiently classifies volumetric data of variable length.
We evaluate the proposed approach on retinal OCT volume classification and achieve a 21.96% average improvement in balanced accuracy on a 9-class diagnostic task.
- Score: 4.053910482393197
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The automatic classification of 3D medical data is memory-intensive. Moreover,
variations in the number of slices between samples are common. Naïve solutions
such as subsampling can address these problems, but at the cost of potentially
eliminating relevant diagnostic information. Transformers have shown promising
performance for sequential data analysis. However, their application to long
sequences is data-, computation-, and memory-demanding. In this paper, we
propose an end-to-end Transformer-based framework that classifies volumetric
data of variable length in an efficient fashion. In particular, by randomizing
the input volume-wise resolution (number of slices) during training, we enhance
the capacity of the learnable positional embedding assigned to each volume
slice. Consequently, the positional information accumulated in each positional
embedding generalizes to the neighbouring slices, even for high-resolution
volumes at test time. As a result, the model is more robust to variable volume
length and amenable to different computational budgets. We evaluated the
proposed approach on retinal OCT volume classification and achieved a 21.96%
average improvement in balanced accuracy on a 9-class diagnostic task, compared
to state-of-the-art video transformers. Our findings show that varying the
volume-wise resolution of the input during training yields a more informative
volume representation than training with a fixed number of slices per volume.
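For illustration, below is a minimal PyTorch-style sketch of the training scheme described in the abstract, not the authors' released code: per-slice feature vectors of a variable-length volume are combined with a learnable positional-embedding table sized for the longest expected volume, and the number of slices is randomized at every training step. All module names, hyperparameters, and the exact mapping from slice index to embedding index are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SliceTransformer(nn.Module):
    """Classifies a volume from per-slice feature vectors (illustrative names)."""

    def __init__(self, feature_dim=768, max_slices=128, num_classes=9,
                 num_layers=4, num_heads=8):
        super().__init__()
        # One learnable positional embedding per possible slice position.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_slices, feature_dim))
        self.cls_token = nn.Parameter(torch.zeros(1, 1, feature_dim))
        layer = nn.TransformerEncoderLayer(d_model=feature_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, slice_feats, positions):
        # slice_feats: (B, n, feature_dim); positions: (n,) indices into the
        # positional table, so subsampled slices keep their original locations.
        b = slice_feats.shape[0]
        x = slice_feats + self.pos_embed[:, positions]
        x = torch.cat([self.cls_token.expand(b, -1, -1), x], dim=1)
        return self.head(self.encoder(x)[:, 0])  # classify from the CLS token


def random_resolution(volume_feats, min_slices=8):
    # Randomize the volume-wise resolution: keep n evenly spaced slices, with n
    # redrawn every call, so each positional embedding sees varying neighbours.
    total = volume_feats.shape[1]
    n = int(torch.randint(min_slices, total + 1, (1,)))
    positions = torch.linspace(0, total - 1, n).round().long()
    return volume_feats[:, positions], positions


# Toy usage: a batch of 2 volumes with 64 slices each, 9 diagnostic classes.
model = SliceTransformer()
feats, pos = random_resolution(torch.randn(2, 64, 768))
logits = model(feats, pos)  # shape (2, 9)
```

At test time the full-resolution volume can be passed with positions = torch.arange(num_slices), reusing the same positional table that was trained under varying resolutions.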
Related papers
- ENACT: Entropy-based Clustering of Attention Input for Improving the Computational Performance of Object Detection Transformers [0.0]
Transformers demonstrate competitive performance in terms of precision on the problem of vision-based object detection.
We propose to cluster the transformer input on the basis of its entropy.
Clustering reduces the size of data given as input to the transformer and therefore reduces training time and GPU memory usage.
arXiv Detail & Related papers (2024-09-11T18:03:59Z)
- AdaSelection: Accelerating Deep Learning Training through Data Subsampling [27.46630703428186]
We introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch.
Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
arXiv Detail & Related papers (2023-06-19T07:01:28Z)
- Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning [89.00646449740606]
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.
Data augmentation lies at the core of creating the information gap.
In this paper, we explore the channel dimension for generic data augmentation by exploiting precision redundancy.
arXiv Detail & Related papers (2022-12-19T18:59:57Z)
- FONDUE: an algorithm to find the optimal dimensionality of the latent representations of variational autoencoders [2.969705152497174]
In this paper, we explore the intrinsic dimension estimation (IDE) of the data and latent representations learned by VAEs.
We show that the discrepancies between the IDE of the mean and sampled representations of a VAE after only a few steps of training reveal the presence of passive variables in the latent space.
We propose FONDUE: an algorithm which quickly finds the number of latent dimensions after which the mean and sampled representations start to diverge.
arXiv Detail & Related papers (2022-09-26T15:59:54Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
In this work, we incorporate transformers into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- SVoRT: Iterative Transformer for Slice-to-Volume Registration in Fetal Brain MRI [5.023544755441559]
We propose a novel slice-to-volume registration method using Transformers trained on synthetically transformed data.
Our model automatically detects the relevance between slices and predicts the transformation of one slice using information from other slices.
Experiments with real-world MRI data are also performed to demonstrate the ability of the proposed model to improve the quality of 3D reconstruction under severe fetal motion.
arXiv Detail & Related papers (2022-06-22T01:55:42Z)
- A Volumetric Transformer for Accurate 3D Tumor Segmentation [25.961484035609672]
This paper presents a Transformer architecture for medical image segmentation.
The Transformer has a U-shaped volumetric encoder-decoder design that processes the input voxels in their entirety.
We show that our model transfers better representations across datasets and is robust against data corruption.
arXiv Detail & Related papers (2021-11-26T02:49:51Z)
- Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z)
- Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search [84.94597821711808]
We extend PoWER-BERT (Goyal et al., 2020) and propose Length-Adaptive Transformer that can be used for various inference scenarios after one-shot training.
We conduct a multi-objective evolutionary search to find a length configuration that maximizes the accuracy and minimizes the efficiency metric under any given computational budget.
We empirically verify the utility of the proposed approach by demonstrating the superior accuracy-efficiency trade-off under various setups.
arXiv Detail & Related papers (2020-10-14T12:28:08Z)
- Variable Skipping for Autoregressive Range Density Estimation [84.60428050170687]
We show a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
We show that variable skipping provides 10-100x efficiency improvements when targeting challenging high-quantile error metrics.
arXiv Detail & Related papers (2020-07-10T19:01:40Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)