Transformer-based end-to-end classification of variable-length
volumetric data
- URL: http://arxiv.org/abs/2307.06666v2
- Date: Fri, 21 Jul 2023 12:15:16 GMT
- Title: Transformer-based end-to-end classification of variable-length
volumetric data
- Authors: Marzieh Oghbaie, Teresa Araujo, Taha Emre, Ursula Schmidt-Erfurth,
Hrvoje Bogunovic
- Abstract summary: We propose an end-to-end Transformer-based framework that efficiently classifies volumetric data of variable length.
We evaluate the proposed approach on retinal OCT volume classification and achieve a 21.96% average improvement in balanced accuracy on a 9-class diagnostic task.
- Score: 4.053910482393197
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The automatic classification of 3D medical data is memory-intensive. Moreover,
variations in the number of slices between samples are common. Naïve solutions
such as subsampling can address these problems, but at the cost of potentially
eliminating relevant diagnostic information. Transformers have shown promising
performance for sequential data analysis. However, their application to long
sequences is data-, computation-, and memory-demanding. In this paper, we
propose an end-to-end Transformer-based framework that classifies volumetric
data of variable length in an efficient fashion. In particular, by randomizing
the input volume-wise resolution (number of slices) during training, we enhance
the capacity of the learnable positional embedding assigned to each volume
slice. Consequently, the positional information accumulated in each positional
embedding generalizes to the neighbouring slices, even for high-resolution
volumes at test time. As a result, the model is more robust to variable volume
length and amenable to different computational budgets. We evaluated the
proposed approach on retinal OCT volume classification and achieved a 21.96%
average improvement in balanced accuracy on a 9-class diagnostic task, compared
to state-of-the-art video transformers. Our findings show that varying the
volume-wise resolution of the input during training yields a more informative
volume representation than training with a fixed number of slices per volume.
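For illustration, below is a minimal PyTorch-style sketch of the training scheme described in the abstract, not the authors' released code: per-slice feature vectors of a variable-length volume are combined with a learnable positional-embedding table sized for the longest expected volume, and the number of slices is randomized at every training step. All module names, hyperparameters, and the exact mapping from slice index to embedding index are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SliceTransformer(nn.Module):
    """Classifies a volume from per-slice feature vectors (illustrative names)."""

    def __init__(self, feature_dim=768, max_slices=128, num_classes=9,
                 num_layers=4, num_heads=8):
        super().__init__()
        # One learnable positional embedding per possible slice position.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_slices, feature_dim))
        self.cls_token = nn.Parameter(torch.zeros(1, 1, feature_dim))
        layer = nn.TransformerEncoderLayer(d_model=feature_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, slice_feats, positions):
        # slice_feats: (B, n, feature_dim); positions: (n,) indices into the
        # positional table, so subsampled slices keep their original locations.
        b = slice_feats.shape[0]
        x = slice_feats + self.pos_embed[:, positions]
        x = torch.cat([self.cls_token.expand(b, -1, -1), x], dim=1)
        return self.head(self.encoder(x)[:, 0])  # classify from the CLS token


def random_resolution(volume_feats, min_slices=8):
    # Randomize the volume-wise resolution: keep n evenly spaced slices, with n
    # redrawn every call, so each positional embedding sees varying neighbours.
    total = volume_feats.shape[1]
    n = int(torch.randint(min_slices, total + 1, (1,)))
    positions = torch.linspace(0, total - 1, n).round().long()
    return volume_feats[:, positions], positions


# Toy usage: a batch of 2 volumes with 64 slices each, 9 diagnostic classes.
model = SliceTransformer()
feats, pos = random_resolution(torch.randn(2, 64, 768))
logits = model(feats, pos)  # shape (2, 9)
```

At test time the full-resolution volume can be passed with positions = torch.arange(num_slices), reusing the same positional table that was trained under varying resolutions.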
Related papers
- ENACT: Entropy-based Clustering of Attention Input for Improving the Computational Performance of Object Detection Transformers [0.0]
Transformers demonstrate competitive performance in terms of precision on the problem of vision-based object detection.
We propose to cluster the transformer input on the basis of its entropy.
Clustering reduces the size of data given as input to the transformer and therefore reduces training time and GPU memory usage.
arXiv Detail & Related papers (2024-09-11T18:03:59Z)
- AdaSelection: Accelerating Deep Learning Training through Data Subsampling [27.46630703428186]
We introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch.
Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
arXiv Detail & Related papers (2023-06-19T07:01:28Z)
- Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning [89.00646449740606]
Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part.
Data augmentation lies at the core of creating the information gap.
In this paper, we explore the channel dimension for generic data augmentation by exploiting precision redundancy.
arXiv Detail & Related papers (2022-12-19T18:59:57Z)
- FONDUE: an algorithm to find the optimal dimensionality of the latent representations of variational autoencoders [2.969705152497174]
In this paper, we explore the intrinsic dimension estimation (IDE) of the data and latent representations learned by VAEs.
We show that the discrepancies between the IDE of the mean and sampled representations of a VAE after only a few steps of training reveal the presence of passive variables in the latent space.
We propose FONDUE: an algorithm which quickly finds the number of latent dimensions after which the mean and sampled representations start to diverge.
arXiv Detail & Related papers (2022-09-26T15:59:54Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
In this work, we incorporate transformers into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- SVoRT: Iterative Transformer for Slice-to-Volume Registration in Fetal Brain MRI [5.023544755441559]
We propose a novel slice-to-volume registration method using Transformers trained on synthetically transformed data.
Our model automatically detects the relevance between slices and predicts the transformation of one slice using information from other slices.
Experiments with real-world MRI data are also performed to demonstrate the ability of the proposed model to improve the quality of 3D reconstruction under severe fetal motion.
arXiv Detail & Related papers (2022-06-22T01:55:42Z)
- A Volumetric Transformer for Accurate 3D Tumor Segmentation [25.961484035609672]
This paper presents a Transformer architecture for medical image segmentation.
The Transformer has a U-shaped volumetric encoder-decoder design that processes the input voxels in their entirety.
We show that our model transfers better representations across datasets and is robust against data corruption.
arXiv Detail & Related papers (2021-11-26T02:49:51Z)
- Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z)
- Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search [84.94597821711808]
We extend PoWER-BERT (Goyal et al., 2020) and propose Length-Adaptive Transformer that can be used for various inference scenarios after one-shot training.
We conduct a multi-objective evolutionary search to find a length configuration that maximizes the accuracy and minimizes the efficiency metric under any given computational budget.
We empirically verify the utility of the proposed approach by demonstrating the superior accuracy-efficiency trade-off under various setups.
arXiv Detail & Related papers (2020-10-14T12:28:08Z)
- Variable Skipping for Autoregressive Range Density Estimation [84.60428050170687]
We show a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
We show that variable skipping provides 10-100x efficiency improvements when targeting challenging high-quantile error metrics.
arXiv Detail & Related papers (2020-07-10T19:01:40Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)