Related papers: Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition

Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition

URL: http://arxiv.org/abs/2404.09683v2
Date: Thu, 18 Apr 2024 14:51:55 GMT
Title: Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition
Authors: Tobias Weber, Jakob Dexl, David Rügamer, Michael Ingrisch,
Abstract summary: We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality.
Score: 2.6820435361998918
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model's parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.

Related papers

Pathology Image Compression with Pre-trained Autoencoders [52.208181380986524]
Whole Slide Images in digital histopathology pose significant storage, transmission, and computational efficiency challenges. Standard compression methods, such as JPEG, reduce file sizes but fail to preserve fine-grained phenotypic details critical for downstream tasks. In this work, we repurpose autoencoders (AEs) designed for Latent Diffusion Models as an efficient learned compression framework for pathology images.
arXiv Detail & Related papers (2025-03-14T17:01:17Z)
tCURLoRA: Tensor CUR Decomposition Based Low-Rank Parameter Adaptation and Its Application in Medical Image Segmentation [1.3281936946796913]
Transfer learning, by leveraging knowledge from pre-trained models, has significantly enhanced the performance of target tasks. As deep neural networks scale up, full fine-tuning introduces substantial computational and storage challenges. We propose tCURLoRA, a novel fine-tuning method based on tensor CUR decomposition.
arXiv Detail & Related papers (2025-01-04T08:25:32Z)
Structured Model Pruning for Efficient Inference in Computational Pathology [2.9687381456164004]
We develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging. We empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance.
arXiv Detail & Related papers (2024-04-12T22:05:01Z)
Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST) IST is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
Neural Network Pruning for Real-time Polyp Segmentation [8.08470060885395]
We show an application of neural network pruning in polyp segmentation. We compute the importance score of convolutional filters and remove the filters having the least scores, which to some value of pruning does not degrade the performance.
arXiv Detail & Related papers (2023-06-22T21:03:50Z)
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO) MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts. Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets. We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online generated prototypes. We demonstrate that the resulting neural network model is able to attenuate the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
STN: Scalable Tensorizing Networks via Structure-Aware Training and Adaptive Compression [10.067082377396586]
We propose Scalableizing Networks (STN), which adaptively adjust the model size and decomposition structure without retraining. STN is compatible with arbitrary network architectures and achieves higher compression performance and flexibility over other tensorizing versions.
arXiv Detail & Related papers (2022-05-30T15:50:48Z)
Go Beyond Multiple Instance Neural Networks: Deep-learning Models based on Local Pattern Aggregation [0.0]
convolutional neural networks (CNNs) have brought breakthroughs in processing clinical electrocardiograms (ECGs) and speaker-independent speech. In this paper, we propose local pattern aggregation-based deep-learning models to effectively deal with both problems. The novel network structure, called LPANet, has cropping and aggregation operations embedded into it.
arXiv Detail & Related papers (2022-05-28T13:18:18Z)
Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance. distinct models are required to be trained to reach different points in the rate-distortion (R-D) space. We make efforts to formulate the essential mathematical functions to describe the R-D behavior of NIC using deep network and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on the connectionist temporal classification (CTC) We show that a Transformer-CTC model can be pruned in various depth on demand, improving real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
CNN Acceleration by Low-rank Approximation with Quantized Factors [9.654865591431593]
The modern convolutional neural networks although achieve great results in solving complex computer vision tasks still cannot be effectively used in mobile and embedded devices. In order to solve this problem the novel approach combining two known methods, low-rank tensor approximation in Tucker format and quantization of weights and feature maps (activations) is proposed. The efficiency of our method is demonstrated for ResNet18 and ResNet34 on CIFAR-10, CIFAR-100 and Imagenet classification tasks.
arXiv Detail & Related papers (2020-06-16T02:28:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.