Lean Unet: A Compact Model for Image Segmentation
- URL: http://arxiv.org/abs/2512.03834v1
- Date: Wed, 03 Dec 2025 14:35:21 GMT
- Title: Lean Unet: A Compact Model for Image Segmentation
- Authors: Ture Hassler, Ida Åkerholm, Marcus Nordström, Gabriele Balletti, Orcun Goksel
- Abstract summary: Current Unet architectures iteratively downsample spatial resolution while increasing channel dimensions to preserve information content. Channel pruning compresses the Unet architecture without accuracy loss, but requires lengthy optimization and may not generalize across tasks and datasets. We propose a lean Unet architecture (LUnet) with a compact, flat hierarchy where channels are not doubled as resolution is halved.
- Score: 2.876199562221577
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unet and its variations have been the standard in semantic image segmentation, especially for computer-assisted radiology. Current Unet architectures iteratively downsample spatial resolution while increasing channel dimensions to preserve information content. Such a structure demands a large memory footprint, limiting training batch sizes and increasing inference latency. Channel pruning compresses the Unet architecture without accuracy loss, but requires lengthy optimization and may not generalize across tasks and datasets. By investigating Unet pruning, we hypothesize that the final structure is the crucial factor, not the channel selection strategy of pruning. Based on our observations, we propose a lean Unet architecture (LUnet) with a compact, flat hierarchy where channels are not doubled as resolution is halved. We evaluate on a public MRI dataset, allowing comparable reporting, as well as on two internal CT datasets. We show that a state-of-the-art pruning solution (STAMP) mainly prunes from the layers with the highest number of channels. Comparatively, simply eliminating a random channel at the pruning-identified layer or at the largest layer achieves similar or better performance. Our proposed LUnet with fixed architectures and over 30 times fewer parameters achieves performance comparable to both conventional Unet counterparts and data-adaptively pruned networks. The proposed lean Unet with a constant channel count across layers requires far fewer parameters while achieving performance superior to a standard Unet with the same total number of parameters. Skip connections allow Unet bottleneck channels to be largely reduced, unlike in standard encoder-decoder architectures, which require increased bottleneck channels for information propagation.
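To make the architectural idea concrete, below is a minimal PyTorch sketch of such a flat U-Net: the channel count stays fixed across all resolution levels, and concatenation skip connections feed the decoder. The width, depth, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a "lean" U-Net: a flat hierarchy where the channel count
# stays constant as spatial resolution is halved, instead of doubling at every
# level as in a conventional U-Net. All sizes here are illustrative.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class LeanUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2, width=32, depth=4):
        super().__init__()
        self.enc = nn.ModuleList(
            [conv_block(in_ch if i == 0 else width, width) for i in range(depth)]
        )
        self.bottleneck = conv_block(width, width)  # as narrow as every other level
        self.dec = nn.ModuleList(
            [conv_block(2 * width, width) for _ in range(depth)]  # skip + upsampled
        )
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.head = nn.Conv2d(width, n_classes, 1)

    def forward(self, x):  # H and W must be divisible by 2 ** depth
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)  # kept for the decoder's skip connections
            x = self.pool(x)
        x = self.bottleneck(x)
        for dec in self.dec:
            x = self.up(x)
            x = dec(torch.cat([x, skips.pop()], dim=1))
        return self.head(x)

print(LeanUNet()(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```

With a constant width, the parameter count grows only linearly with depth, whereas conventional channel doubling grows it roughly exponentially; per the abstract, it is the skip connections that allow the bottleneck to stay this narrow.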
Related papers
- TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision Search [35.93578975066986]
Diffusion Transformers (DiTs) have emerged as a highly scalable and effective backbone for image generation. Mixed-Precision Quantization (MPQ) has demonstrated remarkable success in advancing U-Net quantization to sub-4bit settings. We propose TreeQ, a unified framework addressing key challenges in DiT quantization.
arXiv Detail & Related papers (2025-12-06T08:59:12Z)
- DDU-Net: A Domain Decomposition-Based CNN for High-Resolution Image Segmentation on Multiple GPUs [46.873264197900916]
A domain decomposition-based U-Net architecture is introduced, which partitions input images into non-overlapping patches. A communication network is added to facilitate inter-patch information exchange and enhance the understanding of spatial context. Results show that the approach achieves a 2-3% higher intersection over union (IoU) score compared to the same network without inter-patch communication.
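As a rough sketch of the decomposition idea, the snippet below splits an image into non-overlapping patches, encodes them independently (so patches can live on different GPUs), and injects a pooled global summary back into each patch; the summary step is an illustrative stand-in for the paper's communication network, not its actual design.

```python
import torch
import torch.nn as nn

def split_into_patches(x, p):
    # x: (B, C, H, W) with H, W divisible by p -> (B * p * p, C, H/p, W/p)
    b, c, h, w = x.shape
    x = x.reshape(b, c, p, h // p, p, w // p)
    x = x.permute(0, 2, 4, 1, 3, 5)  # (B, p, p, C, H/p, W/p)
    return x.reshape(b * p * p, c, h // p, w // p)

encoder = nn.Conv2d(3, 16, 3, padding=1)  # stand-in per-patch encoder
x = torch.randn(1, 3, 256, 256)
patches = split_into_patches(x, p=4)      # 16 non-overlapping 64x64 patches
feats = encoder(patches)                  # each patch encoded independently
# "Communication": broadcast a global summary of all patches back to each one.
summary = feats.mean(dim=(0, 2, 3), keepdim=True)  # shared channel statistics
feats = feats + summary                   # inject inter-patch context
print(feats.shape)                        # torch.Size([16, 16, 64, 64])
```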
arXiv Detail & Related papers (2024-07-31T01:07:21Z)
- (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork [60.889175951038496]
Large-scale neural networks have demonstrated remarkable performance in domains such as vision and language processing.
A key question in structural pruning is how to estimate channel significance.
We propose a novel algorithmic framework, namely PASS.
It is a tailored hyper-network that takes both visual prompts and network weight statistics as input, and outputs layer-wise channel sparsity in a recurrent manner.
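A minimal sketch in the spirit of this design is shown below: a GRU cell steps over the target network's layers, consuming a prompt embedding plus simple weight statistics and emitting one sparsity ratio per layer. The feature choices and sizes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

def weight_stats(w):
    # Simple per-layer statistics of a weight tensor (an illustrative choice).
    return torch.stack([w.mean(), w.std(), w.abs().mean(), w.abs().max()])

class SparsityHyperNet(nn.Module):
    def __init__(self, prompt_dim=32, stat_dim=4, hidden=64):
        super().__init__()
        self.rnn = nn.GRUCell(prompt_dim + stat_dim, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, prompt, layer_stats):
        h = prompt.new_zeros(1, self.rnn.hidden_size)
        ratios = []
        for stats in layer_stats:  # one recurrent step per layer
            h = self.rnn(torch.cat([prompt, stats]).unsqueeze(0), h)
            ratios.append(torch.sigmoid(self.out(h)))  # sparsity in (0, 1)
        return torch.cat(ratios).squeeze(1)

prompt = torch.randn(32)  # stand-in for a learned visual-prompt embedding
stats = [weight_stats(torch.randn(64, 32, 3, 3)) for _ in range(5)]
print(SparsityHyperNet()(prompt, stats))  # five layer-wise sparsity ratios
```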
arXiv Detail & Related papers (2024-07-24T16:47:45Z)
- DeblurDiNAT: A Compact Model with Exceptional Generalization and Visual Fidelity on Unseen Domains [1.5124439914522694]
DeblurDiNAT is a deblurring Transformer based on Dilated Neighborhood Attention. A cross-channel learner aids the Transformer block to understand the short-range relationships between adjacent channels. Compared to state-of-the-art models, our compact DeblurDiNAT demonstrates superior generalization capabilities and achieves remarkable performance in perceptual metrics.
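One plausible reading of such a cross-channel learner, sketched below under that assumption, is a small 1D convolution slid along the channel axis so each channel mixes only with its immediate neighbors; the paper's exact module may differ.

```python
import torch
import torch.nn as nn

class CrossChannelLearner(nn.Module):
    def __init__(self, kernel_size=3):
        super().__init__()
        self.mix = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        y = x.permute(0, 2, 3, 1).reshape(b * h * w, 1, c)  # channels as a sequence
        y = self.mix(y)                                     # mix adjacent channels
        return x + y.reshape(b, h, w, c).permute(0, 3, 1, 2)

x = torch.randn(2, 64, 32, 32)
print(CrossChannelLearner()(x).shape)  # torch.Size([2, 64, 32, 32])
```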
arXiv Detail & Related papers (2024-03-19T21:31:31Z)
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs, and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision Transformer for images, GAT for graphs, DGCNN for 3D point clouds, alongside LSTM for language, and demonstrate that, even with a …
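The snippet below illustrates the kind of dependency such grouping must respect, on a single conv-BN-conv chain with illustrative layer names: removing one conv's output channel forces removing the matching BatchNorm entry and the next conv's input channel. DepGraph automates discovering these coupled groups across arbitrary architectures; this manual example walks only one chain.

```python
import torch
import torch.nn as nn

def prune_channel(conv1, bn, conv2, k):
    """Remove output channel k of conv1 together with everything coupled to it."""
    keep = torch.tensor([i for i in range(conv1.out_channels) if i != k])
    new1 = nn.Conv2d(conv1.in_channels, len(keep), conv1.kernel_size, padding=conv1.padding)
    new1.weight.data = conv1.weight.data[keep]            # slice output channels
    new1.bias.data = conv1.bias.data[keep]
    newbn = nn.BatchNorm2d(len(keep))
    for name in ("weight", "bias", "running_mean", "running_var"):
        getattr(newbn, name).data = getattr(bn, name).data[keep]  # coupled BN entries
    new2 = nn.Conv2d(len(keep), conv2.out_channels, conv2.kernel_size, padding=conv2.padding)
    new2.weight.data = conv2.weight.data[:, keep]         # coupled input channels
    new2.bias.data = conv2.bias.data
    return new1, newbn, new2

c1, b1, c2 = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.Conv2d(8, 16, 3, padding=1)
c1, b1, c2 = prune_channel(c1, b1, c2, k=5)
print(nn.Sequential(c1, b1, c2)(torch.randn(1, 3, 32, 32)).shape)  # (1, 16, 32, 32)
```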
arXiv Detail & Related papers (2023-01-30T14:02:33Z)
- Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method to search out desired sub-networks automatically and efficiently.
Our proposed architecture outperforms prior art by around 1.0% top-1 accuracy on the ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z)
- A novel attention-based network for fast salient object detection [14.246237737452105]
In current salient object detection networks, the most popular approach is a U-shaped structure.
We propose a new deep convolutional network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to 1/3 of its original size with almost no loss in accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structure, including those with coupled channels.
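The sketch below illustrates the general idea of Fisher-information channel importance, not the paper's exact grouped formulation: attach a unit mask to each channel and accumulate the squared gradient of a loss with respect to that mask over a few calibration batches. Channels whose masks receive consistently small squared gradients are candidates for pruning.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, 3, padding=1)
mask = torch.ones(1, 8, 1, 1, requires_grad=True)  # one mask per channel
importance = torch.zeros(8)

for _ in range(10):                                # a few calibration batches
    x = torch.randn(4, 3, 16, 16)
    loss = (conv(x) * mask).sum()                  # stand-in for a real loss
    (g,) = torch.autograd.grad(loss, mask)
    importance += 0.5 * g.flatten() ** 2           # Fisher-style approximation

print(importance.argsort()[:2])  # the two least important channels
```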
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- Scalable Visual Transformers with Hierarchical Pooling [61.05787583247392]
We propose a Hierarchical Visual Transformer (HVT) which progressively pools visual tokens to shrink the sequence length.
It brings a great benefit by allowing depth, width, resolution, and patch size to be scaled without introducing extra computational complexity.
Our HVT outperforms the competitive baselines on ImageNet and CIFAR-100 datasets.
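A minimal sketch of the pooling mechanism, with illustrative stage counts and sizes: between transformer stages, the token sequence is downsampled by 1D max pooling so later stages attend over fewer tokens.

```python
import torch
import torch.nn as nn

dim, heads = 64, 4
stage = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
pool = nn.MaxPool1d(kernel_size=2, stride=2)

tokens = torch.randn(2, 196, dim)  # (batch, sequence, dim), e.g. 14x14 patches
for _ in range(3):                 # three stages, halving the sequence each time
    tokens = stage(tokens)         # a single shared layer keeps the sketch short
    tokens = pool(tokens.transpose(1, 2)).transpose(1, 2)
print(tokens.shape)                # torch.Size([2, 24, 64])
```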
arXiv Detail & Related papers (2021-03-19T03:55:58Z)
- Wireless Image Retrieval at the Edge [20.45405359815043]
We study the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server.
Our goal is to maximize the accuracy of the retrieval task under power and bandwidth constraints over the wireless link.
We propose two alternative schemes based on digital and analog communications, respectively.
arXiv Detail & Related papers (2020-07-21T16:15:40Z)
- Sequential Hierarchical Learning with Distribution Transformation for Image Super-Resolution [83.70890515772456]
We build a sequential hierarchical learning super-resolution network (SHSR) for effective image SR.
We consider the inter-scale correlations of features, and devise a sequential multi-scale block (SMB) to progressively explore the hierarchical information.
Experiment results show that SHSR achieves quantitative performance and visual quality superior to state-of-the-art methods.
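As a generic illustration of multi-scale feature extraction (the paper's SMB may differ in detail), the block below runs parallel convolutions with increasing dilation rates and fuses them with a residual connection.

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, ch=32, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations]
        )
        self.fuse = nn.Conv2d(ch * len(dilations), ch, 1)

    def forward(self, x):  # fuse features captured at several receptive fields
        return x + self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 32, 48, 48)
print(MultiScaleBlock()(x).shape)  # torch.Size([1, 32, 48, 48])
```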
arXiv Detail & Related papers (2020-07-19T01:35:53Z)
- Rethinking Channel Dimensions for Efficient Model Design [30.523365449381256]
We study effective channel dimension configurations that achieve better performance than the conventional design.
We propose a simple yet effective channel configuration that can be parameterized by the layer index.
Our proposed model achieves remarkable performance on ImageNet classification and transfer learning tasks.
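A toy comparison of the idea, with an assumed base width and slope: letting the channel width grow linearly with the layer index instead of doubling per stage.

```python
def channel_config(num_layers, base=16, slope=12):
    # Width of layer i grows linearly with the layer index: base + slope * i.
    return [int(base + slope * i) for i in range(num_layers)]

print(channel_config(6))                 # [16, 28, 40, 52, 64, 76]
print([16 * 2 ** i for i in range(6)])   # conventional doubling: [16, ..., 512]
```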
arXiv Detail & Related papers (2020-07-02T10:01:12Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology for capturing users' dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed CpRec, where two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4-8 times compression rates on real-world SRS datasets.
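As a generic illustration of the parameter-saving idea behind embedding compression (CpRec's actual shrinking techniques, block-wise adaptive decomposition and layer parameter sharing, are more elaborate), the sketch below factorizes a large item-embedding table into two low-rank pieces.

```python
import torch
import torch.nn as nn

n_items, dim, rank = 100_000, 128, 16
full = nn.Embedding(n_items, dim)                    # 12.8M parameters
low = nn.Sequential(nn.Embedding(n_items, rank),     # 1.6M parameters
                    nn.Linear(rank, dim, bias=False))  # + 2K parameters

ids = torch.randint(0, n_items, (32,))
print(low(ids).shape)  # torch.Size([32, 128]), same interface as `full`
```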
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.