LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation
- URL: http://arxiv.org/abs/2408.16886v3
- Date: Mon, 02 Dec 2024 12:39:07 GMT
- Title: LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation
- Authors: Juntao Jiang, Mengmeng Wang, Huizhong Tian, Lingbo Cheng, Yong Liu
- Abstract summary: This paper introduces LV-UNet, a lightweight and vanilla model that leverages pre-trained MobileNetv3-Large backbones and incorporates fusible modules. Experimental results on the ISIC 2016, BUSI, CVC-ClinicDB, CVC-ColonDB, and Kvasir-SEG datasets demonstrate a better trade-off between performance and computational load.
- Score: 16.604140484767377
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While large models have achieved significant progress in computer vision, challenges such as optimization complexity, the intricacy of transformer architectures, computational constraints, and practical application demands highlight the importance of simpler model designs in medical image segmentation. This need is particularly pronounced in mobile medical devices, which require lightweight, deployable models with real-time performance. However, existing lightweight models often suffer from poor robustness across datasets, limiting their widespread adoption. To address these challenges, this paper introduces LV-UNet, a lightweight and vanilla model that leverages pre-trained MobileNetv3-Large backbones and incorporates fusible modules. LV-UNet employs an enhanced deep training strategy and switches to a deployment mode during inference by re-parametrization, significantly reducing parameter count and computational overhead. Experimental results on the ISIC 2016, BUSI, CVC-ClinicDB, CVC-ColonDB, and Kvasir-SEG datasets demonstrate a better trade-off between performance and computational load. The code will be released at \url{https://github.com/juntaoJianggavin/LV-UNet}.
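The abstract's "deployment mode by re-parametrization" refers to folding train-time modules into a single affine operation at inference. The paper's exact fusible modules are not detailed here, so the following is a minimal numpy sketch of the common BN-folding style of re-parameterization, with a toy linear layer standing in for a convolution; all names and sizes are illustrative:

```python
import numpy as np

# Toy "train-time" block: a linear layer followed by batch norm
# (a stand-in for a conv + BN pair; everything here is illustrative).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))         # weight: 8 outputs, 16 inputs
b = rng.normal(size=8)               # bias
gamma = rng.normal(size=8)           # BN scale
beta = rng.normal(size=8)            # BN shift
mean = rng.normal(size=8)            # BN running mean
var = rng.uniform(0.5, 2.0, size=8)  # BN running variance
eps = 1e-5

def train_mode(x):
    """Two ops: linear transform, then batch normalization."""
    y = x @ W.T + b
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

# Re-parameterization: fold BN into the layer so inference runs a
# single affine transform (fewer ops, fewer parameters to store).
scale = gamma / np.sqrt(var + eps)
W_fused = W * scale[:, None]
b_fused = (b - mean) * scale + beta

def deploy_mode(x):
    """One op, numerically identical to train_mode."""
    return x @ W_fused.T + b_fused

x = rng.normal(size=(4, 16))
assert np.allclose(train_mode(x), deploy_mode(x), atol=1e-6)
```

The fused weights are computed once after training, so the deployed model never evaluates the BN branch at all.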
Related papers
- LinFusion: 1 GPU, 1 Minute, 16K Image [71.44735417472043]
We introduce a low-rank approximation of a wide spectrum of popular linear token mixers.
We find that the distilled model, termed LinFusion, achieves performance on par with or superior to the original SD.
Experiments on SD-v1.5, SD-v2.1, and SD-XL demonstrate that LinFusion enables satisfactory and efficient zero-shot cross-resolution generation.
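LinFusion's "low-rank approximation of linear token mixers" can be illustrated with a generic sketch: a dense token-mixing matrix replaced by a rank-r factorization via truncated SVD. The ranks and sizes below are illustrative, not LinFusion's actual design:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, rank = 64, 8

M = rng.normal(size=(n_tokens, n_tokens))  # dense mixer: n^2 params

# Truncated SVD gives the best rank-r approximation in Frobenius norm
# (Eckart-Young): keep the top-r singular triplets.
U, s, Vt = np.linalg.svd(M)
A = U[:, :rank] * s[:rank]                 # (n, r)
B = Vt[:rank, :]                           # (r, n)

x = rng.normal(size=(n_tokens, 32))        # 32-dim token features
full = M @ x                               # O(n^2 * d) mixing
low_rank = A @ (B @ x)                     # O(n * r * d) mixing

# Parameter savings: 2*n*r = 1024 entries vs n*n = 4096.
err = np.linalg.norm(full - low_rank) / np.linalg.norm(full)
```

Applying `B` first keeps the intermediate activation at rank-r width, which is where both the memory and compute savings come from.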
arXiv Detail & Related papers (2024-09-03T17:54:39Z) - A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning [29.798780069556074]
Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices.
We propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources.
arXiv Detail & Related papers (2024-01-22T02:17:36Z) - ViR: Towards Efficient Vision Retention Backbones [97.93707844681893]
We propose a new class of computer vision models, dubbed Vision Retention Networks (ViR)
ViR has dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance.
We have validated the effectiveness of ViR through extensive experiments with different dataset sizes and various image resolutions.
arXiv Detail & Related papers (2023-10-30T16:55:50Z) - Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems [12.427821850039448]
We propose a novel split computing approach based on slimmable ensemble encoders.
The key advantage of our design is the ability to adapt computational load and transmitted data size in real-time with minimal overhead and time.
Our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices.
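The slimmable-encoder idea of adapting computational load and transmitted data size at runtime can be sketched generically: one layer trained at full width, sliced to a narrower width per inference call. The names and sizes below are illustrative, not the paper's actual encoder:

```python
import numpy as np

# Toy "slimmable" layer: a single full-width weight matrix whose
# leading output channels form each narrower sub-network.
rng = np.random.default_rng(2)
W = rng.normal(size=(64, 32))  # full width: 64 output channels

def encode(x, width):
    """Use only the first `width` output channels of the shared W."""
    return x @ W[:width].T     # (batch, width) features to transmit

x = rng.normal(size=(5, 32))
for width in (16, 32, 64):     # pick width to match the bandwidth budget
    z = encode(x, width)
    assert z.shape == (5, width)
```

Because every width shares one parameter tensor, switching widths is a zero-copy slice, which is what makes real-time adaptation cheap.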
arXiv Detail & Related papers (2023-06-22T06:33:12Z) - READ: Recurrent Adaptation of Large Transformers [7.982905666062059]
Fine-tuning large-scale Transformers becomes impractical as the model size and number of tasks increase.
We introduce REcurrent ADaption (READ) -- a lightweight and memory-efficient fine-tuning method.
arXiv Detail & Related papers (2023-05-24T16:59:41Z) - Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - Coreference Resolution without Span Representations [20.84150608402576]
We introduce a lightweight coreference model that removes the dependency on span representations, handcrafted features, and heuristics.
Our model performs competitively with the current end-to-end model, while being simpler and more efficient.
arXiv Detail & Related papers (2021-01-02T11:46:51Z) - Stable and expressive recurrent vision models [12.578121388491764]
"contractor recurrent back-propagation" (C-RBP) is a new learning algorithm that achieves constant O(1) memory-complexity with steps of recurrent processing.
C-RBP is a general-purpose learning algorithm for any application that can benefit from recurrent dynamics.
arXiv Detail & Related papers (2020-05-22T19:31:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.