Related papers: Swin-TUNA : A Novel PEFT Approach for Accurate Food Image Segmentation

URL: http://arxiv.org/abs/2507.17347v3
Date: Mon, 28 Jul 2025 08:44:20 GMT
Title: Swin-TUNA : A Novel PEFT Approach for Accurate Food Image Segmentation
Authors: Haotian Chen, Zhiyong Xiao
Abstract summary: This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture. Experiments demonstrate that this method achieves mIoU of 50.56% and 74.94% on the FoodSeg103 and UECFoodPix Complete datasets, respectively.
Score: 3.061662434597098
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the field of food image processing, efficient semantic segmentation techniques are crucial for industrial applications. However, existing large-scale Transformer-based models (such as FoodSAM) struggle to meet practical deployment requirements because of their massive parameter counts and high computational resource demands. This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture, achieving high-performance food image segmentation by updating only 4% of the parameters. The core innovation of Swin-TUNA lies in its hierarchical feature adaptation mechanism: it uses depthwise separable convolutions and dimensional mappings of varying scales to address the differences in features between shallow and deep network layers, combined with a dynamic balancing strategy for task-agnostic and task-specific features. Experiments demonstrate that this method achieves mIoU of 50.56% and 74.94% on the FoodSeg103 and UECFoodPix Complete datasets, respectively, surpassing the fully parameterized FoodSAM model while reducing the parameter count by 98.7% (to only 8.13M). Furthermore, Swin-TUNA exhibits faster convergence and stronger generalization capabilities in low-data scenarios, providing an efficient solution for assembling lightweight food image.
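As a rough illustration of the figures quoted in the abstract, the sketch below does two things: it back-computes the approximate baseline size implied by a 98.7% reduction to 8.13M parameters, and it shows one common way to compute mean IoU from a per-class confusion matrix. The function names (`implied_baseline_params`, `mean_iou`) and the toy confusion matrix are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ (for example in how absent classes are handled).

```python
from typing import List, Sequence


def implied_baseline_params(reduced_millions: float, reduction: float) -> float:
    """Baseline size (in millions) implied by reduced = baseline * (1 - reduction)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return reduced_millions / (1.0 - reduction)


def mean_iou(confusion: Sequence[Sequence[int]]) -> float:
    """Mean IoU from a square confusion matrix (rows: ground truth, cols: prediction).

    Classes that appear in neither the ground truth nor the prediction are skipped.
    """
    n = len(confusion)
    ious: List[float] = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp     # pixels wrongly labeled c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    if not ious:
        raise ValueError("no class present in the confusion matrix")
    return sum(ious) / len(ious)


# Figures quoted in the abstract: 8.13M parameters after a 98.7% reduction.
baseline = implied_baseline_params(8.13, 0.987)
print(f"implied baseline: about {baseline:.0f}M parameters")

# Hypothetical 3-class pixel counts, for illustration only.
toy = [
    [50, 5, 0],
    [10, 30, 0],
    [0, 0, 0],  # class absent from both ground truth and prediction
]
print(f"toy mIoU: {mean_iou(toy):.4f}")
```

The implied baseline is only an estimate derived from the two rounded numbers in the abstract; it is not a parameter count reported by the authors.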

Related papers

MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation [92.57609195819647]
MuSASplat is a novel framework that dramatically reduces the computational burden of training pose-free feed-forward 3D Gaussian splats models. Central to our approach is a lightweight Multi-Scale Adapter that enables efficient fine-tuning of ViT-based architectures with only a small fraction of training parameters.
arXiv Detail & Related papers (2025-12-08T04:56:46Z)
Lightweight Vision Transformer with Window and Spatial Attention for Food Image Classification [1.1472801896854488]
We propose a lightweight food image classification algorithm that integrates a Window Multi-Head Attention Mechanism (WMHAM) and a Spatial Attention Mechanism (SAM). Our model achieves accuracies of 95.24% and 94.33%, respectively, while significantly reducing parameters and FLOPs compared with baseline methods.
arXiv Detail & Related papers (2025-09-23T06:23:50Z)
Optimizing Specific and Shared Parameters for Efficient Parameter Tuning [46.57365875007367]
We propose SaS, a novel PETL method that effectively mitigates distributional shifts during fine-tuning. SaS captures common statistical characteristics across layers using low-rank projections. Experiments on diverse downstream tasks, few-shot settings and domain generalization demonstrate that SaS significantly enhances performance.
arXiv Detail & Related papers (2025-04-04T13:43:54Z)
Hyper Compressed Fine-Tuning of Large Foundation Models with Quantum Inspired Adapters [0.0]
We propose Quantum-Inspired Adapters, a PEFT approach inspired by Hamming-weight quantum circuits from the quantum machine learning literature. We test our proposed adapters by adapting large language models and large vision transformers on benchmark datasets.
arXiv Detail & Related papers (2025-02-10T13:06:56Z)
RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks [16.512587987753967]
RECAST is a novel method that dramatically reduces task-specific trainable parameters to fewer than 50. We show that RECAST outperforms the state-of-the-art by up to 3% across various scales, architectures, and parameter spaces.
arXiv Detail & Related papers (2024-11-25T19:08:38Z)
TOAST: Transformer Optimization using Adaptive and Simple Transformations [40.311292704886235]
We introduce TOAST, a framework that exploits redundancies to approximate entire transformer blocks with lightweight closed-form mappings. Results show that large portions of transformer depth can be replaced by trivial functions, opening a new perspective on efficient foundation models.
arXiv Detail & Related papers (2024-10-07T11:35:24Z)
RoDE: Linear Rectified Mixture of Diverse Experts for Food Large Multi-Modal Models [96.43285670458803]
Uni-Food is a unified food dataset that comprises over 100,000 images with various food labels. Uni-Food is designed to provide a more holistic approach to food data analysis. We introduce a novel Linear Rectification Mixture of Diverse Experts (RoDE) approach to address the inherent challenges of food-related multitasking.
arXiv Detail & Related papers (2024-07-17T16:49:34Z)
ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections. In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z)
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation. DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts [52.39959535724677]
We introduce an alternative solution to improve the generalization of image restoration models. We propose AdaptIR, a Mixture-of-Experts (MoE) with a multi-branch design to capture local, global, and channel representation bases. AdaptIR achieves stable performance on single-degradation tasks and excels in hybrid-degradation tasks, while fine-tuning only 0.6% of the parameters for 8 hours.
arXiv Detail & Related papers (2023-12-12T14:27:59Z)
Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios. We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer. Our proposed approach consistently improves the segmentation performance significantly on novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z)
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization [100.90624220423634]
We present ComPEFT, a novel method for compressing fine-tuning residuals (task vectors) of PEFT based models. In extensive evaluation across T5, T0, and LLaMA-based models with 200M - 65B parameters, ComPEFT achieves compression ratios of 8x - 50x.
arXiv Detail & Related papers (2023-11-22T05:28:59Z)
Hierarchical Side-Tuning for Vision Transformers [33.536948382414316]
Fine-tuning pre-trained Vision Transformers (ViTs) has showcased significant promise in enhancing visual recognition tasks. PETL has shown potential for achieving high performance with fewer parameter updates compared to full fine-tuning. This paper introduces Hierarchical Side-Tuning (HST), an innovative PETL method facilitating the transfer of ViT models to diverse downstream tasks.
arXiv Detail & Related papers (2023-10-09T04:16:35Z)
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm [111.17100512647619]
This paper explains the rationality of Vision Transformer by analogy with the proven practical evolutionary algorithm (EA). We propose a novel pyramid EATFormer backbone that only contains the proposed EA-based transformer (EAT) block. Massive quantitative and qualitative experiments on image classification, downstream tasks, and explanatory experiments demonstrate the effectiveness and superiority of our approach.
arXiv Detail & Related papers (2022-06-19T04:49:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.