TAP-SLF: Parameter-Efficient Adaptation of Vision Foundation Models for Multi-Task Ultrasound Image Analysis
- URL: http://arxiv.org/abs/2603.00433v1
- Date: Sat, 28 Feb 2026 03:21:07 GMT
- Title: TAP-SLF: Parameter-Efficient Adaptation of Vision Foundation Models for Multi-Task Ultrasound Image Analysis
- Authors: Hui Wan, Libin Lan,
- Abstract summary: Task-Aware Prompting and Selective Layer Fine-Tuning (TAP-SLF) is a unified framework for multi-task ultrasound image analysis.<n>TAP-SLF incorporates task-specific priors into the input token sequence and applies LoRA to selected specific top layers of the encoder.<n>Results on the FMC_UIA 2026 Challenge test set, combined with evaluations on the officially released training dataset using an 8:2 train-test split, demonstrate that task-aware prompting and selective layer tuning are effective strategies for efficient VFM adaptation.
- Score: 1.5074458114135958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Executing multiple tasks simultaneously in medical image analysis, including segmentation, classification, detection, and regression, often introduces significant challenges regarding model generalizability and the optimization of shared feature representations. While Vision Foundation Models (VFMs) provide powerful general representations, full fine-tuning on limited medical data is prone to overfitting and incurs high computational costs. Moreover, existing parameter-efficient fine-tuning approaches typically adopt task-agnostic adaptation protocols, overlooking both task-specific mechanisms and the varying sensitivity of model layers during fine-tuning. In this work, we propose Task-Aware Prompting and Selective Layer Fine-Tuning (TAP-SLF), a unified framework for multi-task ultrasound image analysis. TAP-SLF incorporates task-aware soft prompts to encode task-specific priors into the input token sequence and applies LoRA to selected specific top layers of the encoder. This strategy updates only a small fraction of the VFM parameters while keeping the pre-trained backbone frozen. By combining task-aware prompts with selective high-layer fine-tuning, TAP-SLF enables efficient VFM adaptation to diverse medical tasks within a shared backbone. Results on the FMC_UIA 2026 Challenge test set, where TAP-SLF wins fifth place, combined with evaluations on the officially released training dataset using an 8:2 train-test split, demonstrate that task-aware prompting and selective layer tuning are effective strategies for efficient VFM adaptation.
Related papers
- Baseline Method of the Foundation Model Challenge for Ultrasound Image Analysis [15.017057362402687]
We present the Foundation Model Challenge for Ultrasound Image Analysis (FM_UIA2026)<n>The model employs an ImageNet-pretrained EfficientNet--B4 backbone for robust feature extraction, combined with a Feature Pyramid Network (FPN) to capture contextual information.<n>A task-specific routing strategy enables global tasks to leverage high-level semantic features, while dense prediction tasks exploit spatially detailed FPN representations.
arXiv Detail & Related papers (2026-02-01T06:52:11Z) - FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition.<n>We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model.<n>We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z) - MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation [55.37355146924576]
MedSeqFT is a sequential fine-tuning framework for medical image analysis.<n>It adapts pre-trained models to new tasks while refining their representational capacity.<n>It consistently outperforms state-of-the-art fine-tuning strategies.
arXiv Detail & Related papers (2025-09-07T15:22:53Z) - Group Relative Augmentation for Data Efficient Action Detection [11.169883977958454]
Adapting large Video-Language Models (VLMs) for action detection using only a few examples poses challenges.<n>We propose an efficient adaptation strategy combining parameter-efficient tuning (LoRA) with a novel learnable internal feature augmentation.<n>We demonstrate our method's effectiveness on complex multi-label, multi-person action detection datasets.
arXiv Detail & Related papers (2025-07-28T21:46:05Z) - Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis [8.076987502347327]
Pathology foundation models (PFMs) have emerged as powerful tools for analyzing whole slide images (WSIs)<n>TAPFM uses vision transformer (vit) attention for MIL aggregation while optimizing both for feature representations and attention weights.<n> evaluated on mutation prediction tasks for bladder cancer and lung adenocarcinoma.
arXiv Detail & Related papers (2025-06-05T15:56:45Z) - Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation [24.531539125814877]
Vision Foundation Models (VFMs) are large-scale, pre-trained models that serve as general-purpose backbones for various computer vision tasks.<n>One way to tackle this limitation is by employing a task-agnostic feature upsampling module that refines VFM features resolution.<n>Our benchmarking experiments show that selecting appropriate upsampling strategies significantly improves VFM features quality.
arXiv Detail & Related papers (2025-05-04T11:59:26Z) - Repurposing Foundation Model for Generalizable Medical Time Series Classification [16.21546283978257]
FORMED is a framework for repurposing a backbone foundation model to enable highly generalizable MedTS classification on unseen datasets.<n>We evaluate FORMED on 5 diverse MedTS datasets, benchmarking against 11 Task-Specific Models (TSM) and 4 Task-Specific Adaptation (TSA) methods.<n>Our results demonstrate FORMED's dominant performance, achieving up to 35% absolute improvement in F1-score (on ADFTD dataset) over specialized baselines.
arXiv Detail & Related papers (2024-10-03T23:50:04Z) - PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift ( TDS) and Task-Distribution Corruption (TDC)
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Hierarchical Side-Tuning for Vision Transformers [33.536948382414316]
Fine-tuning pre-trained Vision Transformers (ViTs) has showcased significant promise in enhancing visual recognition tasks.
PETL has shown potential for achieving high performance with fewer parameter updates compared to full fine-tuning.
This paper introduces Hierarchical Side-Tuning (HST), an innovative PETL method facilitating the transfer of ViT models to diverse downstream tasks.
arXiv Detail & Related papers (2023-10-09T04:16:35Z) - Exploring Efficient Few-shot Adaptation for Vision Transformers [70.91692521825405]
We propose a novel efficient Transformer Tuning (eTT) method that facilitates finetuning ViTs in the Few-shot Learning tasks.
Key novelties come from the newly presented Attentive Prefix Tuning (APT) and Domain Residual Adapter (DRA)
We conduct extensive experiments to show the efficacy of our model.
arXiv Detail & Related papers (2023-01-06T08:42:05Z) - Shared Space Transfer Learning for analyzing multi-site fMRI data [83.41324371491774]
Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data.
MVPA works best with a well-designed feature set and an adequate sample size.
Most fMRI datasets are noisy, high-dimensional, expensive to collect, and with small sample sizes.
This paper proposes the Shared Space Transfer Learning (SSTL) as a novel transfer learning approach.
arXiv Detail & Related papers (2020-10-24T08:50:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.