Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers
- URL: http://arxiv.org/abs/2510.25372v1
- Date: Wed, 29 Oct 2025 10:42:56 GMT
- Title: Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers
- Authors: M Yashwanth, Sharannya Ghosh, Aditay Tripathi, Anirban Chakraborty
- Abstract summary: We propose PEP-FedPT (Prompt Estimation from Prototypes for Federated Prompt Tuning) to achieve both generalization and personalization in visual prompt tuning of Vision Transformers (ViTs). We introduce the novel Class-Contextualized Mixed Prompt (CCMP), based on class-specific prompts maintained alongside a globally shared prompt. PEP-FedPT consistently surpasses the state-of-the-art baselines under diverse data heterogeneity scenarios.
- Score: 5.231417382224748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Prompt Tuning (VPT) of pre-trained Vision Transformers (ViTs) has proven highly effective as a parameter-efficient fine-tuning technique for adapting large models to downstream tasks with limited data. Its parameter efficiency makes it particularly suitable for Federated Learning (FL), where both communication and computation budgets are often constrained. However, global prompt tuning struggles to generalize across heterogeneous clients, while personalized tuning overfits to local data and lacks generalization. We propose PEP-FedPT (Prompt Estimation from Prototypes for Federated Prompt Tuning), a unified framework designed to achieve both generalization and personalization in federated prompt tuning of ViTs. Within this framework, we introduce the novel Class-Contextualized Mixed Prompt (CCMP), based on class-specific prompts maintained alongside a globally shared prompt. For each input, CCMP adaptively combines class-specific prompts using weights derived from global class prototypes and client class priors. This approach enables per-sample prompt personalization without storing client-dependent trainable parameters. The prompts are collaboratively optimized via the standard federated averaging technique. Comprehensive evaluations on CIFAR-100, TinyImageNet, DomainNet, and iNaturalist datasets demonstrate that PEP-FedPT consistently surpasses the state-of-the-art baselines under diverse data heterogeneity scenarios, establishing a strong foundation for efficient and generalizable federated prompt tuning of Vision Transformers.
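The per-sample mixing described in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's exact formula: the function name `ccmp_prompt`, the cosine-similarity scoring against prototypes, and the softmax combination with the client class prior are all placeholders for whatever weighting PEP-FedPT actually uses.

```python
import numpy as np

def ccmp_prompt(feature, prototypes, class_prior, class_prompts, global_prompt):
    """Sketch of a Class-Contextualized Mixed Prompt (CCMP) for one input.

    feature:       (d,)   embedding of the current sample
    prototypes:    (C, d) global class prototypes
    class_prior:   (C,)   client's class prior probabilities
    class_prompts: (C, p) per-class prompt vectors
    global_prompt: (p,)   globally shared prompt
    """
    # Score each class by cosine similarity between the sample and its prototype.
    sims = prototypes @ feature / (
        np.linalg.norm(prototypes, axis=1) * np.linalg.norm(feature) + 1e-8
    )
    # Modulate the scores by the client's class prior, then softmax-normalize,
    # so the weights reflect both global prototypes and local class frequency.
    logits = sims + np.log(class_prior + 1e-8)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # Per-sample prompt: weighted mix of class prompts plus the shared prompt.
    # No client-specific parameters are stored; the weights are computed on the fly.
    return global_prompt + w @ class_prompts
```

Because the weights sum to one, the mixed prompt stays in the convex hull of the class prompts, shifted by the shared global prompt; only the class prompts, the global prompt, and the prototypes need to be communicated, which keeps the method compatible with federated averaging.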
Related papers
- RefProtoFL: Communication-Efficient Federated Learning via External-Referenced Prototype Alignment [20.458428841832742]
Federated learning (FL) enables collaborative model training without sharing raw data in edge environments. We propose RefProtoFL, a communication-efficient FL framework that integrates External-Referenced Prototype Alignment. We show that RefProtoFL attains higher classification accuracy than state-of-the-art prototype-based FL baselines.
arXiv Detail & Related papers (2026-01-21T08:01:14Z) - Instant Personalized Large Language Model Adaptation via Hypernetwork [56.512539596908745]
Profile-to-PEFT is a scalable framework that employs a hypernetwork, trained end-to-end, to map a user's encoded profile directly to a full set of adapter parameters. We show that our method outperforms both prompt-based personalization and OPPU while using substantially fewer computational resources at deployment.
arXiv Detail & Related papers (2025-10-18T00:41:25Z) - FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models [97.35577473867296]
Federated Adversarial Prompt Tuning (FedAPT) is a novel method designed to enhance the adversarial robustness of FPT. To address this issue, we propose a class-aware prompt generator that generates visual prompts from text prompts. Experiments on multiple image classification datasets demonstrate the superiority of FedAPT in improving adversarial robustness.
arXiv Detail & Related papers (2025-09-03T03:46:35Z) - pFedMMA: Personalized Federated Fine-Tuning with Multi-Modal Adapter for Vision-Language Models [12.270878920401948]
pFedMMA is the first personalized federated learning framework that leverages multi-modal adapters for vision-language tasks. We show that pFedMMA achieves state-of-the-art trade-offs between personalization and generalization, outperforming recent federated prompt tuning methods.
arXiv Detail & Related papers (2025-07-07T18:26:34Z) - Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization [51.562474873972086]
Federated domain generalization (FedDG) aims to learn a globally generalizable model from decentralized clients with heterogeneous data. Recent studies have introduced prompt learning to adapt vision-language models (VLMs) in FedDG by learning a single global prompt. We propose TRIP, a Token-level prompt mixture with parameter-free routing framework for FedDG.
arXiv Detail & Related papers (2025-04-29T11:06:03Z) - FedAli: Personalized Federated Learning Alignment with Prototype Layers for Generalized Mobile Services [9.683642138601464]
Federated Alignment (FedAli) is a prototype-based regularization technique that enhances inter-client alignment while strengthening the robustness of personalized adaptations. At its core, FedAli introduces the ALignment with Prototypes layer, inspired by human memory, to enhance generalization. Our experiments show that FedAli significantly enhances client generalization while preserving strong personalization in heterogeneous settings.
arXiv Detail & Related papers (2024-11-15T21:35:21Z) - CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing [66.6712018832575]
Domain generalization (DG) based Face Anti-Spoofing (FAS) aims to improve the model's performance on unseen domains.
We make use of large-scale VLMs like CLIP and leverage the textual feature to dynamically adjust the classifier's weights for exploring generalizable visual features.
arXiv Detail & Related papers (2024-03-21T11:58:50Z) - Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning [49.72857433721424]
Vision Transformers (ViT) and Visual Prompt Tuning (VPT) achieve state-of-the-art performance with improved efficiency in various computer vision tasks.
We present a novel algorithm, SGPT, that integrates Generalized FL (GFL) and Personalized FL (PFL) approaches by employing a unique combination of both shared and group-specific prompts.
arXiv Detail & Related papers (2023-10-27T17:22:09Z) - Visual Prompt Based Personalized Federated Learning [83.04104655903846]
We propose a novel PFL framework for image classification tasks, dubbed pFedPT, that leverages personalized visual prompts to implicitly represent local data distribution information of clients.
Experiments on the CIFAR10 and CIFAR100 datasets show that pFedPT outperforms several state-of-the-art (SOTA) PFL algorithms by a large margin in various settings.
arXiv Detail & Related papers (2023-03-15T15:02:15Z) - PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees [95.87604231887353]
Existing pFL methods introduce high communication and computation costs or are vulnerable to test-time distribution shifts.
PerAda is a parameter-efficient pFL framework based on knowledge distillation that achieves superior generalization performance, especially under test-time distribution shifts.
Our code is available at https://github.com/NV/PerAda.
arXiv Detail & Related papers (2023-02-13T19:00:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.